PREPARING DATA FOR STATISTICAL ANALYSIS
Indiana University Department of Medicine
Division of Biostatistics
The time and effort involved in performing statistical analyses can be greatly reduced if the data is entered in the proper format. If possible, consult a Data Manager from the Division of Biostatistics before you begin collecting data, to ensure that the format is compatible with the statistical analysis software that will be used. The following guidelines will help us process your data more efficiently.
Regardless of the format selected, subject identifiers such as names and Social Security numbers must be removed from the dataset before it is transferred to Biostatistics.
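If the data is kept in a spreadsheet, one way to remove identifiers is to export it to CSV and drop the identifying columns before transfer. The sketch below is only an illustration: it assumes the identifier columns are headed "name" and "ssn" (hypothetical headers; substitute your own) and that a non-identifying study ID column is kept. It uses only Python's standard csv module.

```python
import csv
import io

# Hypothetical identifier column headers; replace with the headers used in your file.
IDENTIFIER_COLUMNS = {"name", "ssn"}


def drop_identifiers(csv_text, identifiers=IDENTIFIER_COLUMNS):
    """Return the CSV text with identifier columns removed.

    Header matching ignores case and surrounding spaces. All other columns,
    including a non-identifying study ID, are kept in their original order.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    keep = [i for i, col in enumerate(rows[0])
            if col.strip().lower() not in identifiers]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        # Short rows (missing trailing cells) are tolerated rather than raising IndexError.
        writer.writerow([row[i] for i in keep if i < len(row)])
    return out.getvalue()


# Made-up example record; the SSN is a placeholder, not real data.
raw = "id,name,ssn,dose\n101,Jane Doe,000-00-0000,50\n"
print(drop_identifiers(raw), end="")
```

A header-based filter cannot catch identifiers typed into free-text columns such as comments, so those still need to be checked by eye.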
It is also helpful to provide a copy of the data collection forms identifying each variable, a list of any coding conventions used, and at least one copy of an actual data record.
The data may be provided on a 3.5" diskette or Zip disk, or transferred electronically.
SPREADSHEETS (Lotus, Excel, Quattro Pro, etc.)
The first line of the spreadsheet should contain the variable names. Each variable name must be unique, must be no more than 8 alphanumeric characters (0-9, A-Z, a-z) in length, and must begin with a letter, either upper or lower case.
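The naming rule above can be checked mechanically before the file is sent. The following is a minimal Python sketch, not a Biostatistics tool: it tests each name against the 8-character alphanumeric pattern and reports duplicates. The duplicate check ignores case, an assumption made conservatively because some analysis packages do not distinguish "Dose" from "DOSE".

```python
import re

# First character a letter (A-Z, a-z), then up to 7 more letters or digits (0-9): 8 characters maximum.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")


def check_variable_names(names):
    """Return a list of problems in a header row of variable names (empty if none)."""
    problems = []
    seen = set()
    for name in names:
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"invalid name: {name!r}")
        # Compare case-insensitively, since "Dose" and "DOSE" may be treated as the same variable.
        key = name.lower()
        if key in seen:
            problems.append(f"duplicate name: {name!r}")
        seen.add(key)
    return problems


print(check_variable_names(["id", "dose", "2ndvisit", "DOSE", "weightkg1"]))
```

Here "2ndvisit" is rejected for starting with a digit, "DOSE" duplicates "dose", and "weightkg1" is 9 characters long.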
The actual data records should begin on the second line of the spreadsheet. The data for one subject should not exceed one line; if more columns are needed, additional sheets may be used. Do not include blank lines.
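The layout rules above (one line per subject, no blank lines) can be checked on a CSV export of the sheet. This Python sketch assumes that the first column holds a subject ID (a hypothetical convention; the guidelines do not require one) and that no cell contains a line break, so the reported line numbers match spreadsheet rows.

```python
import csv
import io


def check_layout(csv_text, id_column=0):
    """Flag blank rows and subjects whose data spill onto more than one row.

    Assumes the sheet was exported to CSV, that line 1 holds the variable
    names, and that column `id_column` holds a subject ID (hypothetical).
    Line numbers match spreadsheet rows as long as no cell contains a line break.
    """
    problems = []
    seen_ids = set()
    rows = list(csv.reader(io.StringIO(csv_text)))
    for line_no, row in enumerate(rows[1:], start=2):  # skip the header row
        if not any(cell.strip() for cell in row):
            problems.append(f"line {line_no}: blank row")
            continue
        subject = row[id_column].strip() if id_column < len(row) else ""
        if subject in seen_ids:
            problems.append(f"line {line_no}: subject {subject} appears on more than one line")
        seen_ids.add(subject)
    return problems


sheet = "id,dose\n101,50\n\n102,40\n101,30\n"
for problem in check_layout(sheet):
    print(problem)
```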
Do not include summary scores or totals that can be calculated from the existing data. Enter dates in the same format throughout the spreadsheet. If the data values will be numeric (drug dosage, for example), format the column as numeric and do not enter any character strings. An additional column can be added to record character strings such as the unit of measure (e.g., "mg").
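The numeric-column and date-format rules above lend themselves to a quick pre-transfer check. Below is a rough Python sketch that scans a column's values (as strings, for example from a CSV export) and returns any entries that break the rule. The date format "%m/%d/%Y" is only an assumed example; use whatever single format your spreadsheet follows. Note that float() also accepts forms such as "1e3" and "nan".

```python
from datetime import datetime


def non_numeric_values(values):
    """Return entries of a numeric column that cannot be read as numbers (blank cells are skipped)."""
    bad = []
    for v in values:
        v = v.strip()
        if not v:
            continue
        try:
            float(v)
        except ValueError:
            bad.append(v)
    return bad


def dates_not_in_format(values, fmt="%m/%d/%Y"):
    """Return entries of a date column that do not parse as valid dates in format `fmt` (blank cells are skipped)."""
    bad = []
    for v in values:
        v = v.strip()
        if not v:
            continue
        try:
            datetime.strptime(v, fmt)
        except ValueError:
            bad.append(v)
    return bad


print(non_numeric_values(["50", "25 mg", "", "12.5"]))
print(dates_not_in_format(["03/07/2001", "2001-03-07", "", "02/30/2001"]))
```

An entry like "25 mg" is exactly the case the guidelines address: the 25 belongs in the numeric dose column and "mg" in a separate unit column.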
RELATIONAL DATABASE MANAGEMENT SYSTEMS (Access, FoxPro, dBASE, Paradox, etc.)
Variable names should not exceed 8 alphanumeric characters in length and must begin with an alphabetic character.
Choose numeric, character, or date data types as appropriate for the data to be entered. Avoid using logical data types and memo fields whenever possible.
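As an illustration of these type choices, the sketch below defines a small table in SQLite through Python's built-in sqlite3 module. SQLite is used only because it ships with Python. It is not one of the systems named above, and it has no dedicated date or logical type. The sketch therefore stores the date as ISO 8601 text and codes a yes/no item as an integer 1/0 rather than a logical field. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; every column name is 8 characters or fewer and starts with a letter.
conn.execute("""
    CREATE TABLE visits (
        subjid   INTEGER NOT NULL,                 -- numeric study ID, not a name or SSN
        dose     REAL,                             -- numeric value only
        doseunit TEXT,                             -- character: unit of measure, e.g. 'mg'
        visitdt  TEXT,                             -- date as ISO 8601 text; SQLite has no DATE type
        smoker   INTEGER CHECK (smoker IN (0, 1))  -- yes/no coded 1/0 instead of a logical type
    )
""")
conn.execute("INSERT INTO visits VALUES (?, ?, ?, ?, ?)", (101, 50.0, "mg", "2001-03-07", 0))
rows = conn.execute("SELECT * FROM visits").fetchall()
print(rows)
```

In a system that does provide a date type, that type would be the natural choice for a column like visitdt.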
TEXT FILES (ASCII, DOS, etc.)
Data may be entered using a word processor (e.g., WordPerfect, Microsoft Word). Avoid control characters, keystrokes such as tabs, and table structures. The file should be saved as plain ASCII text.
The data values should be aligned in columns, with each row containing one data record. For a missing numeric or date value, enter a '.' in the appropriate column. For a missing character value, leave the appropriate column blank.
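The column-aligned layout and missing-value conventions above can also be produced by a short script instead of by hand. This Python sketch uses a hypothetical column layout (names, widths, and types are assumptions). It pads with spaces rather than tabs, writes '.' for missing numeric or date values, and leaves missing character values blank, so each variable starts in the same column on every line.

```python
# Hypothetical column layout: (variable name, width in characters, kind),
# where kind is "num", "date", or "char".
LAYOUT = [("subjid", 6, "num"), ("sex", 3, "char"), ("dose", 6, "num"), ("visitdt", 10, "date")]


def format_record(record, layout=LAYOUT):
    """Return one data record as a line of fixed-width, space-separated columns.

    Missing numeric or date values are written as '.', missing character
    values as blanks, so every variable starts in the same column on every line.
    Trailing blanks are trimmed; they do not affect column positions.
    """
    fields = []
    for name, width, kind in layout:
        value = record.get(name)
        if value is None or value == "":
            text = "." if kind in ("num", "date") else ""
        else:
            text = str(value)
        if len(text) > width:
            raise ValueError(f"{name}: {text!r} is wider than {width} columns")
        fields.append(text.ljust(width))
    return " ".join(fields).rstrip()


records = [
    {"subjid": 101, "sex": "F", "dose": 50, "visitdt": "03/07/2001"},
    {"subjid": 102, "sex": "", "dose": None, "visitdt": None},
]
for rec in records:
    print(format_record(rec))
```

In the second record, the blank sex column and the '.' entries keep dose and visitdt in the same columns as in the first record.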
For further guidelines, please contact Beverly Musick at (317) 274-2693 or bsmusick@iupui.edu.