PREPARING DATA FOR STATISTICAL ANALYSIS
Indiana University Department of Medicine, Division of Biostatistics
The time and effort involved in performing statistical analyses can be greatly
reduced if the data is entered in the proper format. If possible, consult with a Data Manager from the Division of
Biostatistics before you begin to collect the data, to ensure that the format is compatible with the
statistical analysis software that will be used. The following guidelines will help us process your
data more efficiently.

Regardless of the format selected, subject identifiers such as name and social security number must be removed from the dataset before it is transferred to Biostatistics. It is helpful to provide a copy of the data collection forms identifying each variable, a list of any coding conventions used, and at least one copy of an actual data record. The data may be provided on a 3.5" diskette or zip disk, or transferred electronically.

SPREADSHEETS (Lotus, Excel, Quattro Pro, etc.)
The first line of the spreadsheet should contain the variable names, which
must be no more than 8 alphanumeric (0-9, A-Z, a-z) characters in length and
must begin with an alphabetic character, either upper or lower case.
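
The naming rule above can be checked automatically before a file is sent. The following Python sketch is only an illustration: the function names is_valid_name and invalid_names are hypothetical, and the pattern assumes the rule is exactly "a letter followed by up to 7 more letters or digits", so characters such as underscores are rejected.

```python
import re

# Assumed reading of the rule: one letter, then at most 7 letters or digits
# (8 characters in total, ASCII only).
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")


def is_valid_name(name):
    """Return True if a single variable name follows the naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


def invalid_names(header_row):
    """Return the names in a header row that break the naming rule."""
    return [name for name in header_row if not is_valid_name(name)]


# Hypothetical header row: "visit_dt" contains an underscore and
# "1dose" starts with a digit, so both are reported.
bad = invalid_names(["id", "age", "visit_dt", "1dose", "sbp"])
print(bad)
```

A check like this is easiest to run on the first line of the spreadsheet after exporting it, so that any name it reports can be shortened or renamed before the data are transferred.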
Each variable name must be unique. The
actual data records should begin on the second line of the spreadsheet.
The data for one subject should not exceed one line.
If more columns are needed, additional sheets may be used. Do not include blank lines. Do
not include summary scores or totals that can be calculated from the existing
data. Enter dates in the same
format throughout the spreadsheet. If the data values will be numeric (drug dosage, for example), format the column as numeric and do not enter any character strings. An additional column can be added to record character strings such as the unit of measure ("mg").

RELATIONAL DATABASE MANAGEMENT SYSTEMS (Access, FoxPro, dBASE, Paradox, etc.)
Variable names should not exceed 8 alphanumeric characters in length and
must begin with an alphabetic character. Choose
numeric, character, or date data types as dictated by the data to be entered.
Avoid using logical data types and memo fields whenever possible.

TEXT FILES (ASCII, DOS, etc.)
Data may be entered using a word processor (e.g., WordPerfect or Microsoft
Word). Avoid control characters, keystrokes such as tabs, and
table structures. The file should
be saved in plain-text (ASCII) format. The
data values should be aligned in columns with each row comprising one data
record. For missing numeric or date
values a '.' should be entered in the appropriate column.
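
As a rough illustration of the column alignment and the '.' convention for missing numeric or date values, the Python sketch below formats records as fixed-width lines. Everything specific in it is an assumption made for the example: the function name format_record, the column widths, the single space between columns, the MM/DD/YYYY date format, and the sample values, which are invented.

```python
def format_record(values, widths):
    """Format one data record as a fixed-width line.

    None marks a missing numeric or date value and is written as '.'.
    Columns are padded with spaces (never tabs) so that they line up.
    """
    fields = []
    for value, width in zip(values, widths):
        text = "." if value is None else str(value)
        if len(text) > width:
            raise ValueError(f"value {text!r} does not fit in {width} columns")
        fields.append(text.ljust(width))
    # One space separates the columns; trailing padding is dropped.
    return " ".join(fields).rstrip()


# Invented example records: subject ID, age, visit date, dose.
records = [
    (101, 54, "03/14/2001", 2.5),
    (102, None, "03/15/2001", None),  # age and dose are missing
    (103, 61, None, 1.0),             # visit date is missing
]
widths = (4, 3, 10, 5)  # assumed column widths

lines = [format_record(record, widths) for record in records]
for line in lines:
    print(line)
```

With these assumed widths, every value for a given variable starts at the same character position on each line, and a missing value still occupies its column as a single '.'.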
For missing character values the appropriate column should be left blank.

For further guidelines, please contact Beverly Musick at (317) 274-2693 or bsmusick@iupui.edu.