Indiana University
Preparing Data for Statistical Analysis

 

PREPARING DATA FOR STATISTICAL ANALYSIS

Indiana University Department of Medicine

Division of Biostatistics

 

The time and effort involved in performing statistical analyses can be greatly reduced if the data is entered in the proper format. If possible, consult a Data Manager from the Division of Biostatistics before you begin collecting data, to ensure that the format is compatible with the statistical analysis software that will be used. The following guidelines will help us process your data more efficiently.

Regardless of the format selected, subject identifiers such as name and social security number must be removed from the dataset before it is transferred to Biostatistics. It is helpful to provide a copy of the data collection forms identifying each variable, a list of any coding conventions used, and at least one copy of an actual data record. The data may be delivered on a 3.5" diskette or Zip disk, or transferred electronically.
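
If the data are kept in a spreadsheet or database, removing identifiers can be scripted so it is done the same way every time. The minimal Python sketch below is only an illustration: it assumes the data have been exported to CSV text and that the identifying columns are named name and ssn. Those column names and the function drop_identifiers are hypothetical, so list every identifying column in your own data.

```python
import csv
import io

# Hypothetical identifier column names; list every identifying column in your own data.
IDENTIFIER_COLUMNS = frozenset({"name", "ssn"})


def drop_identifiers(csv_text: str, identifiers=IDENTIFIER_COLUMNS) -> str:
    """Return csv_text with the identifier columns removed (matched case-insensitively)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [col for col in (reader.fieldnames or []) if col.lower() not in identifiers]
    out = io.StringIO()
    # extrasaction="ignore" makes the writer skip the dropped columns in each row.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

In this sketch a study-assigned ID column (here assumed to be called id) is kept so records can still be matched to the data collection forms. Dropping whole columns will not catch identifiers typed into free-text fields, so those still need to be checked by hand.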

 

SPREADSHEETS (Lotus, Excel, Quattro Pro, etc.)

            The first line of the spreadsheet should contain the variable names. Each name must be no more than 8 alphanumeric characters (0-9, A-Z, a-z) in length, must begin with an alphabetic character (upper or lower case), and must be unique. The actual data records should begin on the second line of the spreadsheet. Each subject's data should occupy no more than one line; if more columns are needed, additional sheets may be used. Do not include blank lines. Do not include summary scores or totals that can be calculated from the existing data. Enter dates in the same format throughout the spreadsheet. If a column will hold numeric values (drug dosage, for example), format the column as numeric and do not enter any character strings in it. An additional column can be added to record character strings such as the unit of measure ("mg").
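
Several of these spreadsheet rules can be checked automatically before the file is sent. The Python sketch below is illustrative only and assumes the sheet has been read into a list of rows of cell strings (for example with Python's csv.reader on an exported CSV file). The function names are hypothetical. It also treats names that differ only in case as duplicates, which is an assumption, since many statistical packages ignore case in variable names.

```python
import re

# Guideline: 1-8 alphanumeric characters (0-9, A-Z, a-z), starting with a letter.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")


def check_header(names: list[str]) -> list[str]:
    """Return the problems found in the spreadsheet's first (header) line."""
    problems = []
    seen = set()
    for name in names:
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"invalid variable name: {name!r}")
        # Assumption: names differing only in case count as duplicates.
        key = name.lower()
        if key in seen:
            problems.append(f"duplicate variable name: {name!r}")
        seen.add(key)
    return problems


def check_rows(rows: list[list[str]], numeric_columns: set[int]) -> list[str]:
    """Flag blank lines and text entered in columns meant to be numeric."""
    problems = []
    for line_no, row in enumerate(rows, start=2):  # data records begin on line 2
        if all(cell.strip() == "" for cell in row):
            problems.append(f"line {line_no}: blank line")
            continue
        for col in sorted(numeric_columns):  # 0-based column indexes
            value = row[col].strip() if col < len(row) else ""
            if value == "":
                continue  # an empty cell is a missing value, not a type error
            try:
                float(value)
            except ValueError:
                problems.append(
                    f"line {line_no}, column {col + 1}: non-numeric value {value!r}"
                )
    return problems
```

For example, a dose column containing "10 mg" is reported as a non-numeric value, which is the case the guideline suggests handling with a separate unit-of-measure column.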

RELATIONAL DATABASE MANAGEMENT SYSTEMS (Access, FoxPro, dBASE, Paradox, etc.)

            Variable names should not exceed 8 alphanumeric characters in length and must begin with an alphabetic character.  Choose numeric, character, or date data types as dictated by the data to be entered.  Avoid using logical data types and memo fields whenever possible. 
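
Because each database product uses its own type names, one way to review a table design is to describe it with generic labels first. The Python sketch below does this. The labels numeric, character, date, logical and memo, as well as the suggested alternatives, are assumptions made for illustration. In particular, the guidelines do not say what to use instead of a logical field, so the 1/0 coding is only one possibility.

```python
import re

# Same naming rule as for spreadsheets: 1-8 alphanumeric characters, starting with a letter.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")
# Generic type labels (an assumption); map your DBMS's own type names onto these.
PREFERRED_TYPES = {"numeric", "character", "date"}
# Discouraged types, each paired with a hypothetical alternative.
DISCOURAGED_TYPES = {
    "logical": "consider a numeric code such as 1 = yes, 0 = no",
    "memo": "consider a fixed-length character field",
}


def review_schema(fields: dict[str, str]) -> list[str]:
    """Review a table definition given as {field name: generic type label}."""
    notes = []
    for name, ftype in fields.items():
        if not NAME_PATTERN.fullmatch(name):
            notes.append(
                f"{name}: name must be 1-8 alphanumeric characters starting with a letter"
            )
        kind = ftype.lower()
        if kind in DISCOURAGED_TYPES:
            notes.append(f"{name}: avoid {kind} type; {DISCOURAGED_TYPES[kind]}")
        elif kind not in PREFERRED_TYPES:
            notes.append(f"{name}: unrecognized type {ftype!r}")
    return notes
```

An empty result means every field uses an acceptable name and one of the preferred types.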

TEXT FILES (ASCII, DOS, etc.)

            Data may be entered using a word processor (e.g., WordPerfect, Microsoft Word). Avoid control characters, keystrokes such as tabs, and table structures. Save the file as plain ASCII text. Align the data values in columns, with each row containing one data record. For a missing numeric or date value, enter a period ('.') in the appropriate column; for a missing character value, leave the column blank.
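
If the text file is generated by a program rather than typed, the column and missing-value rules can be applied consistently. The Python sketch below writes fixed-column ASCII records. The record layout (variable names, kinds and column widths), the one-space separator, and the function names are all hypothetical, and dates are assumed to be supplied already formatted in a single consistent style.

```python
# Hypothetical record layout: (variable name, kind, column width).
LAYOUT = [
    ("id", "numeric", 4),
    ("sex", "character", 3),
    ("dose", "numeric", 6),
    ("visitdt", "date", 10),  # dates assumed pre-formatted, e.g. 2001-03-15
]


def format_record(record: dict, layout=LAYOUT) -> str:
    """Render one data record as a single fixed-column line of plain ASCII text."""
    fields = []
    for name, kind, width in layout:
        value = record.get(name)
        if value is None or value == "":
            # Missing numeric or date value -> '.'; missing character value -> blank.
            text = "." if kind in ("numeric", "date") else ""
        else:
            text = str(value)
        if len(text) > width:
            raise ValueError(f"{name}: {text!r} does not fit in {width} columns")
        # Right-align numbers and left-align everything else so columns line up.
        fields.append(text.rjust(width) if kind == "numeric" else text.ljust(width))
    line = " ".join(fields)  # a single space separates columns; no tabs
    # Reject non-ASCII characters and control characters such as tabs.
    if not (line.isascii() and line.isprintable()):
        raise ValueError(f"non-ASCII or control character in record: {line!r}")
    return line


def write_text_file(path: str, records: list[dict], layout=LAYOUT) -> None:
    """Write one record per line to a plain ASCII text file."""
    with open(path, "w", encoding="ascii", newline="\n") as fh:
        for record in records:
            fh.write(format_record(record, layout) + "\n")
```

Raising an error when a value is too wide, rather than silently truncating it, keeps a long entry from spilling into the next column and shifting the remaining data.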

For further guidelines, please contact Beverly Musick at (317) 274-2693 or bsmusick@iupui.edu.