It is important to contact us as early as possible when you are planning a study (see Grant/Protocol Development). Once the study protocol is approved by the IRB (and SRC and GCRC as necessary), we can begin to help you implement your study plan.
Data collection plans and forms
Financial Arrangements for Biostatistical Support
The Division of Biostatistics operates as a recharge center, an entity whose primary goal is to provide a service for the convenience of Indiana University. Due to this classification, the Division receives a limited amount of financial support from the School of Medicine. Therefore, it is necessary that investigators (or their departments/units) provide funding for all services and support costs, including data management, data entry, statistical analysis, and manuscript preparation. Funding for collaborative research projects is received by Biostatistics primarily through three mechanisms: direct support for named individuals on grants with NIH and other agencies, Biostatistics Services Support for smaller amounts of effort budgeted on grants and contracts, and fee-for-service billing for unbudgeted internal projects and outside agreements that are billed hourly.
The limited amount of funding provided by the School of Medicine is used for grant development and study design. The Division is happy to accommodate investigators within the School of Medicine who require help in study design, sample size estimation, and proposal writing free of charge. Researchers from other schools should discuss financial arrangements for grant development during their initial meeting since those efforts may be supported through other agreements (e.g. IU Cancer Center).
Investigators are encouraged to work with biostatisticians during the preparation of their proposals and to include adequate biostatistical support in their budgets. Discussions with the primary biostatistician regarding budget items and the proper method of inclusion of these costs on the project should be made prior to grant submission.
Projects that contain substantial biostatistical support are submitted for outside funding with Division faculty and staff specifically listed in the personnel section of the grant. Substantial support is defined as at least 10% of a Division faculty member or 20% of a staff member in any of the funded years. Also included in the other expenses section of the grant are funds for Biostatistics Computing Services that are used to support basic computer and other infrastructure needs of the Division. The amount included for Biostatistics Computing Services is based on the faculty and staff FTE supported by the budget. The Biostatistics portion of all budgets is prepared by Shari Stansbery, the Division Business Manager, and must be approved by Michelle Meeks in the Department of Medicine Research Administration Office before they are submitted for outside funding. New rates have been calculated as of July 1, 2004. For details see Biostatistics Computing Services.
The budgets for all projects represent our best estimates for the proposed research, based on past experience with similar projects. We recognize that research projects do not always go exactly as planned. There are frequently changes in the timeline and even occasional changes in the scope of the work. Therefore, we will monitor the effort of Biostatistics personnel biannually to ensure that it matches support received, within the variance allowed by the NIH or awarding organization. No funding changes will be made without first consulting the Principal Investigator of the project. Although the level of support may vary from the original budget in a particular time period to match effort, the total budget for biostatistical support will not exceed the budgeted amount over the entire granting period.
For projects that include direct support for named individuals on grants, Biostatistics Computing Services includes all expenses related to the project's use of the Division's infrastructure, including computing facilities. These include a network of PCs and Sun Microsystems servers and workstations and a wide variety of statistical and scientific software packages. These are prorated at the rates of $10,068, $6,516, $5,027, and $3,076 per FTE of Ph.D. Biostatistician, M.S. Biostatistician, Data Manager, and Data Technician/IT Support, respectively. This charge, based on actual costs in the Division of Biostatistics, is to enable the personnel to perform their duties for a project, but is separate from any specific equipment requested for the exclusive use of the proposed project. It includes hardware, software, supplies, systems support, connect charges, travel, and other infrastructure costs. This budget item is analogous to charges by University Computer Services for use of mainframe computer time.
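As a rough illustration of how the proration works, the sketch below multiplies each role's rate by the fraction of an FTE budgeted for it. The rates are the ones quoted above; the role keys, the function name `computing_services_charge`, and the reading of the rates as annual (per FTE-year) are assumptions made for this example, not Division policy.

```python
# Rates per full FTE as quoted in the paragraph above (US dollars).
# Assumption: the rates are annual, i.e. per FTE-year of supported effort.
RATES_PER_FTE = {
    "phd_biostatistician": 10068,
    "ms_biostatistician": 6516,
    "data_manager": 5027,
    "data_technician_it": 3076,
}


def computing_services_charge(fte_by_role):
    """Return the prorated charge for a mapping of role -> FTE fraction.

    The role names are hypothetical keys chosen for this sketch.
    """
    total = 0.0
    for role, fte in fte_by_role.items():
        if fte < 0:
            raise ValueError(f"FTE for {role} must be non-negative")
        total += RATES_PER_FTE[role] * fte
    return round(total, 2)


if __name__ == "__main__":
    # Example budget: 10% of a Ph.D. biostatistician, 25% of an M.S. biostatistician.
    example = {"phd_biostatistician": 0.10, "ms_biostatistician": 0.25}
    print(computing_services_charge(example))
```

For the example budget, the charge is 0.10 × $10,068 + 0.25 × $6,516. The actual budget figures are prepared by the Division Business Manager.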
These rates were recalculated as of July 1, 2006 to better reflect the actual costs of the Division. All proposals submitted after that time should reflect these new rates. Projects already funded and those submitted before July 1, 2006 were budgeted using estimated rates. Since the new rates represent the actual costs of the Division, the quarterly statements associated with all projects will reflect the new Biostatistics Computing Services rates. If there is a problem in meeting these rates within the project budget, please contact Shari Stansbery (sjdraper@iupui.edu, 274-2647).
*Departments and Divisions that support the Division of Biostatistics using funding mechanisms not associated with a particular project or grant (e.g. general fund accounts) are not responsible for these costs.
Biostatistics faculty salaries are individually specified on a grant budget only when their percentages of effort are 10% or more for at least one year. Staff salaries are individually specified on a grant budget only when their percentages of effort for at least one year of the grant period are 20% or greater. Biostatistics Services for projects that fall below both of these criteria represent hourly biostatistical support as required by the proposed scope of the work. This may entail any or all of the following: study design, data entry, data management, statistical analysis, interpretation of results, and manuscript preparation. The number of hours for each task is estimated and budgeted based on the Division of Biostatistics fee schedule. Shari Stansbery, the Division Business Manager, prepares the Biostatistics portion of all budgets.
The budgeted amount represents our best estimate for the proposed research, based on past experience with similar projects. If the project is funded, the principal investigator needs to fill out a collaborative agreement and return the agreement to the Biostatistics collaborator on the project. All effort associated with biostatistical activities is then consolidated and billed by the Division of Biostatistics on a quarterly basis, to the account provided by the investigator. This represents actual effort in the previous three months and may be larger or smaller than the budgeted amount. Thus, it is important that the required work remain within the original scope in order to minimize variation from the estimated budget.
Research projects that do not have Biostatistics funding in their budget, as well as unfunded projects, will be supported by the Division on a fee-for-service basis. After the initial meeting with the researcher, the primary biostatistician on the project will prepare a detailed estimate of the funding needed and the proposed timeline to complete the project based on the Division of Biostatistics fee schedule. Once the project terms are agreed upon, a collaborative agreement should be filled out and returned to the primary biostatistician. During the course of the project, all effort associated with biostatistical activities is consolidated and billed by the Division of Biostatistics on a quarterly basis to the account* provided on the collaborative agreement. The Division is responsible for monitoring this effort and informing the investigator if the original estimate appears to be incorrect.
*Account number and title required for internal (IU account) billing only
Fees are charged to support personnel and cover other expenses needed to maintain the Division at a level required to carry out its mission. The fee schedule includes all expenses related to Division personnel and enables them to perform their duties related to the research project. These include use of the Division's computing facilities, systems support, software, administrative support, professional travel, supplies, and telephone. These rates are based on actual costs to the Division of Biostatistics. Rates for research projects outside of Indiana University contain an additional component to help support the educational and service missions of the Division.
| | IU Rate/hour | Outside Rate/hour* | Cancer Center Member Rate/hour |
| Ph.D. Faculty | $109 | $135 | $87 |
| M.S. Statistician | $65 | $80 | $52 |
| Data Manager | $47 | $59 | $38 |
| Database Technician | $27 | $33 | $22 |
| Data Manager Assistant | $18 | $21 | $18 |
| Data Entry | $16 | $19 | $16 |
*Emergency requests for services without prior scheduling by Biostatistics personnel will be accommodated only if time permits, and services will be charged at double the above rates.
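A fee-for-service estimate of the kind described above can be sketched as hours multiplied by the hourly rate for each role. The rates below are copied from the fee schedule; the dictionary layout, the column keys (`iu`, `outside`, `cancer_center`), and the function name `estimate_cost` are hypothetical. Applying the emergency doubling to every rate column is an assumption of this sketch.

```python
# Hourly rates (US dollars) copied from the fee schedule above.
# The keys "iu", "outside", and "cancer_center" are hypothetical labels.
HOURLY_RATES = {
    "Ph.D. Faculty": {"iu": 109, "outside": 135, "cancer_center": 87},
    "M.S. Statistician": {"iu": 65, "outside": 80, "cancer_center": 52},
    "Data Manager": {"iu": 47, "outside": 59, "cancer_center": 38},
    "Database Technician": {"iu": 27, "outside": 33, "cancer_center": 22},
    "Data Manager Assistant": {"iu": 18, "outside": 21, "cancer_center": 18},
    "Data Entry": {"iu": 16, "outside": 19, "cancer_center": 16},
}


def estimate_cost(hours_by_role, rate_type="iu", emergency=False):
    """Estimate the cost of a project from estimated hours per role.

    Assumption: an emergency request doubles the rate in every column.
    """
    multiplier = 2 if emergency else 1
    total = 0
    for role, hours in hours_by_role.items():
        if hours < 0:
            raise ValueError(f"hours for {role} must be non-negative")
        total += HOURLY_RATES[role][rate_type] * hours * multiplier
    return total


if __name__ == "__main__":
    # Example: 10 faculty hours and 40 M.S. statistician hours at the IU rate.
    hours = {"Ph.D. Faculty": 10, "M.S. Statistician": 40}
    print(estimate_cost(hours))
    print(estimate_cost(hours, emergency=True))
```

The hours themselves come from the primary biostatistician's assessment of the project, so the arithmetic is only the final step of an estimate.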
How to obtain an estimate
To obtain an estimate for any kind of biostatistician support, whether it is for direct support for named individuals on a grant or fee-for-service support, first contact the lead statistician in your project area. See the Contact Information section for more information regarding whom to contact if needed.
The mechanism used to provide support does not alter the collaborative atmosphere that the Division has attempted to foster over the years within the IU health sciences community. The contribution of each person needs to be evaluated as a manuscript is prepared. Consideration for authorship should be based on the accepted criteria for most medical journals. These criteria generally cite both study design and statistical analysis as intellectual input sufficient for authorship. It is impossible to define every situation in advance; however, it should be clear that reimbursement for time does not preclude or replace authorship.
Select data management team members work with researchers during the project development phase to design clear, correct, and complete data collection forms as well as determine appropriate data collection procedures. The data management team will design and implement a computerized data entry system backed by a relational database system to store all data. To ensure quality, all data entry systems have range and value restrictions for all variables and internal consistency checks where appropriate. In addition, all data entry clerks and research assistants are fully trained by the data manager to properly enter the data, and all entered data is verified using an appropriate verification process. As data collection progresses, data management personnel generate reports detailing any and all aspects of the project to date including recruitment, screening, retention, compliance with study protocols, and demographics. All decisions regarding data management are made in collaboration with the investigators and researchers. Finally, the data management team is responsible for preparing the datasets for analysis in conjunction with the statisticians.
Before any data is collected, the data manager works with the investigators and other project personnel to design or revise the data collection forms. As the details of a study begin to take shape, data collection forms are analyzed to ensure that all questions are clear, all variables are properly coded, the design is logical and easy to follow, and the methods and timing of data collection are clear and viable. The analysis plan is reviewed to ensure that all information needed for analysis is present and appropriately recorded.
Once the data collection forms are established, a relational database and entry system are created to store the data. Database systems can be created to capture data that is initially collected on paper as well as data that is collected and entered directly such as in a computer-aided interview. Entry screens are created to look as much like the data collection forms as possible. Example: Biostatistics' Web Demo Site. All computerized data entry systems can either be designed with access limited to one specific computer or network or with access through the World Wide Web. All data is stored on a secured server accessible only to those researchers involved directly with the project.
One option for data collection is the TELEForm system, which allows data managers to customize data collection forms that can be read and interpreted by a scanner. Researchers record data onto paper forms, which are then scanned into the computer, interpreted once by the TELEForm software, and then verified by a data entry clerk. Images of forms can be stored electronically, providing an additional back-up of the data. Example: TELEForm data collection form.
The choice of the data entry system will determine the methods of verification to be used to ensure data accuracy. All data entry systems can be designed with range and value restrictions for all variables, and all data will be reviewed and processed through several verification programs. Greater accuracy can be obtained for data collected on paper through use of a double entry system. Each form is entered twice by separate data entry clerks and then compared to determine discrepancies. All differences between entries are compared with the original forms and the necessary corrections are made. Specific data questions are sent to the appropriate site personnel who will then investigate and resolve any questions. The TELEForm system automatically employs this double entry system with the computer's interpretation of the scanned data acting as the first entry and the data entry clerk verifying those interpretations acting as the second entry.
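The two checks described above, range restrictions and double entry comparison, can be sketched in a few lines. This is only an illustration: the field names, the limits in `RANGE_RULES`, and the functions `range_violations` and `compare_entries` are hypothetical and do not describe the Division's actual entry systems.

```python
# Hypothetical range restrictions: field -> (minimum, maximum), inclusive.
RANGE_RULES = {"age": (18, 99), "sbp": (60, 260)}


def range_violations(record, rules):
    """Return the sorted names of fields whose values fall outside the allowed range."""
    bad = []
    for field, (low, high) in rules.items():
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            bad.append(field)
    return sorted(bad)


def compare_entries(first_entry, second_entry):
    """Compare two independent entries of the same forms.

    Each argument maps a record ID to a dict of field values. The result is a
    list of (record_id, field, first_value, second_value) tuples; a field of
    None means the whole record is missing from one of the two entries.
    """
    discrepancies = []
    for record_id in sorted(set(first_entry) | set(second_entry)):
        rec1 = first_entry.get(record_id)
        rec2 = second_entry.get(record_id)
        if rec1 is None or rec2 is None:
            discrepancies.append((record_id, None, rec1, rec2))
            continue
        for field in sorted(set(rec1) | set(rec2)):
            if rec1.get(field) != rec2.get(field):
                discrepancies.append((record_id, field, rec1.get(field), rec2.get(field)))
    return discrepancies


if __name__ == "__main__":
    first = {"001": {"age": 34, "sbp": 120}, "002": {"age": 51, "sbp": 135}}
    second = {"001": {"age": 34, "sbp": 120}, "002": {"age": 15, "sbp": 135}}
    for item in compare_entries(first, second):
        print(item)  # each discrepancy is checked against the original paper form
    print(range_violations(second["002"], RANGE_RULES))
```

In this example the second clerk keyed 15 instead of 51 for one record, so the comparison flags the field and the range check independently catches the implausible value.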
The data management team members fully train all associated researchers and data entry clerks in data collection and data entry procedures. As the project advances, the data management team monitors the data collection and entry for unexpected trends or differences among the data collectors. If necessary, researchers or data entry clerks are retrained.
As a study progresses, issues arise that may require adjustments to the original protocol. In order to maximize accuracy and completeness of data, the data management team participates in the decisions about how to implement these adjustments. Changes may require revisions to the data collection procedures, forms and entry system. Data managers meet regularly with researchers to discuss and resolve data collection problems as they arise.
Reports detailing the screening, recruitment, and retention of study participants are generated periodically and distributed to the research team via e-mail, website, or in person. Preliminary statistical reports depicting sample frequencies and means are also generated by the data management staff. Queries to resolve data questions or highlight missing information are generated throughout the study and distributed to the appropriate personnel for correction and/or clarification.
To prepare the data for analysis, the data manager combines all data from the various instruments and sites, if applicable, and converts the data to a format (SAS) suitable for statistical analysis. Rudimentary analysis is performed to ensure that the data is properly merged and to identify any outliers. SAS programs are used to create any additional variables required for analysis, including duration variables, categorical variables, and summary scores. The data management personnel continue to work with biostatistician team members during the analysis process to answer questions and provide additional information about the structure of the data set, the data collection process, or specific data records as needed.
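The three kinds of derived variables mentioned above (duration, categorical, and summary score) are illustrated below. The Division does this work in SAS; the sketch uses Python only for illustration, and the field names, age cut points, and the rule that a summary score is left missing when any item is missing are assumptions.

```python
from datetime import date


def derive_variables(record):
    """Return a copy of the record with three derived variables added.

    Hypothetical input fields: 'enrolled' and 'last_visit' (dates),
    'age' (years), and 'items' (questionnaire item scores, None if missing).
    """
    derived = dict(record)

    # Duration variable: days between enrollment and the last visit.
    derived["days_on_study"] = (record["last_visit"] - record["enrolled"]).days

    # Categorical variable: age group with illustrative cut points.
    age = record["age"]
    if age < 40:
        derived["age_group"] = "<40"
    elif age < 65:
        derived["age_group"] = "40-64"
    else:
        derived["age_group"] = "65+"

    # Summary score: sum of items, left missing if any item is missing.
    items = record["items"]
    if any(value is None for value in items):
        derived["summary_score"] = None
    else:
        derived["summary_score"] = sum(items)

    return derived


if __name__ == "__main__":
    example = {
        "enrolled": date(2006, 1, 15),
        "last_visit": date(2006, 7, 14),
        "age": 52,
        "items": [3, 2, 4, 1],
    }
    print(derive_variables(example))
```

How a summary score should treat missing items depends on the instrument, so that rule in particular would be agreed with the investigators and statisticians.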
A single large research project will usually involve a statistical team made up of a Biostatistics faculty member and a masters-level biostatistician. In addition, larger teams are often associated with particular collaborative research areas with responsibilities then assigned to specific studies. These focused areas are sometimes built around a center or program project grant (e.g. cancer or sexually transmitted diseases) but may also involve a collection of single project grants in a related area (e.g. clinical pharmacology). It is not unusual for these larger analysis teams to include a second faculty member and/or additional staff biostatisticians. Also, the team may include a faculty member with special expertise in a particular statistical area that is required for the project (e.g. analysis of microarray data or evaluation of diagnostic tests). Individual smaller projects might be handled by one of our experienced staff biostatisticians without involvement of a Biostatistics faculty member.
All investigators are encouraged to include data management by the Division of Biostatistics in their plans and budget. This will ensure the use of state of the art techniques related to the database management system and quality control. It will also ensure that the data are transferred to the statistical analysis team in the proper format for the proposed analyses. We realize that this may not always be possible so it is important to realize that the time and effort involved in performing statistical analyses can be greatly reduced if the data are entered in the proper format. It is best to talk with someone from the Division of Biostatistics before you begin to enter the data and even before you begin to collect the data, if possible, to ensure that the format is compatible with the statistical analysis software and techniques that will be used. Regardless of the format selected, it is helpful to provide a copy of the data collection form, a list of any coding conventions used, and at least one copy of an actual data record explaining each variable and codes. The data may be given on diskette or CD formatted for an IBM-compatible PC. See Preparation of Data for Statistical Analysis for guidelines for use of spreadsheets, relational database management systems (RDBMS) and text files that will help us process your data more efficiently. Data transfer to Biostatistics must follow HIPAA requirements with respect to safeguarding the security of PHI. Please refer to http://www.iupui.edu/~resgrad/hipaa/hipaa_menu.htm.
In a well planned study the choice of statistical techniques was initially made during the design phase. These statistical methods are chosen based on the specific aims and the key hypotheses to be tested. However, a final choice cannot be made until the assumptions necessary for the particular analyses are verified. Thus, it is usually important to begin the analysis process using exploratory data analysis methods. These methods usually involve graphs and descriptive statistics for assessing distributions, sources of variation, associations among variables and independence of observations. Like pictures, properly designed graphs can easily be worth a thousand words (or numbers). The initial consideration is usually the distribution of the outcome measure, including examination of possible outliers. Data that are dichotomous or categorical should be analyzed with methods designed for categorical data. Similarly, "classical" methods are generally appropriate for data that are approximately normal or for moderate to large sample sizes where the means are approximately normal. In some cases, like counts or survival data with censored observations, specialized modeling techniques are required. Each statistical model or method has its own particular set of underlying assumptions. These, along with the fit of the model, need to be assessed. This usually involves examination of the deviations between predicted values based on the model and the observed values (usually called residuals). Trends and problems in the residuals are often identified with a series of plots.
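To make the idea of residuals concrete, the sketch below fits a straight line by ordinary least squares and returns the deviations between observed and predicted values. Simple linear regression is chosen here only as the simplest example; the data values and function names are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed value minus the value predicted by the fitted line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


if __name__ == "__main__":
    # Made-up data: the last observation departs from the trend of the others.
    x = [1, 2, 3, 4, 5]
    y = [2.1, 3.9, 6.2, 7.8, 15.0]
    intercept, slope = fit_line(x, y)
    for xi, ri in zip(x, residuals(x, y, intercept, slope)):
        print(xi, round(ri, 2))
```

Plotting these residuals against the predictor or the fitted values is the usual next step. Note that an unusual observation can pull the fitted line toward itself, so it does not always have the largest residual, which is one reason plots are more informative than a single cutoff.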
Another consideration before the analysis can begin is that most statistical methods assume that the observations are independent. This assumption is violated for observations on the same person (or animal) and sometimes when the observations are clustered, such as for utilization data from patients treated by the same physician. In these cases, the statistical model used to analyze the data often includes one or more random effects or an adjustment can be made to the estimated variance using generalized estimating equations.
It is difficult to describe a generic statistical analysis since each set of data has its own unique issues. However, another issue that is nearly universal in health related research is the occurrence of missing data. It is extremely important to remember that solutions depend on why the data are missing. Data that are missing completely at random (usually called MCAR) are easily handled but rarely occur (e.g. the research assistant's dog ate the data form). Data that are missing at random (MAR) are more common (i.e. the reason that the data are missing is not just chance but, given the observed data, is unrelated to the missing values themselves) and can also often be handled in a straightforward manner within many common analysis techniques or through multiple imputation methods. Data that are missing for a non-ignorable reason present a more difficult problem. Solutions can range from pattern mixture models to computer intensive methods that attempt to model the missing data mechanism. Standard software is often not available for these situations and computer programs need to be written or modified for each analysis.
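The difference between MCAR and MAR can be seen in a small simulation. In the sketch below, all numbers (sample size, group effect, missingness probabilities) are invented for illustration. Under MCAR, the mean of the observed outcomes is close to the full-data mean. Under MAR, where missingness depends on a fully observed group indicator, the naive mean is biased, but an adjustment that uses the group indicator recovers it. The stratified adjustment is a simple stand-in for the model-based or multiple imputation methods mentioned above.

```python
import random


def simulate(n=20000, seed=1):
    """Simulate (group, outcome, observed_under_mcar, observed_under_mar) rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0           # fully observed group
        y = 10 + 5 * x + rng.gauss(0, 1)             # outcome depends on group
        mcar_observed = rng.random() >= 0.3          # 30% missing, pure chance
        p_missing = 0.6 if x == 1 else 0.1           # MAR: depends only on x
        mar_observed = rng.random() >= p_missing
        rows.append((x, y, mcar_observed, mar_observed))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def summarize(rows):
    """Compare the full-data mean with complete-case and adjusted estimates."""
    full = mean(y for _, y, _, _ in rows)
    mcar = mean(y for _, y, seen, _ in rows if seen)
    mar_naive = mean(y for _, y, _, seen in rows if seen)
    # Adjusted estimate: observed group means weighted by full group shares.
    mar_adjusted = 0.0
    for group in (0, 1):
        share = mean(1 if x == group else 0 for x, _, _, _ in rows)
        group_mean = mean(y for x, y, _, seen in rows if seen and x == group)
        mar_adjusted += share * group_mean
    return {"full": full, "mcar": mcar, "mar_naive": mar_naive, "mar_adjusted": mar_adjusted}


if __name__ == "__main__":
    for name, value in summarize(simulate()).items():
        print(name, round(value, 2))
```

Non-ignorable missingness, where the chance of being missing depends on the unobserved value itself, cannot be repaired this way because the information needed for the adjustment is exactly what is missing.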
Before the introduction of powerful desktop computers, the implementation of complex statistical methods was a difficult and time consuming process. In that environment each assumption and every alternative was carefully considered before the analysis was run. Now that computing time is not usually a real impediment for even complicated statistical analyses, it is important to not abandon a thoughtful approach to data analysis. A predetermined plan should not be carved in stone. Each step should be implemented and the results examined before the next step can be finalized. Unexpected data issues can be discovered and then this process can lead down paths that may require the development of new statistical methodology. The initial data findings may also lead to additional secondary hypotheses that can sometimes be addressed by the data set but may generate ideas for a future study. Statistical analysis is not a lock-step algorithm but remains an art that is performed by scientists. It is also an ever evolving field with new techniques being published regularly. Therefore, it is important that most studies be analyzed by trained statisticians who are familiar with current practice and new developments.
Statistical software packages are readily available and many are user friendly. The package most widely used by statisticians is SAS. It is very powerful for manipulating data and contains the widest variety of readily available statistical techniques. These are accessed through procedures (called PROCs) and options within the system. In addition, newer methods are often available through "macros" that have been developed by users and distributed through the user group. Most of the SAS programming by the statistical analysis team is done by the masters-level biostatistician. SAS is program-based, with extensive code needed to manipulate and analyze data. Efficient SAS programming is a skill that is not easily learned by the casual user. Another package that we commonly use is S-Plus, which generally incorporates new techniques sooner than SAS and has some additional graphical capabilities. Like SAS, it also requires specific programming knowledge. Division biostatisticians also have access to a variety of other more specialized packages such as SUDAAN for the analysis of complex surveys, SOLAS for multiple imputation, and BUGS for Markov Chain Monte Carlo simulations, as well as scientific subroutine libraries for FORTRAN and C++.
In the world of medical research, real data is a synonym for messy data. The messiness results from a wide variety of problems including missing data, complex correlation structures, wild distributions, and sampling biases. Often the existing literature does not include techniques that exactly fit a particular situation. It is the responsibility of the biostatistician to evaluate the situation and examine the sensitivity of the method to the assumptions that are violated and determine if it will be usable. In many cases, the method may be adequate as is or with minor modifications. The analysis can then proceed and the study results can be published. However, the analysis team, led by the faculty member may continue to work on the statistical problem to develop a better solution that can be used in the future. This may even lead to a grant application to do this methodological work with the biostatistician as the PI, particularly if it will be useful beyond that particular study. In other cases, when no viable solution exists, the methodological development will need to precede the analysis of the study. Faculty members work in a variety of methodological areas based on their interests, expertise and the real problems that they encounter in their collaborative work.
The responsibilities of the statistical analysis team do not end when the analyses are completed. A written summary of the methods and results is usually required as well as an oral presentation of those results to the principal investigator and the study team. The written report should include the methods used, the statistical results including key descriptive and inferential statistics, appropriate tables and graphs, and an explanation of the correct interpretation of the analysis. Parts of this report can often be included, with only a few changes, in manuscripts, posters and presentations.
It is difficult to list all the types of studies and analyses in which the Division of Biostatistics has been involved. The education and experience of the biostatisticians and data managers in the Division put us in a unique position to handle all types of studies including randomized trials (single site or multi-center), phase 1 and 2 trials, longitudinal cohort studies, case-control designs, multistage surveys, studies to evaluate diagnostic tests, outcomes research and meta-analyses. This is far from an exhaustive list and the types of statistical techniques needed to analyze these studies would be far more extensive than the list of study types.