Grant/Protocol Development
We are committed to assisting you as you develop your grant/protocol. See
Role of Biostatistics for the many ways in which we can contribute to your
research goals. Although you may be collaborating with several members of the division,
your primary contact should be the lead statistician assigned to work with you.
Over the years, we have established relationships with many of the research groups
on campus. To see if there is someone already designated to assist you, please refer
to the document Contact Information.
Although certainly not all-inclusive, this list contains many of the major project areas
we are involved in.
If you do not feel your project fits into any of the applied areas outlined in the
Contact Information section, please
contact Dr. Barry Katz (bkatz@iupui.edu, 274-2674)
or George Eckert (geckert@iupui.edu, 274-2884)
for further assistance.
The primary information we need to begin the process of developing the data management,
sample size justification, and analysis plan for the proposed grant/protocol is
a draft of the grant/protocol itself. The Specific Aims and Research Design and
Methods sections are of utmost importance, but the Background and Significance
and Preliminary Studies sections are also important because they can provide valuable
information for developing both the sample size justification and overall analysis
plan.
A complete grant application/protocol needs to have a comprehensive
data management plan that describes how the data will be collected, stored,
manipulated and processed. The data management plan should include details about
the choice of software, description of the data entry systems, proposed quality
control methods and security precautions.
The data management team, which is often identified during the proposal stage, will
generally include a primary data manager and a database technician. For larger proposals,
a second data manager and a data management assistant may also be included. If the
Division is responsible for any data entry, a data entry clerk will also be part
of the data management team. The primary data manager is responsible for all aspects
of data management, and the database technician will assist in programming, verification
and printing tasks. Data entry clerks are responsible for entry, verification and
filing of data.
A complete grant application/protocol needs to have a sample
size justification and a thorough
analysis plan that tests the hypotheses presented in the specific aims
and includes choice of technique, validation of assumptions,
and alternative plans when the assumptions are violated. This may even require
some mention of new methodological development.
External review committees will evaluate the appropriateness of these sections, as
will internal review panels such as the IRB and Animal Care Committee and specific
scientific review groups such as those associated with the General Clinical Research
Center (GCRC) and the Cancer Center.
The analysis team is usually identified during the proposal stage. Large research
projects will generally involve a statistical team made up of a biostatistics faculty
member and a master's-level staff biostatistician. It is not unusual for the analysis
team of a center, program project or large data-oriented study to contain a second
faculty member and/or staff biostatistician. Also, the team may include a faculty
member with special expertise in a particular statistical area that is required
for the project. A senior staff biostatistician might handle smaller projects independently.
Occasionally, the team might only include the faculty member when the analysis role
is less intense but additional input on the final design or the choice of techniques
is needed during the project.
The design of the research study is a key part of the collaborative process. Clearly,
the design must produce data that can be used to meet all of the aims and test all
of the hypotheses in the study. Most projects require simple, straightforward designs
that are often arrived at early in the process. Yet, even for these, a statistical
approach may serve to reduce the required sample size through matching or adjustment
for important demographic or clinical factors. The protocol might also be streamlined
in other ways, such as reducing the length of the study or the number of data collection
points. It is impossible to list all of the factors that might be considered for every
study, but some general examples follow. In epidemiological studies, the prevalence
of a disease or condition is often a determining factor when choosing between a
cohort and case control study design. However, the strengths and weaknesses of each
must be carefully considered as well. Clinical databases on campus also allow us to
consider a third choice when the needed data are collected routinely: retrospective
cohort studies. For randomized clinical trials, the randomization plan is often
a key element and the need for a multi-center trial often hinges on issues of statistical
power. In some cases, where subjects are few and intervention effects transitory,
a crossover design can be used. Finally, even in laboratory-based studies, efficient
designs can save both effort and resources. For example, when multiple factors are
being studied and higher-order interactions can be ignored, a fractional factorial
design can greatly reduce the needed sample size while preserving statistical power
for the tests of key hypotheses. In the design stage, it is never too soon to begin
collaborating with a biostatistician.
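To make the fractional factorial idea concrete, the following is a minimal sketch of how a half-fraction of a two-level design can be generated. It assumes four hypothetical factors, each coded -1/+1, with the last factor set to the product of the other three; the function name half_fraction is ours for illustration. In such a design, main effects are confounded only with three-factor interactions, which is why it is useful when higher-order interactions can be ignored. Two-factor interactions remain confounded with one another in pairs.

```python
from itertools import product


def half_fraction(n_factors):
    """Return the runs of a 2^(k-1) half-fraction design, coded -1/+1.

    The first k-1 factors form a full factorial; the last factor is set to
    the product of the others, so every run satisfies A*B*...*K = +1.
    """
    runs = []
    for base in product((-1, 1), repeat=n_factors - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(base + (last,))
    return runs


# Four hypothetical factors: 8 runs instead of the 16 of a full 2^4 design.
design = half_fraction(4)
for run in design:
    print(run)
```

Each printed tuple is one experimental condition. Whether halving the number of runs is acceptable for a given study depends on which interactions can safely be assumed negligible.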
All studies should have a justification for the sample size to be used, whether
the samples are people, rats, cell cultures or any other experimental unit. In some
cases a review committee will accept historical precedent (e.g., "I always use 6
per group, my mentor used 6 per group and his mentor used 6 per group"), but a statistical
justification based on the power of the analysis to detect a meaningful
difference is certainly preferable and is now being required by more and more funding
agencies. This is certainly understandable since sample sizes that are too large
waste resources, and those that are too small lead to inconclusive studies. For
simple studies with continuous outcomes, calculations depend on the expected means
(or the minimally important difference between means) and the standard deviation/variance.
For comparing proportions, only the expected proportions are needed. These estimates
are generally available from pilot data or past studies with which the investigator
was involved, or can often be obtained from the literature. When those quantities
are available, sample size estimates can be easily calculated. Similarly, calculations
for studies to detect linear association using correlation coefficients require
only the magnitude of the correlation coefficient to be detected. Some simple
web-based sample size and power calculators are available at
http://calculators.stat.ucla.edu/ and
http://www.stat.uiowa.edu/%7Erlenth/Power/. Generally, at least 80% power,
and sometimes 90%, is needed for a well-designed study.
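As a rough illustration of the simple calculations described above, the sketch below applies the usual normal-approximation formulas for comparing two means, comparing two proportions, and detecting a nonzero correlation (via Fisher's z transformation). The function names and the example inputs are ours for illustration and do not come from any particular study. All tests are assumed to be two-sided with equal group sizes. Exact methods based on the t distribution, as implemented in most statistical software, give slightly larger numbers for small samples.

```python
from math import ceil, log
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Per-group n to detect a difference `delta` between two means."""
    z_a = _Z.inv_cdf(1 - alpha / 2)
    z_b = _Z.inv_cdf(power)
    return ceil(2 * ((z_a + z_b) * sd / delta) ** 2)


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n to detect a difference between two proportions."""
    z_a = _Z.inv_cdf(1 - alpha / 2)
    z_b = _Z.inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * variance_sum / (p1 - p2) ** 2)


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Total n to detect a correlation `r` (versus zero), Fisher's z method."""
    z_a = _Z.inv_cdf(1 - alpha / 2)
    z_b = _Z.inv_cdf(power)
    fisher_z = 0.5 * log((1 + r) / (1 - r))
    return ceil(((z_a + z_b) / fisher_z) ** 2 + 3)


# Illustrative inputs only: a half-SD difference, 50% vs. 30%, and r = 0.30.
print(n_per_group_means(delta=0.5, sd=1.0))
print(n_per_group_proportions(0.50, 0.30))
print(n_for_correlation(0.30))
```

With these illustrative inputs and 80% power, the functions return 63 per group for the means, 91 per group for the proportions, and 85 in total for the correlation. Raising the power to 90% increases the first figure to 85 per group.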
More complicated studies generally require specialized software and additional biostatistical
input to calculate sample size. Studies involving multiple regression models (linear
or logistic) usually require additional knowledge about the strength of the associations
of all the factors with the outcome and with each other. Similarly, calculations
for survival analysis require knowledge of the distribution of deaths (or failures)
over time and the pattern of censoring. Correlated data often result from multiple
observations on the same individual. This happens over time in longitudinal studies
and also within the same time point, such as measures on two eyes in ophthalmology
or multiple teeth in dental studies. Clearly, data from some studies have both of
these issues and some also have correlations among individuals who are clustered
(e.g. treated by the same physician). Knowledge of the strength of these correlations
is needed to calculate power and these estimates usually come from past data as
well. Even armed with all of these estimates, direct methods for estimating power
may not exist for a particular method and the only solution may be to simulate data
under the expected conditions to calculate the needed sample size. More lead time
is needed by the biostatistician in these cases to write the programs. Sometimes
a situation seems simple but has hidden statistical issues, so it is always
a good idea to discuss the sample size with the statistical analysis
team.
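The simulation approach mentioned above can be sketched as follows. This is a deliberately simplified example, not a template for any real study: it assumes two groups, a fixed number of correlated measurements per subject (for instance two eyes), normally distributed outcomes, a known intraclass correlation (icc), and an analysis that averages each subject's measurements and compares the group means with a large-sample z test. The function name, parameters and defaults are all hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance


def simulate_power(n_per_group, delta, icc, m=2, sd=1.0,
                   alpha=0.05, n_sims=2000, seed=12345):
    """Monte Carlo power for a two-group comparison with m correlated
    measurements per subject, analysed on the per-subject means."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    sd_between = sd * sqrt(icc)      # subject-level random effect
    sd_within = sd * sqrt(1 - icc)   # measurement-level noise

    def subject_means(shift):
        means = []
        for _ in range(n_per_group):
            subject_effect = rng.gauss(shift, sd_between)
            obs = [subject_effect + rng.gauss(0, sd_within) for _ in range(m)]
            means.append(sum(obs) / m)
        return means

    rejections = 0
    for _ in range(n_sims):
        control = subject_means(0.0)
        treated = subject_means(delta)
        se = sqrt((variance(control) + variance(treated)) / n_per_group)
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims  # proportion of simulated studies that reject
```

For example, simulate_power(50, 0.5, icc=0.5) estimates the power with 50 subjects per group and a half-SD difference; the result should be a little above 0.8. In practice the sample size is varied until the simulated power reaches the target, and the data-generating model is replaced by one that reflects the planned analysis and the correlation estimates from past data.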
The critical issue in choosing the statistical methods to be used in the analysis
plan is whether the specific aims are fulfilled and the key hypotheses tested. Once
these scientific questions and statements are translated into statistical hypotheses,
other data related issues can be considered. The initial consideration is the distribution
of the outcome measure. Data that are dichotomous or categorical should be analyzed
with methods designed for categorical data. Similarly, "classical" methods are generally
appropriate for data that are approximately normal or for moderate to large samples
where the means are approximately normal. In some cases, like survival data with
censored observations or counts, specialized modeling techniques are required. Most
statistical methods assume that the observations are independent. This assumption
is violated for observations on the same person (or animal) and sometimes when observations
are clustered, such as for utilization data from patients treated by the same physician.
In these cases, mixed effects models or generalized estimating equations are often
used. Since many studies have multiple outcome measures or subgroups of interest,
another issue that often needs to be addressed is the effect of multiple tests on
Type I error (i.e. the chance of falsely rejecting the null hypothesis). A final
issue that is also nearly universal is the handling of missing data. Solutions depend
on why the data are missing, and they range from simply using the data that exist
to multiple imputation methods to computer-intensive methods that model the missing
data mechanism.
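To illustrate the multiple-testing issue raised above, the sketch below computes adjusted p-values using two common corrections, Bonferroni and Holm. These two methods are chosen here only as familiar examples; the text does not prescribe a particular adjustment, and the p-values shown are made up for illustration.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical unadjusted p-values for four outcome measures.
raw = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(raw))
print(holm(raw))
```

An outcome is declared significant when its adjusted p-value falls below the chosen Type I error rate. Holm's method controls the same overall error rate as Bonferroni but never gives a larger adjusted p-value, which is why it is often preferred when a simple correction is wanted.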
A complete data analysis plan must also include methods for verifying the assumptions
needed to implement the statistical techniques that were proposed. These usually
involve graphical and descriptive methods for assessing distributions, equal variances
and independence of observations. These methods are usually implemented on the original
data and on the residuals from the statistical model. The plan should also outline
related methods for assessing the fit of the statistical models that are being used.
Alternative methods to be used when assumptions do not hold should also be included.
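As a small example of the descriptive checks mentioned above, the sketch below computes residuals from group means and summarizes two things: the skewness of the pooled residuals (a rough indication of departure from normality) and the ratio of the largest to the smallest group variance (a rough indication of unequal variances). The function name and the data are hypothetical, and no cutoffs are implied; in practice these summaries would be read alongside plots of the data and residuals.

```python
from statistics import fmean, variance


def residual_diagnostics(groups):
    """groups: dict mapping a group label to a list of observations."""
    residuals = []
    variances = {}
    for label, values in groups.items():
        centre = fmean(values)
        residuals.extend(v - centre for v in values)
        variances[label] = variance(values)  # sample variance per group
    n = len(residuals)
    m2 = sum(r ** 2 for r in residuals) / n  # second central moment
    m3 = sum(r ** 3 for r in residuals) / n  # third central moment
    return {
        "skewness": m3 / m2 ** 1.5,
        "variance_ratio": max(variances.values()) / min(variances.values()),
    }


# Made-up data for two groups.
example = {"control": [1, 2, 3, 4, 5], "treated": [2, 4, 6, 8, 10]}
print(residual_diagnostics(example))
```

For this made-up data the residuals are symmetric, so the skewness is zero, while the treated group's variance is four times that of the control group.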
Sometimes a protocol or proposal requires the development of new statistical methods
or extensions of existing methods to new situations. This occurs when the issues
considered in the choice of statistical techniques
lead to situations where no appropriate methods exist, or when an alternative method
is needed because the assessment of statistical assumptions reveals
a problem. In either case, the application must propose a new method or extension,
but the details will often need to be completed as the data are collected and fully
examined. Nevertheless, the proposal must contain enough information to give reviewers
confidence that the new method can be developed. A track record in methodological
research by members of the biostatistical team is extremely helpful in these situations.
When the analysis plan involves methods beyond commonly used techniques, it is often
important to also describe the software that will be used to implement these methods.
Although we use SAS for most of our analyses and it contains a wide array of techniques,
sometimes other software packages may be necessary. SPLUS is also used frequently
since it often implements algorithms for new methods sooner than SAS. Specific techniques
are also often handled by specialized programs like SOLAS for missing data analysis
or BUGS for Markov Chain Monte Carlo methods, which can be computationally intensive.
Basic background information we must have in order to prepare a budget estimate:
- Draft of Grant/Protocol
- Project Title
- Principal Investigator (with phone number and email address)
- Funding Agency
- Submission Date
- Funding Date
- Length of Funding
- Budget Contact (with phone number and email address)
- Cancer Center Membership (yes or no)
Other information that we may need, depending on the situation, or that would greatly
help us produce the most accurate estimate possible:
- Drafts of all data collection tools
- Approximate number of interviewers/data collectors (if applicable)
- Ballpark estimate of how much money/percent of time will be available for Biostatistics
services