Study Design

Stephen J. Gange

Elizabeth T. Golub


Epidemiologic methods constitute the scientific frameworks, concepts, and tools that are used for epidemiological evaluations. This chapter presents an overview of these methods in the context of evaluating the epidemiology of infectious diseases. As an organizational framework, it is useful to conceptualize epidemiologic methods as addressing key questions (Figure 3-1):

  • Who is to be studied? Epidemiologists study diseases in populations of individuals. While studies of entire populations may be of interest, typically only a sample of individuals contributes to an epidemiologic evaluation, selected according to specific study designs.

  • Which data will be collected? Observing the occurrence of diseases and their determinants requires measurement for purposes of constructing measures of disease occurrence.

  • Which inferences will be made from the analysis? Epidemiology, as the study of the distribution, determinants, and control of disease, is predicated on the premise that disease in humans does not occur at random. Epidemiologic evaluations play an important descriptive role in characterizing diseases among populations. When evaluating the determinants of diseases, epidemiologic evaluations typically require comparisons of populations for making causal inferences.

Figure 3-1 A Framework for Epidemiologic Methods

In infectious disease epidemiology, an intricate network of causal determinants influences the susceptibility to and development of disease. A useful framework for organizing these important determinants is the epidemiologic triangle (Figure 3-2), a diagram that emphasizes the interrelationship between three components:

  • Host. Human hosts differ in susceptibility to infections because of genetic, environmental, behavioral, and other characteristics. Major epidemic diseases, such as malaria, tuberculosis, smallpox, and plague, have led to selective genetic changes in human populations. For example, the evolution of several genetic mutations among Africans and Asians has resulted primarily from the selective pressure of hyperendemic malaria. Sickle hemoglobin, glucose-6-phosphate dehydrogenase deficiency, thalassemia, hemoglobin C, and hemoglobin E may be disadvantageous in homozygous individuals, but these traits have evolved in certain populations because they confer significant protection from malaria in heterozygous individuals.1

  • Agent. The agent constitutes the infecting pathogen (e.g., virus, bacterium, parasite, or fungus). Agents have certain characteristics that influence their infectivity. For example, one important characteristic of a pathogen is its escape mechanism, such as the evolution of resistance to antibiotics and antiviral therapies. This capability is recognized as a growing threat to public health and was presciently predicted in 1945 by the discoverer of penicillin, Sir Alexander Fleming, who said, “The greatest possibility of evil in self-medication is the use of too-small doses, so that, instead of clearing up the infection, the microbes are educated to resist penicillin and a host of penicillin-fast organisms is bred out which can be passed on to other individuals.”2

  • Environment. The environment constitutes the setting in which transmission occurs. It is important to characterize this setting and to be aware of environmental factors that may facilitate the agent’s survival or infectivity. It is not difficult to envision the role of environment for some agents, such as hookworm, where soil humidity, temperature, and other soil characteristics can influence the development of infectious Ancylostoma duodenale larvae. However, the environment is also important in the transmission of airborne viruses, such as influenza and varicella, because it affects the length of time that the viral particles remain infective as an aerosol. The winter environment in temperate climates also facilitates transmission of influenza by bringing people indoors. Conversely, influenza epidemics have been interrupted by extreme cold weather that has forced schools to close, thereby interrupting transmission among children and introduction of the virus into the home.3

Figure 3-2 Epidemiologic Triangle (agent/host/environment)

By understanding these determinants and their interrelationship, we can identify and implement interventions for disease prevention and treatment. While the prescription and use of individual medications (e.g., antibiotics, antivirals, antifungals) may be the natural “intervention” that comes to mind when thinking about infectious diseases, it is important to recognize that “interventions” actually include the vast breadth of public health measures, policies, and guidance that affect large populations (e.g., vaccination programs, chlorination of the water supply, social marketing).


Epidemiologists take a multifaceted approach to defining populations. First, populations are generally identified in terms of three basic characteristics: person, place, and time. Second, it is rarely possible to study the entire population about which we want to make inferences for a particular disease (the “target population”). Instead, we must identify a source from which study participants can be identified and then define a study population of those persons who will ultimately be included in the study. Finally, we must consider multiple factors that influence study populations, such as eligibility criteria, feasibility of enrollment, and the refusal rate among those invited to participate.

How Epidemiologists Describe Populations: Person, Place, and Time

The central focus of studying diseases in populations reflects an assumption that individual persons can be aggregated by some common characteristics. The population chosen for study ultimately depends on the purpose of the investigation. We usually describe populations in terms of factors that are well known to influence disease risk. Epidemiologists generally classify such factors as those that are related to person, place, and time.

Attributes of person include individual-level characteristics believed to influence disease. These might include demographic characteristics (e.g., age, sex, race/ethnicity), socioeconomic characteristics (e.g., education, income), or biologic factors (e.g., genetics).

The description of place spans different geographical characteristics, which can be as broad as a continent, more mid-level such as a country or city, or as specific as a neighborhood. It might also include even more specific attributes, such as place of employment, patients in a certain clinic or hospital ward, or distance from a certain environmental site. The characteristics of place add to those of person in providing a specific description of our population; for example, we might describe our population as women ages 18-59 living in Baltimore, Maryland.

It is clear that defining a population using these two criteria alone may be insufficient. Continuing the preceding example, are we studying women in that age group who ever lived in Baltimore? To provide more specificity, we can include in our definition an aspect of time. Time, like the attributes of person and place, can be thought to influence disease. For example, it is reasonable to assume that a 25-year-old woman in Baltimore in 1882 had a much different disease risk profile than a woman of the same age living in Baltimore today. Likewise, time might refer to a scale other than calendar time (for example, we might characterize individuals in terms of their life course), although most commonly we choose calendar time as our parameter of time when describing a population.

Types of Populations: Target, Source, and Study Populations

Epidemiologists conceptualize several different types of populations (Figure 3-3). A target population comprises those individuals about whom we will want to make inferences based on the results of our study. Identifying a target population is often subjective because the group about which we want to make inferences may be a conceptual construct and not a group of individuals who can be specifically enumerated. Ideally, this population is the one most relevant to the research question being investigated, in terms of its person, place, and time characteristics.

The target population serves as the background for the source population. A source population is a subset of the target population that can be enumerated and further studied. For example, in a study of the prevalence of sexually transmitted infections (STIs), we may wish to make inferences from the findings to a broad community and, therefore, identify as our target population sexually active men and women ages 18-49 living in the United States during 2005-2010. Of course, it would not be feasible to enumerate, or enroll into a study, all individuals meeting those criteria. Instead, we might identify a source population from which we can enroll study participants. One example of a source population for this study might be individuals attending specific STI clinics in Chicago or Baltimore during a certain period of time. There are, of course, alternative source populations that could be chosen to conduct a study whose results would be relevant to the same target population.

Figure 3-3 Populations Moving Through Time

A study population comprises those individuals in the source population who contribute data to an epidemiologic investigation. In some settings, the study population might be equal to an entire source population (or even to an entire target population). Whether a member of the source population ultimately becomes a study participant is influenced by a number of factors. Eligibility criteria are those characteristics that are necessary for an individual to be considered for enrollment, and may include attributes of person, place, and/or time. It may also not be feasible to study the entire source population (due to cost or logistics). Finally, some individuals in the source population may decline the invitation to participate.
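The path from source population to study population can be sketched in a few lines of Python. This is only an illustration: the `Person` record, the eligibility thresholds, the enrollment cap, and the six example individuals are all hypothetical, chosen to mirror the STI clinic example above rather than taken from any actual study.

```python
from dataclasses import dataclass


@dataclass
class Person:
    person_id: int
    age: int          # attribute of person
    city: str         # attribute of place
    visit_year: int   # attribute of time
    consented: bool   # whether the person accepts the invitation to participate


def is_eligible(p: Person) -> bool:
    """Hypothetical eligibility criteria combining person, place, and time."""
    return (
        18 <= p.age <= 49
        and p.city in {"Baltimore", "Chicago"}
        and 2005 <= p.visit_year <= 2010
    )


def select_study_population(source, max_enrollment):
    """Apply eligibility, then consent, then a feasibility cap on enrollment."""
    eligible = [p for p in source if is_eligible(p)]
    consenting = [p for p in eligible if p.consented]
    # Simplification: take the first N; a real study would use a sampling scheme.
    return consenting[:max_enrollment]


# Hypothetical source population (e.g., attendees of specific clinics).
source_population = [
    Person(1, 25, "Baltimore", 2006, True),
    Person(2, 52, "Baltimore", 2007, True),   # ineligible: age
    Person(3, 31, "Chicago", 2009, False),    # eligible but declines
    Person(4, 40, "Chicago", 2011, True),     # ineligible: time
    Person(5, 19, "Chicago", 2005, True),
    Person(6, 44, "Baltimore", 2010, True),   # eligible, but cap already reached
]

study_population = select_study_population(source_population, max_enrollment=2)
study_ids = [p.person_id for p in study_population]
print(study_ids)
```

Each filter corresponds to one of the factors named above: eligibility criteria, refusal, and feasibility. Each step is also a point at which the study population can come to differ systematically from the source population.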

In summary, identifying the characteristics of these different populations is an essential component of epidemiology. In the next section, we outline various descriptive and analytical designs that are used in epidemiology as applied to infectious diseases.


Epidemiologic studies of infectious diseases aim to evaluate the contributions of various factors in the transmission and acquisition of infectious pathogens, as well as those factors favoring endemic transmission and epidemics. The design of such studies must optimize the researcher’s ability to measure and evaluate the relationships between exposures and the occurrence of disease in the study population.

Studies of infectious disease can be designed to explore landmarks along the entire temporal process during which an individual is at risk, acquires infection, develops an infectious disease, or succumbs to it. The duration of this process can be short, such as with highly virulent infections (e.g., Ebola virus), or it can be very long, as with chronic infectious diseases such as HIV/AIDS. Epidemiologists strive to understand the population-level burden of disease, including the reasons for increased susceptibility of one population relative to another, the factors that affect the susceptibility of particular individuals in a population, and the factors leading to epidemics.

Several study designs are used to address research questions regarding the risk factors for, and burden of, disease in human populations. For example, descriptive designs are typically not initiated to make comparisons across populations but rather provide an opportunity to describe important characteristics of individuals with disease. Such descriptive designs include case reports, case series, and ecological and surveillance studies. In contrast, analytical designs are initiated to draw particular conclusions regarding the association between exposures and outcomes; they include cohort studies, case-control and other nested studies, and randomized clinical trials. Meta-analyses and systematic reviews, wherein either primary or published data from individual studies are systematically combined to investigate a research question, are also being conducted with increasing frequency. The optimal study design is a function of the hypothesis under investigation. In this section, we review several important and frequently used epidemiologic study designs and illustrate their use in evaluating infectious diseases.

Descriptive Study Designs

When a new disease is recognized, it may be of interest to describe the nature of the disease and to evaluate the probable means of transmission, reservoir, and natural history. Sometimes a new disease can be quickly linked to a specific organism, such as staphylococcal toxic shock syndrome. More often, however, epidemiologic studies contribute to the discovery and characterization of new pathogens, as with hantavirus pulmonary syndrome, Legionnaires’ disease, and AIDS.

Early studies may consist of descriptions of cases that may be linked by a route of transmission or common exposure. Descriptive studies do not typically make inferential comparisons of cases to individuals without disease (controls); rather, they only describe aspects of the disease and circumstances surrounding the acquisition and occurrence of disease. Surveillance methods capture cases of disease and are an excellent source for identifying individuals for further follow-up. At times, case reports or case series provide considerable insight into the epidemiology of an infectious disease.

Ecologic Studies

Ecologic studies utilize populations with different levels of exposure and examine the correlation of exposure levels with population-level disease frequency. In a typical ecologic study, data are not available at the individual level to determine whether those individuals who are truly exposed have a higher (or lower) occurrence of disease; the researcher simply knows that in the population with greater exposure, there is more (or less) disease.

Ecologic studies may be useful in exploring hypothesized associations by comparing disease frequencies among populations from different geographic regions or from different time periods. Population-level data may be available from national or community-wide surveys of exposure frequencies and disease rates, which can often be obtained inexpensively. Ecologic studies also allow for comparisons where the range of exposure in one particular population may be too narrow to correlate with a disease outcome at the individual level. For example, the association of vitamin A deficiency with an infectious outcome would be difficult to evaluate in a population consisting of only vitamin A-deficient individuals. Alternatively, an ecologic study comparing infection outcomes across populations with varying prevalence of vitamin A deficiency would permit a better assessment of the correlation. Similarly, studies of the relationship between infectious agents and unusual outcomes (such as the liver fluke Opisthorchis viverrini and bile duct cancer, and Helicobacter pylori and stomach cancer) can be strengthened by ecologic data from populations with widely varying levels of infections and cancer.
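The core computation in an ecologic analysis is a correlation across populations, not across individuals. The following minimal sketch assumes five hypothetical populations, each contributing one exposure prevalence and one disease rate; the numbers are invented for illustration only, and the helper `pearson_r` is written out so that the example is self-contained.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: one entry per population (not per individual).
exposure_prevalence = [0.05, 0.10, 0.20, 0.35, 0.50]   # proportion exposed
disease_rate_per_100k = [12.0, 15.0, 22.0, 30.0, 41.0]  # cases per 100,000

r = pearson_r(exposure_prevalence, disease_rate_per_100k)
print(f"population-level correlation: r = {r:.3f}")
```

A strong correlation here says only that populations with more exposure have more disease. Because no row links an individual's exposure to that individual's outcome, the calculation cannot show that the exposed persons are the ones who became ill.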

Two ecologic studies, one of rheumatic fever and one of HIV infection, are described here.

Figure 3-6 The correlation between the incidence of rheumatic heart disease per 100,000 and the number of persons per room (×100), as found by Perry and Roberts in various districts of the city of Bristol, England, in 1927-1930. (The size of the dots indicates roughly the comparative population size of the districts.) Reproduced from E. Kass. Infectious Diseases and Social Change. Journal of Infectious Diseases, Vol. 123(1):110-114. © 1971. By permission of Oxford University Press.

Crowding and Rheumatic Fever

Early studies led to the hypothesis that household crowding was an important environmental factor in the transmission of group A streptococci and high rates of acute rheumatic fever. Moreover, it has been hypothesized that the reduction in household crowding may have been one factor leading to the decreased rates of acute rheumatic fever in the last half of the 1900s in comparison with earlier periods.20 The data in Figure 3-6 show the association between the incidence of rheumatic heart disease and crowding (as measured by the number of persons per room) in various districts in the city of Bristol, England, 1927-1930. Compared to districts with high household crowding, those with low crowding show lower rates of disease.

Circumcision and HIV Transmission

Male circumcision (removal of the foreskin) is a common surgical procedure undertaken for a variety of cultural and medical reasons. Biologically, the foreskin is rich in immune cells and may develop micro-tears that may serve as an entry point for HIV. The foreskin may also trap HIV in a warm moist environment, allowing more time for infection to occur. Given these factors, it is not surprising that circumcised men have been found to have lower rates of sexually transmitted diseases.21

In the late 1990s, data began to emerge suggesting that circumcised men were at lower risk for HIV infection. An ecologic study contributed to this evidence by examining the association of the prevalence of circumcision and HIV in several African countries.22 Data on circumcision practices were extracted from an ethnographic database and were combined with published HIV seroprevalence data. By mapping these data, the authors identified a strong correlation between the practice of male circumcision and the prevalence of HIV infection among males (Figure 3-7).

The challenge in conducting this analysis was that a variety of behavioral, cultural, and religious differences between ethnic groups may alter the risk of HIV acquisition. Most notably, circumcised men in the study were more likely to be Muslim, and it was possible that behavioral factors may have contributed to their lower risk of infection. As noted by Gray:

[M]arried Muslim men are predominantly polygamous, and polygamous unions may provide a closed sexual network reducing the risk of HIV introduction. Also, Muslim men abstain from alcohol consumption, and alcohol is associated with high-risk behaviors. Key informant interviews suggest that penile hygiene may be important. Under Islam, individuals are considered unclean after intercourse, and Muslim men and women are required to perform post-coital ablutions. In addition, observant Muslims will often wash before daily prayer. Hygienic practices associated with religion may thus partly explain the protective effects of circumcision among Muslims.23

Figure 3-7 Map of Africa showing political boundaries and usual male circumcision practice, with point estimates of general adult population HIV seroprevalence superimposed. Reproduced from Moses et al., Geographical Patterns of Male Circumcision Practices in Africa: Association with HIV Seroprevalence. International Journal of Epidemiology, Vol. 19, pp. 693-697. © 1990. By permission of Oxford University Press.

Because an ecologic study design does not collect individual-level data, it cannot account for cultural or hygienic practices that may differ between men who are and are not circumcised.

Based on the strength of the ecologic studies and other emerging data, three randomized clinical trials of male circumcision were initiated in 2001 in Kenya, South Africa, and Uganda. The results of these studies demonstrated convincingly that male circumcision reduced the incidence of HIV acquisition by more than 50%.24, 25 and 26 Although no benefits were seen for circumcision of HIV-infected men in protecting against transmission to their female partners,27 additional studies have demonstrated benefits of circumcision for genital ulcer disease28 and high-risk human papillomavirus.29 The next generation of studies will need to evaluate the expansion of circumcision as part of national HIV prevention strategies and the impact on regional HIV incidence—again requiring further ecologic designs.
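A statement that an intervention "reduced the incidence by more than 50%" describes a relative reduction in the incidence rate, which is one minus the ratio of the rate in the intervention group to the rate in the control group. The sketch below shows the arithmetic with invented counts; these are not the results of the trials cited above, and the function names are illustrative.

```python
def incidence_rate(events: int, person_years: float) -> float:
    """Events per person-year of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return events / person_years


def relative_reduction(rate_intervention: float, rate_control: float) -> float:
    """Relative reduction in incidence: 1 - rate ratio."""
    return 1.0 - rate_intervention / rate_control


# Hypothetical counts (assumed for illustration only).
rate_circumcised = incidence_rate(events=20, person_years=2000.0)  # 0.010 per person-year
rate_control = incidence_rate(events=50, person_years=2000.0)      # 0.025 per person-year

rate_ratio = rate_circumcised / rate_control
reduction = relative_reduction(rate_circumcised, rate_control)
print(f"rate ratio = {rate_ratio:.2f}, relative reduction = {reduction:.0%}")
```

With these assumed numbers the rate ratio is 0.40, so the relative reduction is 60%. A published analysis would also report a confidence interval, which this sketch omits.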

Analytical Study Designs

Analytical study designs are fundamental tools for epidemiological inference. In contrast to descriptive study designs, analytical studies collect individual-level data and compare the occurrence of disease with exposure. These designs are best described in the context of a study population moving through time, as illustrated in Figure 3-3, which emphasizes several interrelated methodological considerations:

  • Selection of the study population from the target and source populations. As noted previously, there are usually individuals who are part of the target population but who are outside the study population. Thus the manner in which individuals are selected (or self-select) for participation into the study population is an important consideration in evaluating the inferences of a study.

  • Determination of time metric and follow-up. The choice of an appropriate metric for conceptualizing the study population moving through time is an important design consideration. Typically, studies are described in terms of the calendar time during which individuals are enrolled and followed. Keep in mind, however, that time can be defined by any number of measures, such as chronological age, biological life stages (e.g., before or after menopause), or other events (e.g., jobs, marriage, retirement).30

Each line in Figure 3-3 represents the time when an individual begins and ends his or her time at risk for experiencing the disease outcome. Determining whether an individual is at risk incorporates both biological and methodological considerations. For example, individuals vaccinated against Morbillivirus will not be susceptible to measles and, therefore, cannot be considered at risk for this outcome. Furthermore, if an individual moves out of a study catchment area, he or she would also no longer be considered at risk (if the event occurred, it would not be recorded by the study). Depending on the time metric of interest, not all study participants might enter into a study at the same time. For example, if “time” is measured by age, then participants might enter at different ages, even if they are all enrolled during the same calendar period.

The importance of a well-defined and relevant time metric becomes more evident when we are thinking about the group of individuals who are at risk for an event at a particular time point—the risk set. As we will discuss in more detail later, risk sets are important in both the design and the analysis of epidemiologic studies.

  • Exposure assessment. In the simplest studies evaluating the link between a particular exposure (e.g., “exposed” and “unexposed,” illustrated as solid and dashed lines, respectively, in Figure 3-3) and outcome, it is important to determine whether and how exposure may change over time. Some exposures may be unchanging (“fixed”) within an individual (e.g., genetics). Others may be time varying but may change in different ways. For example, some infectious exposures may be transitory and recurrent (e.g., influenza), whereas others are persistent and lifelong once acquired (e.g., HIV). When exposures can change, an important aspect of the study design is the time when they are assessed. It is vital that they are measured at a relevant time point so that they can be temporally linked to a disease outcome.

  • Outcome assessment. At the end of the period at risk, some individuals in the study population may develop the disease of interest (Figure 3-3, circles). Like exposures, some disease outcomes may be transient and recur; others are lifelong or defined that way in the analysis (e.g., defining a disease outcome as the first occurrence).

Furthermore, a study will usually end before all individuals develop disease; the term we use for those individuals whose period of follow-up ends while they are still disease-free and at risk is “censored.” Thus we need to think about the different types of censoring that may occur within our study design. Individuals may be censored due to logistics of cutting off their follow-up time for an analysis (e.g., administrative censoring). Alternatively, censored individuals may include those who drop out or otherwise become lost to follow-up during the course of the study.
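The ideas of time at risk, censoring, and risk sets can be made concrete with a small sketch. It assumes a hypothetical `FollowUp` record holding each person's entry and exit times on the chosen time metric, plus a flag for whether follow-up ended with the disease event or with censoring. The four records are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    person_id: int
    entry: float  # time at which the at-risk period begins
    exit: float   # time at which the at-risk period ends
    event: bool   # True if disease occurred at exit; False if censored


def person_time(records):
    """Total time at risk contributed by all individuals."""
    return sum(r.exit - r.entry for r in records)


def risk_set(records, t):
    """IDs of individuals under observation and still at risk at time t."""
    return [r.person_id for r in records if r.entry < t <= r.exit]


records = [
    FollowUp(1, 0.0, 2.0, True),   # develops disease at t = 2
    FollowUp(2, 0.0, 5.0, False),  # administratively censored at study end
    FollowUp(3, 1.0, 3.0, False),  # lost to follow-up at t = 3
    FollowUp(4, 1.5, 4.0, True),   # enters late, develops disease at t = 4
]

total_person_time = person_time(records)
n_events = sum(r.event for r in records)
rate = n_events / total_person_time

# One risk set per observed event time.
event_times = sorted(r.exit for r in records if r.event)
risk_sets = {t: risk_set(records, t) for t in event_times}
print(total_person_time, n_events, risk_sets)
```

Censored individuals still contribute person-time and belong to every risk set up to their exit, which is why the reason for censoring matters. Changing the time metric (for example, from calendar time to age) changes the entry and exit values and therefore who is compared with whom in each risk set.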

In this section, we provide a brief survey of the various study designs used in epidemiologic analysis, highlight key issues for each design in light of these four methodologic characteristics, and provide examples in the context of infectious disease studies.

Randomized Clinical Trials

Clinical trials evaluate the effect of planned interventions in an experimental manner. The investigator assigns certain participants to receive one treatment (the experimental group) and others to receive another treatment (the control or comparison group). In this subsection, we highlight several key methodological issues in clinical trial designs.

Selection of the Study Population from the Target and Source Populations

Clinical trials are initiated to make inferences about an intervention among a specific target population. Thus the eligibility criteria that apply to the study population are a key element of the design. Some trials aim to produce widely generalizable results, exemplified by the advocacy for “large simple trials.”31 With this design, study investigators impose few specific eligibility criteria, with the goal of making inferences about a large and diverse population. More typically, however, clinical trials incorporate strict eligibility criteria, with the goal of eliminating potential factors that might obscure the evaluation of the safety and efficacy of an intervention. This effort would include identifying participants who will be compliant with the study protocol and are either healthy enough to gain some benefit from the intervention or more ill with fewer clinical options.

The decision to impose strict eligibility criteria may particularly affect the enrollment of minorities, women, and persons who have other comorbidities. Although this problem is not limited to studies of infectious diseases, several studies have documented these effects in HIV disease.32, 33 These disparities remain despite evidence that minorities are willing to participate in research studies.34, 35

Determination of Time Metric and Follow-up

The natural time origin in a clinical trial is the date of randomization, and the time elapsed since randomization is the natural metric for measuring follow-up. The study protocol will usually specify the target minimal time for follow-up of the primary endpoint after which data are analyzed; for example, the trial may continue until all patients have a minimum follow-up of 5 years. Completeness of follow-up is a vital part of clinical trials to ensure the results are not subject to selection bias. The Consolidated Standards of Reporting Trials (CONSORT) Group has developed a variety of initiatives and recommendations that address problems arising from inadequate reporting of randomized controlled trials (RCTs).36

Many studies conduct planned interim analyses, with the possibility of stopping a trial early for futility or if strong safety and/or efficacy signals are observed. A substantial number of statistical issues arise with interim analyses, and careful planning and adaptation of specialized methods are necessary to ensure the study maintains its statistical integrity.37 Further, the decision to stop a study early requires the study team, usually in collaboration with an independent data safety and monitoring board, to balance a variety of complex considerations.38, 39

Exposure Assessment

In an RCT, the exposure of interest (i.e., the intervention) is randomized and administered to participants according to a prespecified study protocol. In the classic double-masked (or double-blinded) trial, subjects are assigned by a random procedure to receive either an experimental treatment or placebo, and neither the subject nor the investigator knows which treatment the subject is receiving. Under some circumstances, this type of trial is not possible because the treatment group cannot be concealed from the trial participants or the investigators. For instance, the assignment may be obvious in trials of medical procedures, or a medication may have characteristic side effects that reveal it. Also, there may be times when a suitable placebo is not available.
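The text above says only that subjects are assigned "by a random procedure." One common procedure is permuted-block randomization, sketched here as an assumption rather than as the method of any particular trial. The block size, seed, arm names, and masked codes "A" and "B" are all hypothetical.

```python
import random


def permuted_block_assignments(n_participants, block_size=4, seed=2024):
    """Assign arms in shuffled blocks so the arms stay balanced over time."""
    if block_size <= 0 or block_size % 2 != 0:
        raise ValueError("block_size must be a positive even number")
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    assignments = []
    while len(assignments) < n_participants:
        half = block_size // 2
        block = ["experimental"] * half + ["placebo"] * half
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_participants]


allocation = permuted_block_assignments(12)

# Masking: participants and investigators would see only a coded label; the
# key linking codes to arms is held by someone outside the study team.
masked_labels = ["A" if arm == "experimental" else "B" for arm in allocation]
print(masked_labels)
```

Blocking guarantees equal group sizes after every complete block, at the cost of making the last assignment in a block predictable if the block size becomes known. That is one reason real trials often vary the block size.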

In an ideal setting, all individuals would receive and adhere to the intervention to which they were randomized. However, issues of crossovers between treatment groups and adherence are important and raise a key consideration of whether individuals should be analyzed as they are randomized (“intention to treat” analysis) or as they actually use the treatment (“as-treated” analysis). Describing crossover and adherence in a standardized manner is aided by guidelines such as the CONSORT statement.
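The difference between the two analyses comes down to which variable defines the comparison groups. The sketch below uses eight hypothetical participants, two of whom cross over; the `Participant` record and all outcomes are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    assigned: str   # arm assigned at randomization
    received: str   # arm actually received
    outcome: bool   # True if the disease outcome occurred


def risk_by_group(participants, key):
    """Proportion with the outcome, grouping by 'assigned' or 'received'."""
    totals, events = {}, {}
    for p in participants:
        group = getattr(p, key)
        totals[group] = totals.get(group, 0) + 1
        events[group] = events.get(group, 0) + int(p.outcome)
    return {group: events[group] / totals[group] for group in totals}


participants = [
    Participant("treatment", "treatment", False),
    Participant("treatment", "treatment", False),
    Participant("treatment", "treatment", True),
    Participant("treatment", "control", True),    # crossover to control
    Participant("control", "control", True),
    Participant("control", "control", True),
    Participant("control", "control", False),
    Participant("control", "treatment", False),   # crossover to treatment
]

intention_to_treat = risk_by_group(participants, "assigned")
as_treated = risk_by_group(participants, "received")
print("intention to treat:", intention_to_treat)
print("as treated:", as_treated)
```

In this contrived example the intention-to-treat risks are equal in the two arms (0.5 each), while the as-treated risks differ (0.25 versus 0.75). The intention-to-treat comparison preserves the groups created by randomization. The as-treated comparison reflects what people actually received, but those groups are no longer randomized and may differ in other ways.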
