There are already several approaches available to increase the chances of diagnosing breast cancer at an early, curable stage, or of reducing the chances of getting breast cancer at all. Some examples include enhanced surveillance with periodic breast MRI, chemoprevention, or even prophylactic surgery. These interventions cannot be applied widely and indiscriminately because each exacts a cost of its own—be it financial, physiological, psychological, or social. The risks and benefits of these interventions must be considered on a case-by-case basis and this requires some sense of the balance between the risks of the proposed intervention and the risks of the disease it is trying to address.
It is well known that breast cancer can run in families, and it is not uncommon for a woman whose mother has died of breast cancer to suffer a great deal of anxiety about her own “impending” breast cancer, and even to undertake extraordinary measures, including prophylactic mastectomy, to prevent it. Individuals with a family history of breast cancer frequently overestimate their own chances of developing the disease and often experience a sense of relief when presented with quantitative information suggesting that their actual risk is quite a bit lower than they would have imagined.1,2 Conversely, a healthy woman whose recent breast biopsy has diagnosed high-risk preneoplasia, such as atypical hyperplasia, may be spurred to effective action when presented with a quantitative estimation of her breast cancer risk over time.
Quantitative breast cancer risk assessment is also an integral component of prevention research. It can be used as a surrogate for breast cancer incidence in studies evaluating biomarkers of breast cancer risk,3-5 and is always considered in the inclusion criteria for trials that include breast cancer incidence as an end point.6 The value of quantitative risk assessment in the latter case cannot be overstated, as these calculations permit accurate estimation of the number of end-point events that will occur. This is critical for designing the most efficient study possible.
Finally, third-party payors have begun to consider quantitative risk assessment data in their determinations of medical necessity. This is best illustrated by the fairly wide adoption of recently published American Cancer Society guidelines that support screening breast MRI for women with a lifetime risk of breast cancer of 20% or more.7 The best approach for incorporating quantitative risk assessment into medical decision-making can and should be debated, but at this juncture it is reasonable to ask how one goes about estimating breast cancer risk, and whether such estimations are accurate.
A variety of metrics are available to quantify risk and each is valuable when used in the appropriate context. Relative risk is often used to identify new risk factors from case-control data and is sometimes used in calculations of absolute risk. Simply stated, relative risk expresses the strength of association between exposure to a risk factor and the presence of breast cancer. For example, the relative risk for breast cancer conferred by atypical hyperplasia is about 4.5.8 This means that women with atypical hyperplasia develop breast cancer about 4.5 times more frequently than similar women without atypical hyperplasia. For a disease like breast cancer, which is fairly common, a high relative risk usually translates into a high absolute risk, but the concept of relative risk is foreign to most patients. Relative risk is invaluable for clinical scientists but not particularly useful in the risk-assessment clinic.
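The arithmetic behind a relative risk can be sketched in a few lines of Python. This is a minimal illustration that assumes cohort-style counts (cases and totals in each group); the counts below are hypothetical and were chosen only so that the result matches the value of about 4.5 quoted above for atypical hyperplasia.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of disease incidence in the exposed group to that in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical counts (not taken from any study), picked to give a ratio of 4.5.
rr = relative_risk(exposed_cases=90, exposed_total=1000,
                   unexposed_cases=20, unexposed_total=1000)
print(f"relative risk = {rr:.1f}")
```

Note that the ratio says nothing about how common the disease is: the same relative risk would result from 9 versus 2 cases per 1000, which is why a relative risk alone is hard for a patient to interpret.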
Absolute risk is the percent chance that some event will happen over some specified time. It can be observed directly and prospectively by following a cohort of individuals over time, or can be estimated from disease rates measured during a single time period in a defined population. An example of the latter approach is the set of age-specific breast cancer risk figures supplied by the NCI-sponsored Surveillance, Epidemiology, and End Results (SEER) program.9 Table IV-14 of the 1995 to 2005 SEER data shows breast cancer risk by age according to current and historical age-specific breast cancer incidence rates (Table 9-1). A quick look at this table shows that the risk of ever developing breast cancer decreases as a woman ages, but that the near-term risk (10-year risk in this case) increases as a woman ages. This is important to consider as we decide how best to apply quantitative risk calculations in medical decision-making. For example, the lifetime breast cancer risk for a 20-year-old woman is 12.2%, but her near-term (10-year) risk is only 0.05%. Should intervention decisions be made based on her lifetime or near-term risk? Historically, it is near-term (5-year) risk that has been used to select women for participation in chemoprevention trials, and the threshold used to classify a woman as high risk has traditionally been 1.67%. This is simply the 5-year risk for a 60-year-old Caucasian woman calculated from the 1984 to 1988 SEER data. This threshold may be useful for ensuring that there are sufficient end-point events in a chemoprevention trial, but it is meaningless in the breast cancer risk-assessment clinic. This SEER table also tells us that, for a female born today, the risk of ever developing breast cancer in her lifetime is 12%, or 1 in 8. This is a useful metric for tracking breast cancer incidence trends over time, but is only the starting point for quantitative breast cancer risk assessment, which attempts to adjust this value based on the unique combination of risk factors operative in a specific individual.
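The way age-specific incidence rates translate into near-term and remaining lifetime risk can be sketched as follows. This is a simplified illustration, not the SEER method: the rates are hypothetical round numbers rather than SEER figures, deaths from other causes are ignored, and the names RATES_PER_100K and risk_between are invented for the example.

```python
import math

# Hypothetical annual incidence per 100,000 women, keyed by the first year of
# each 10-year age band. Illustrative values only, NOT the SEER figures.
RATES_PER_100K = {20: 5, 30: 45, 40: 150, 50: 250, 60: 350, 70: 400, 80: 400}


def risk_between(start_age, end_age, rates=RATES_PER_100K):
    """Probability of diagnosis between two ages, ignoring competing mortality."""
    cumulative_hazard = 0.0
    for band_start, rate in rates.items():
        band_end = band_start + 10
        # Number of years of this age band that fall inside the interval.
        years = max(0, min(end_age, band_end) - max(start_age, band_start))
        cumulative_hazard += years * rate / 100_000
    return 1 - math.exp(-cumulative_hazard)


for age in (20, 40, 60):
    ten_year = risk_between(age, age + 10)
    remaining = risk_between(age, 90)
    print(f"age {age}: 10-year risk {ten_year:.2%}, risk to age 90 {remaining:.2%}")
```

Even with made-up rates the pattern described in the text appears: the 10-year risk rises with age because the annual rates rise, while the remaining risk falls because fewer years of exposure are left.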
Absolute risk can also be calculated from relative risk if certain additional information is available. In order to convert relative risk to absolute risk one must know how frequently each category of risk occurs in the population of interest and the rates of death from other causes. To be useful in the clinic this information must be known for women who are as similar as possible to the patient being seen (eg, same age, same race). In addition, many risk factors are not totally independent; that is, the presence of some risk factors will modulate the effects of other risk factors. A model that calculates absolute risk from several different relative risks must consider these interactions. The Gail model is a particularly good example in this regard. This model starts with 5 separate relative risks, but accounts for the way that age and number of biopsies can interact, and for the way that age at first live birth and family history interact.10 These 5 relative risks are combined into 3 relative risks, and then SEER incidence data, a factor to account for the prevalence of the risk factors by race, and U.S. mortality data are combined to translate these relative risks into absolute risks.6
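The conversion from relative to absolute risk can be illustrated with a short sketch. It is a simplified, discrete-time version of the general idea and not the published Gail computation: a single relative risk is applied, each year is treated as having constant rates, and all numbers and function names (baseline_hazard, absolute_risk) are hypothetical. The population incidence rate is first divided by the prevalence-weighted average relative risk to recover the rate for women at reference risk; the woman's own relative risk is then applied, and deaths from other causes remove her from the at-risk pool.

```python
import math


def baseline_hazard(population_rate, prevalence_by_rr):
    """Annual rate for women at reference risk (relative risk of 1).

    The population rate averages over every risk category, so dividing by
    the prevalence-weighted mean relative risk recovers the baseline.
    """
    mean_rr = sum(rr * share for rr, share in prevalence_by_rr.items())
    return population_rate / mean_rr


def absolute_risk(relative_risk, population_rates, other_cause_rates, prevalence_by_rr):
    """Chance of a breast cancer diagnosis over the years supplied."""
    surviving = 1.0  # probability of being alive and cancer-free so far
    risk = 0.0
    for pop_rate, death_rate in zip(population_rates, other_cause_rates):
        h_cancer = relative_risk * baseline_hazard(pop_rate, prevalence_by_rr)
        h_total = h_cancer + death_rate
        p_any_event = 1 - math.exp(-h_total)
        if h_total > 0:
            # Share of this year's events that are cancer diagnoses.
            risk += surviving * p_any_event * h_cancer / h_total
        surviving *= math.exp(-h_total)
    return risk


# Hypothetical inputs: relative-risk category -> share of the population.
prevalence = {1.0: 0.90, 2.0: 0.08, 4.5: 0.02}
incidence = [0.003] * 5   # annual population incidence for 5 years
mortality = [0.008] * 5   # annual death rate from other causes

five_year = absolute_risk(4.5, incidence, mortality, prevalence)
print(f"5-year absolute risk at relative risk 4.5: {five_year:.2%}")
```

Raising the competing mortality rates lowers the absolute risk even when the relative risk is unchanged, which is one reason the inputs need to come from women who resemble the patient.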
Cancer risk models use personal and family history information to calculate the probability that an individual carries a deleterious mutation in a cancer predisposition gene or to estimate the probability that the individual will develop cancer over time. Most quantitative risk assessment models are empiric models—that is, models that use various weighting schemes to mathematically represent an observational dataset. The Ontario Family History Assessment Tool (FHAT) is one example of a simple empiric model developed to identify women who should be referred to a genetic counselor.11 This model was developed by having 16 clinical geneticists review 26 hypothetical pedigrees. It assigns points to a family history based on the types of cancer present, the ages at onset, and the relationship of cancer patients to the proband.
Logistic regression is commonly used to develop mathematical equations that accurately describe a set of observations that have been made in a group of individuals. A very simple example is the Finnish BRCA gene mutation prediction model, which was developed using data from 148 BRCA gene mutation-tested families with a strong history of breast and/or ovarian cancer.12 This model uses the earliest age at breast cancer diagnosis and the number of ovarian cancers in the family to calculate the probability that the family carries a deleterious mutation in BRCA1 or BRCA2. This is the most austere logistic regression model, but also one of the most intelligent as these two variables form the core of nearly every other BRCA gene mutation prediction model. The Gail model is, perhaps, the best known and most widely used logistic regression model for calculating breast cancer risk.10 This model was developed from a case-control study that included nearly 6000 women who had participated in the Breast Cancer Detection Demonstration Project. Dozens of risk factors were evaluated to identify the smallest number of factors providing the best risk prediction. Risk factors included in the final model were age, age at menarche, age at first live birth, number of biopsies, any history of atypical hyperplasia, and number of first-degree female relatives with breast cancer. The initial model was applicable to Caucasian women only and included both invasive and in situ cancer as end points. In preparing for the P1 Tamoxifen Breast Cancer Prevention Trial (BCPT), investigators with the National Surgical Adjuvant Breast and Bowel Project (NSABP) modified the original Gail model to include African American women and to limit the probability calculation to invasive breast cancer only.6 This is the model in common use today and is known as the modified Gail model or Gail2 model. 
Gail and colleagues have since described a revised model that excludes age at menarche but adds body weight and mammographic density.13 Barlow and colleagues used data from 1,007,600 women enrolled in the Breast Cancer Surveillance Consortium to develop separate logistic regression models for premenopausal and postmenopausal women.14 The model for premenopausal women includes age, prior breast surgery, family history of breast cancer in first-degree relatives, and mammographic density. The model for postmenopausal women includes these same factors with the addition of ethnicity, race, body mass index, age at first live birth, hormone therapy use, history of oophorectomy, and history of a “false-positive” mammogram in the prior year. Interaction between risk factors was not considered in this model.
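The general form of a logistic regression model can be sketched using the two predictors named above for the Finnish mutation prediction model. The intercept and coefficients below are hypothetical placeholders chosen only to give plausible-looking output; they are not the published values, and the function name is invented for the example.

```python
import math

# Hypothetical coefficients for illustration only; NOT the published values.
INTERCEPT = 1.0
COEF_EARLIEST_AGE = -0.08  # per year of age at the earliest breast cancer diagnosis
COEF_OVARIAN = 1.2         # per ovarian cancer in the family


def mutation_probability(earliest_breast_dx_age, n_ovarian_cancers):
    """Logistic model: a weighted sum of predictors mapped onto a 0-1 probability."""
    linear = (INTERCEPT
              + COEF_EARLIEST_AGE * earliest_breast_dx_age
              + COEF_OVARIAN * n_ovarian_cancers)
    return 1 / (1 + math.exp(-linear))


for age, ovarian in [(55, 0), (40, 0), (40, 2)]:
    p = mutation_probability(age, ovarian)
    print(f"earliest diagnosis at {age}, {ovarian} ovarian cancers: {p:.1%}")
```

In a fitted model the coefficients are estimated from the observational dataset, so the output is only as representative as the families that were used to build it.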
The Claus model is another popular empiric model that grew out of efforts to find the best way to mathematically describe breast cancer family history patterns among the more than 9000 women participating in the Cancer and Steroid Hormone Study.15 Several approaches were tried, but the model that provided the best fit to the data was one that postulated the existence of a rare autosomal dominant susceptibility gene in the population. This predated the discovery of BRCA1 by several years. The variables considered by this model are the ages at breast cancer diagnosis and the relationship of affected relatives to the proband. The model calculates the probability of developing breast cancer over time.
Mendelian models represent another major class of quantitative risk assessment models. Unlike the empiric models, which are directly derived from specific observational data, Mendelian models are based on the general rules of Mendelian inheritance. Observational data contribute to the construction of these models, but only for the purposes of estimating the allelic frequency and penetrance of the genes of interest. BRCAPRO is a Mendelian model that uses Bayes' theorem to calculate the probability that an individual carries a mutation in a major autosomal dominant breast cancer susceptibility gene based on the family history of breast and ovarian cancer.16,17 The probability that the individual will develop breast or ovarian cancer over time is then derived from this mutation probability based on age-specific incidence curves for mutation carriers and noncarriers. This model requires entry of a complete family history including all affected and unaffected relatives. Calculations are based on the ages at cancer diagnosis, the relationship of affected and unaffected relatives to the proband, results for any genetic testing that has been done, history of oophorectomy, and the characteristics (receptor status) of any primary breast cancers. Other Mendelian models, such as the Jonker model18 and BOADICEA,19 calculate the probability of a mutation in a major autosomal dominant breast cancer susceptibility gene, but do not provide an estimation of breast cancer risk over time.
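The two-step logic described above for BRCAPRO, a Bayesian mutation probability followed by a risk derived from it, can be illustrated with a deliberately reduced sketch. It is not the BRCAPRO algorithm, which evaluates the whole pedigree under Mendelian inheritance; here the family history is collapsed into a single likelihood under each hypothesis, and every number and function name is a hypothetical placeholder.

```python
def posterior_carrier_probability(prior, likelihood_if_carrier, likelihood_if_noncarrier):
    """Bayes' theorem: update the prior chance of carrying a mutation
    using how likely the observed family history is under each hypothesis."""
    numerator = prior * likelihood_if_carrier
    denominator = numerator + (1 - prior) * likelihood_if_noncarrier
    return numerator / denominator


def blended_risk(p_carrier, risk_if_carrier, risk_if_noncarrier):
    """Cancer risk as a probability-weighted mix of carrier and noncarrier risks."""
    return p_carrier * risk_if_carrier + (1 - p_carrier) * risk_if_noncarrier


# Hypothetical inputs, not taken from any published model.
p_carrier = posterior_carrier_probability(
    prior=0.01,                     # assumed carrier frequency before seeing the pedigree
    likelihood_if_carrier=0.30,     # chance of this family history if a carrier
    likelihood_if_noncarrier=0.01,  # chance of this family history if not a carrier
)
risk = blended_risk(p_carrier, risk_if_carrier=0.60, risk_if_noncarrier=0.10)
print(f"carrier probability {p_carrier:.1%}, blended risk {risk:.1%}")
```

In a full Mendelian model the prior comes from the allelic frequency and the likelihoods come from penetrance curves, which are the only places where observational data enter the calculation.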
The Tyrer–Cuzick model is an attempted synthesis of empiric and Mendelian approaches to risk modeling.20 It first uses family history information to calculate the probability that an individual carries a mutation in an autosomal dominant breast cancer susceptibility gene and then calculates the risk of breast cancer based on this. This risk estimate is then adjusted based on age at menarche, parity, age at first live birth, age at menopause, history of atypical hyperplasia or lobular carcinoma in situ (LCIS), height, and body mass index.