Plain Language Summary
Breast cancer affects millions of women worldwide. Screening mammography has the potential to detect breast cancer early, leading to more effective treatment, reduced chance of metastasis, and better survival and quality of life for the patient. Large trials performed in the 1970s and 1980s showed that breast cancer mortality was lower in women who were invited for mammography screening. This led to widespread implementation of breast cancer screening, predominantly in Western Europe and North America. However, the implementation of screening introduced harms in addition to benefits. The major harm of breast cancer screening is overdiagnosis, the detection of a breast cancer that would never have become symptomatic during a woman’s lifetime in the absence of screening. The most common harm is a false-positive test result, that is, a positive screening mammography result in a woman who does not have breast cancer. Over recent decades, a debate has been ongoing as to whether the benefits of breast cancer screening outweigh the harms. In an effort to guide decision-making, a number of organizations have summarized the available evidence in reviews, balance sheets, and screening recommendations, aiming to help women and their physicians make an informed choice about screening. Although all reviews are based on evidence from similar sets of trials and observational studies, some have concluded that screening should be stopped, while others recommend continuation of screening activities. In this chapter, we show that the differences between these reviews are at least partly related to choices about which studies to include, which screening strategies to consider, and how screening harms and benefits are defined.
One of the challenges is that estimates of breast cancer mortality reduction due to screening are generally based on “old trials” and do not consider more recent observational studies that have evaluated mortality reductions using data from modern screening programs. Harms, on the other hand, are almost exclusively assessed based on today’s screening practice. Another factor that complicates the comparability of the reviews is that the benefits and harms are not expressed in the same way, using different measures and populations. Finally, there are differences across countries in the organization of health care, cultural factors, and medicolegal considerations that shift the relative balance of harms and benefits. This is best illustrated in the large difference in the risk of a false-positive screening result, which is much higher in the United States compared to Europe. Future evaluations of the benefits and harms of breast cancer screening should base estimates of breast cancer mortality, overdiagnosis, and false-positives on the same screening setting, including time period, age, screening test, screening frequency, and organization of screening. If the aim is to compare the balance of harms and benefits between countries, it is important to ensure that the balances are indeed comparable. Finally, those creating or applying balance sheets should be aware that additional challenges lie ahead with the implementation of new screening tests and more personalized screening strategies. All of these are likely to affect the current balance of benefits and harms.
Introduction
The primary aim of breast cancer screening is to reduce mortality from the disease, but it is well-understood that screening does harm as well as good. Mammography is the preferred screening test for early detection of breast cancer and has been studied in more than 600,000 women in 11 randomized trials over the past 50 years. Over the past several decades, mammographic breast cancer screening has been the subject of controversy, with some questioning whether the benefit in terms of mortality reduction is large enough to justify the recognized harms of screening, in particular overdiagnosis. Others reviewing essentially the same accumulated evidence have concluded that the pros outweigh the cons.
In light of this debate, this chapter focuses on reviews that evaluate the balance of screening mammography benefits and harms that have been used to guide decision-making or provide recommendations for breast cancer screening. We have selected the reviews to represent evidence from different settings where screening practices vary, but acknowledge that many more reviews could potentially have been included. In this sense, the selected reviews serve as examples to illustrate how researchers and decision-makers reach conclusions on the balance of benefits and harms. This provides the background for a discussion of the methodologies used to determine this information and present results and conclusions.
The balance of the benefits and harms of a breast screening program can be communicated in a number of ways and to a variety of audiences. The format depends on the purpose of the balance and the intended audience. In this chapter, the focus is on benefit/harm balance sheets as presented in the scientific literature rather than communication to individual women or decision-makers. However, these scientific balance sheets should serve as the primary source of information for estimates communicated to women in the target population, health professionals and policy-makers. Outcome measures may be chosen according to the purpose of communication, but by necessity should be based on the same evidence. However, recommendations based on the same evidence may still differ, depending on the relative weight that is placed on different outcomes.
Benefits
Breast Cancer Mortality Reduction
Screening mammography aims to reduce breast cancer mortality through detection and treatment of tumors at an early stage, leading to better survival than symptomatically detected tumors. As such, there is agreement across all reviews presenting a harm/benefit balance sheet on relative or absolute reductions in breast cancer mortality as the main benefit of screening. Nevertheless, some authors have suggested adopting all-cancer or all-cause mortality as the main outcome measure in order to avoid overestimation of the benefit due to bias in cause-of-death classification. Several studies have explicitly assessed the quality of cause-of-death determination in relation to mammographic screening and have found no significant evidence of bias. Further, the randomized trials of screening mammography were not designed to estimate overall or all-cancer mortality and were thus not powered to adequately estimate these outcome measures. Absence of evidence for an effect on these outcomes therefore cannot be taken as evidence for absence of an effect. Estimates of screening benefit from randomized trials and observational studies are detailed further in “Estimates of Screening Benefit: The Randomized Trials of Breast Cancer Screening” and “The Importance of Observational Evidence to Estimate and Monitor Mortality Reduction From Current Breast Cancer Screening,” respectively.
Effects of screening on breast cancer mortality can be quantified as lives saved or life-years saved. Evidence reviews typically translate decreases in breast cancer mortality risk into absolute effects in terms of lives saved. However, since absolute risk is higher among older women, most of the benefit accrues at older ages in which the total number of life-years saved may be relatively small. Life-years saved attributable to breast cancer screening have been reported by some cost-effectiveness analyses, but are not generally included in evidence reviews.
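The translation from a relative to an absolute effect can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular review: the function names and the baseline risk of 4 breast cancer deaths per 1000 women are hypothetical, chosen only to make the arithmetic visible, and the calculation assumes that the relative risk applies uniformly to the baseline risk over the period considered.

```python
def absolute_risk_reduction(baseline_risk: float, relative_risk: float) -> float:
    """Absolute reduction in risk implied by a relative risk.

    baseline_risk: risk of breast cancer death without screening over the
                   period of interest (hypothetical input, between 0 and 1).
    relative_risk: RR of breast cancer death with versus without screening.
    """
    if not 0 <= baseline_risk <= 1:
        raise ValueError("baseline_risk must be between 0 and 1")
    if relative_risk < 0:
        raise ValueError("relative_risk must be non-negative")
    return baseline_risk * (1 - relative_risk)


def deaths_prevented(baseline_risk: float, relative_risk: float,
                     n_women: int = 1000) -> float:
    """Expected breast cancer deaths prevented among n_women invited."""
    return n_women * absolute_risk_reduction(baseline_risk, relative_risk)


def number_needed_to_invite(baseline_risk: float, relative_risk: float) -> float:
    """Women who must be invited to prevent one breast cancer death."""
    arr = absolute_risk_reduction(baseline_risk, relative_risk)
    if arr <= 0:
        raise ValueError("no absolute benefit, so the NNI is undefined")
    return 1 / arr


# Hypothetical inputs: baseline risk of 4 per 1000 and an RR of 0.80.
baseline = 0.004
rr = 0.80
print(f"Deaths prevented per 1000: {deaths_prevented(baseline, rr):.1f}")
print(f"Number needed to invite: {number_needed_to_invite(baseline, rr):.0f}")
```

With these hypothetical inputs, the same 20% relative reduction corresponds to about 0.8 deaths prevented per 1000 women invited, or roughly 1250 women invited per death prevented. Because the absolute figures scale directly with the baseline risk, reviews that assume different populations, age ranges, or time horizons can report quite different absolute effects from similar relative risks.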
Other Benefits
Early detection of breast cancer only confers benefit if it is followed by appropriate treatment, resulting in a more favorable outcome than would have been achieved had the treatment been given later in the course of disease or not at all. Early detection and treatment are also expected to improve quality of life for the women diagnosed, since less invasive procedures are more likely to be an option when the tumor is detected at a more favorable stage, for example, breast conserving surgery as opposed to mastectomy. Thus the majority of women participating in screening do not experience a benefit, and even those with a screen-detected cancer benefit only if earlier treatment leads to reduced morbidity and mortality. Although improved quality of life is expected to result from screening for some women, this benefit is not typically considered in balance sheets.
Since the ultimate impact of service screening on breast cancer mortality is inevitably long term, there are several indicators that can assess the performance of a screening program in its early phases and that can also be used to predict whether an effect on breast cancer mortality is likely. These early benefits are mostly related to the stage shift that is introduced with screening and is expected to result in a higher rate of small cancers and a lower rate of advanced cancers. Although these measures are intuitive, there are many methodological challenges associated with their definition and interpretation. Moreover, assumptions need to be made in order to estimate their consequent effect on breast cancer mortality. The latter may explain why these outcomes are not usually considered in the evidence reviews.
Harms
Overdiagnosis and Overtreatment
Overdiagnosis, and resulting overtreatment, is regarded as the major harm of breast cancer screening. In cancer screening, overdiagnosis is defined as the detection of cancers that would not present symptomatically during one’s lifetime in the absence of screening. Although there is ongoing research focusing on identification of overdiagnosed tumors, at present the extent of overdiagnosis can only be estimated on the population level by comparing breast cancer incidence in the presence and absence of screening or by using simulation models.
Overdiagnosis is harmful in two major ways. The first harm is simply due to the unnecessary detection of a breast cancer. This diagnosis transforms women into cancer patients, a transformation that would not have taken place in the absence of screening. The second and major harm of overdiagnosis is overtreatment. Although some tumors will not become life-threatening during the life span of the woman, it is currently not possible to distinguish dangerous from non-life-threatening cancers. As a consequence, women with an overdiagnosed cancer receive unnecessary treatment, referred to as overtreatment. To prevent unnecessary treatment, a few trials have begun to compare usual care with active surveillance, a “wait-and-see” approach, for cancers that have a high risk of being overdiagnosed, that is, low-grade ductal carcinoma in situ (DCIS). Under a “wait-and-see” approach, cancers are treated only if and when they progress.
The extent of overdiagnosis in breast cancer screening remains highly uncertain, with estimates ranging from 0% to 54%. A major reason for this disagreement is the difficulty in estimating overdiagnosis. Overdiagnosis in cancer screening can be estimated with a variety of study designs, including randomized trials, pathological or imaging studies, modeling studies, and observational studies. In breast cancer screening, the most common designs are randomized trials, observational studies, and modeling studies. However, each design has limitations. In randomized trials and observational studies, overdiagnosis is estimated by comparing breast cancer incidence in the presence and absence of screening. Breast cancer screening works by identifying cancers at an earlier, more treatable stage. As a result, after the initiation of screening there will be a transient increase in breast cancer incidence. In the absence of overdiagnosis, this increased incidence will be compensated for by a subsequent decrease in cancer diagnoses. To determine whether the cancers diagnosed during screening were attributable to early detection or overdiagnosis, women should be followed after leaving screening to account for this effect. Ideally, this follow-up after leaving screening should be until death. In randomized trials, there may be incomplete adjustment for early detection, although women were followed for 15 years after leaving screening in the Canadian trials and in the Malmö trial. Incomplete adjustment leads to overestimation of the extent of overdiagnosis, because cancers that were merely detected early are counted as excess cases. In observational studies, longer follow-up periods are possible, but there is no comparable population that is not offered screening. As a result, breast cancer incidence in the absence of screening is estimated using extrapolation of prescreening trends, control regions, nonattenders, or adjustment for the effect of screening.
Because unscreened populations may differ from screened populations in characteristics that are also related to breast cancer incidence, all observational studies of overdiagnosis have the potential for bias. The primary limitation of using modeling studies to estimate overdiagnosis is the heavy dependence of overdiagnosis estimates from these studies on modeling assumptions such as lead time.
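The excess-incidence comparison described above can be sketched as follows. This is an illustration under stated assumptions rather than the calculation of any specific trial or review: all counts are hypothetical, the two groups are assumed to be of equal size, and follow-up is assumed to be long enough for the compensatory drop in diagnoses to have occurred. The sketch also shows why the same excess can be reported as quite different percentages depending on the denominator chosen.

```python
def overdiagnosis_estimates(cases_screened_group: int,
                            cases_control_group: int,
                            cases_screened_group_during_screening: int) -> dict:
    """Express excess incidence as a proportion of several denominators.

    cases_screened_group: cancers in the screened group over the whole
                          follow-up (screening period plus post-screening).
    cases_control_group: cancers in the unscreened comparison group over
                         the same follow-up.
    cases_screened_group_during_screening: cancers in the screened group
                         diagnosed during the screening period only.
    All inputs are hypothetical counts for equally sized groups.
    """
    if min(cases_screened_group, cases_control_group,
           cases_screened_group_during_screening) <= 0:
        raise ValueError("all case counts must be positive")
    excess = cases_screened_group - cases_control_group
    return {
        "excess_cases": excess,
        "share_of_control_cases": excess / cases_control_group,
        "share_of_screened_cases": excess / cases_screened_group,
        "share_of_cases_during_screening":
            excess / cases_screened_group_during_screening,
    }


# Hypothetical counts: 1100 cancers with screening, 1000 without,
# and 550 of the 1100 diagnosed while screening was being offered.
estimates = overdiagnosis_estimates(1100, 1000, 550)
for name, value in estimates.items():
    if name == "excess_cases":
        print(f"{name}: {value}")
    else:
        print(f"{name}: {value:.1%}")
```

In this hypothetical example the same 100 excess cancers correspond to 10.0% of cancers in the unscreened group, 9.1% of all cancers in the screened group, and 18.2% of cancers diagnosed during the screening period. Differences of this kind in the choice of denominator are one reason why published overdiagnosis percentages are difficult to compare directly.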
False-Positives
False-positive test results are the most common harm of screening mammography. Conceptually, a false-positive is defined as a positive screening mammography result in a woman who is cancer free. In the United States and other settings where screening mammography interpretation is performed according to the American College of Radiology Breast Imaging Reporting and Data System (BI-RADS) Atlas, a positive examination has been operationalized as a screening mammography initial assessment of 0: Needs Additional Evaluation, 3: Probably Benign, 4: Suspicious, or 5: Highly Suggestive of Malignancy. In organized screening programs such as those in Europe and Australia, a positive screening result is defined by recall for further evaluation. Typically, recalled women who are determined to be cancer free at the end of diagnostic evaluation and for 1 year after the recall are considered to have experienced a false-positive. False-positive results can be subdivided into those followed by further evaluation with imaging only and those followed by invasive procedures, including biopsy or fine needle aspiration.
False-positive test results have a number of negative consequences. The greater the false-positive risk, the lower the efficiency of the screening program and the more unnecessary imaging is performed. This adds to the overall resource use and cost of screening. False-positives also have negative psychological consequences for the affected women. Studies have found that women receiving false-positive test results experience increased anxiety and psychological distress. This anxiety and distress is greater in women who undergo invasive procedures rather than additional imaging only. However, a recent study found that the associated anxiety resolved quickly once the women were determined to be cancer free. The experience of receiving false-positive test results can also be a deterrent to participation in future screening. This has been found to be the case in several of the organized screening programs in Europe and Canada. Conversely, in the United States women receiving false-positive test results have been found to be more likely to return for future screening. Finally, false-positive results lead to additional radiation exposure through subsequent mammography and mammography-guided biopsy. Although radiation exposure due to mammography is small, the aggregate burden among women experiencing repeated false-positives could become large and should be minimized to avoid the increased risk of radiation-induced cancer.
False-positive screening mammography results are common, occurring in approximately 10% of exams in the United States and 1–7% of mammograms in European service screening programs. False-positives are generally more common in younger women and those with dense breasts. Because false-positive mammography results are common, the proportion of women participating in regular screening who receive a false-positive result over the course of their screening participation is large. Most evaluations of the balance of harms and benefits have quantified this harm in terms of the cumulative false-positive risk of screening mammography, defined as the probability that a woman will receive at least one false-positive mammography result over the course of a fixed number of screening mammograms, typically either 10 or the total number recommended by the screening program.
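The definition of cumulative false-positive risk can be made concrete with a short calculation. The sketch below is a deliberate simplification: it assumes that each examination carries the same false-positive risk and that results at successive examinations are independent, neither of which holds exactly in practice, since risk varies with age, breast density, and screening history. The function name is hypothetical, and the 10% per-examination risk is the approximate US figure quoted above.

```python
def cumulative_false_positive_risk(per_exam_risk: float, n_exams: int) -> float:
    """Probability of at least one false-positive in n_exams screens.

    Assumes a constant per-exam risk and independent exams, which is a
    simplification; published estimates are based on observed screening
    histories rather than on this formula.
    """
    if not 0 <= per_exam_risk <= 1:
        raise ValueError("per_exam_risk must be between 0 and 1")
    if n_exams < 0:
        raise ValueError("n_exams must be non-negative")
    # P(at least one) = 1 - P(none) = 1 - (1 - p)^n
    return 1 - (1 - per_exam_risk) ** n_exams


# Ten years of screening at a 10% per-exam false-positive risk:
# 10 exams if screened annually, 5 exams if screened biennially.
for label, n_exams in (("annual", 10), ("biennial", 5)):
    risk = cumulative_false_positive_risk(0.10, n_exams)
    print(f"{label}: {risk:.1%}")
```

Under these assumptions, the 10-year cumulative risk is about 65% with annual and about 41% with biennial screening. The calculation shows how a modest per-examination risk accumulates over repeated screening rounds, and why both the per-examination risk and the screening interval drive the large differences in cumulative risk between settings.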
Other Harms
There are a variety of screening mammography harms that are not typically explicitly included when evaluating the balance of benefits and harms. When considering the harms of screening, it is important to consider not only direct harms but harms that result indirectly from downstream effects of the screening process, such as unnecessary treatment resulting from incidental findings. However, evidence reviews have not typically included indirect harms.
Additional direct harms not typically considered by evidence reviews can be divided into harms that are serious but very rare and those that are common but have a generally minor impact. A false-negative screening mammography result is one harm that is very serious but relatively rare. A false-negative occurs when a woman is diagnosed with cancer following a negative screening mammography assessment (in the BI-RADS lexicon, an assessment of 1: Negative or 2: Benign). A negative mammography result could give a woman false reassurance that she is cancer free and may lead to delays in seeking care for new symptoms. Some evaluations of screening mammography include measures of the diagnostic accuracy of screening mammography interpretation, which typically provide an assessment of false-negatives at a single screening round. However, these are not commonly included in evaluations of the balance of harms and benefits. Radiation-induced cancer is another harm that is very serious but considered extremely rare. Radiation-induced cancer may be more of a concern in women with very large breasts or breast augmentation who require extra views at each exam for full coverage of breast tissue. Additional minor harms of screening mammography include the pain of the examination itself. A systematic review found that 28–77% of women report experiencing pain associated with mammography. Pain due to mammography has been found to be associated with discontinuation of screening.
Reviewing the Balance of Harms and Benefits
We have selected a number of influential reviews from North America and Europe to serve as examples for the comparison of reviews of benefits and harms of breast cancer screening. In this section, we will describe the context of these reviews, the general approach adopted by the authors, the sources of data used and the main outcomes of the reviews, both in relative and absolute measures. Tables 3.1–3.3 summarize the data from these reviews for breast cancer mortality, overdiagnosis, and the cumulative risk of false-positives.
Review | Study Designs Selected; Included Studies | Intervention (Study Period; Age Groups; Screening Test; Screening Interval) | Relative Effect | Absolute Effect |
---|---|---|---|---|
Cochrane | Randomized trials | | Four trials with adequate randomization did not show a statistically significant reduction in breast cancer mortality at 13 years (relative risk (RR) 0.90, 95% confidence interval (CI) 0.79–1.02); five trials with suboptimal randomization showed a significant reduction in breast cancer mortality with an RR of 0.75 (95% CI 0.67–0.83). The RR for all nine trials combined was 0.81 (95% CI 0.74–0.87) after 13 years. The Malmö II trial was excluded from the 13-year follow-up analysis | Assuming a 15% mortality reduction: For every 2000 women invited for screening throughout 10 years, one will avoid dying of breast cancer |
Independent UK Panel | Randomized trials. 10/11 eligible trials included. Edinburgh trial excluded | Trials conducted between 1963 and 1997; 648,931 women in the age range 38–74 years; mammography with/without physical examination and/or self-examination; interval 12–33 months | Metaanalysis of these trials with 13 years of follow-up estimated a 20% reduction in breast cancer mortality in women invited for screening (RR = 0.80, 95% CI 0.73–0.89) | |
USPSTF | Randomized trials. Selection depends on age: | | In metaanalysis, the RR for breast cancer mortality was 0.88 (95% CI 0.73–1.003) for women age 39–49 based on nine trials, 0.86 (95% CI 0.68–0.97) for women 50–59 based on seven trials, 0.67 (95% CI 0.54–0.83) for women 60–69 based on five trials, and 0.80 (95% CI 0.51–1.28) for women 70–74 based on three trials | |
American Cancer Society | Metaanalysis of nine randomized trials (13-year follow-up). Metaanalysis of 14 observational studies: seven incidence-based mortality studies and seven case–control studies. One modeling study providing seven estimates | | | |
Canadian Taskforce | Randomized trials. 10/11 eligible trials included. Selection depends on age: | | Pooled RR estimates for breast cancer mortality were 0.85 (95% CI 0.75–0.96) from eight trials in women 40–49, 0.79 (95% CI 0.68–0.90) from seven trials in women 50–69, and 0.68 (95% CI 0.45–1.01) from two trials in women 70–74. Results are reported after a median follow-up of 11.4 years | |
EUROSCREEN | Systematic search of PubMed up to February 2011; European observational studies, i.e., trend studies (n = 17), incidence-based mortality (IBM) studies (n = 20), and case–control studies (n = 8) | Studies reporting on programs implemented between 1970 and 2007; including at least some of the age groups between 50 and 69; mammography; interval 2–3 years; population-based screening programs (study has at least three years of overlap with the current regional or national program) | Pooled RR estimates for breast cancer mortality among invited women were 0.75 (95% CI 0.69–0.81) in incidence-based mortality studies and 0.69 (95% CI 0.57–0.83) in case–control studies. Estimates for women actually screened, corrected for self-selection, were 0.62 (95% CI 0.56–0.69) in incidence-based mortality studies and 0.52 (95% CI 0.42–0.65) in case–control studies | Assuming a 25–38% mortality reduction: For every 1000 women screened from age 50 to 69, seven to nine breast cancer deaths are prevented |
Review | Study Designs Selected; Included Studies | Intervention (Study Period; Age Groups; Screening Test; Screening Interval; Screening Organization) | Relative Effect | Absolute Effect |
---|---|---|---|---|
Cochrane | Randomized trials that did not invite the control group at the end of the screening phase (3/11: Malmö I, Canada I and II) and recent observational studies mentioned in discussion | Trials started screening between 1976 and 1980; 132,214 women in age range 40–69; mammography with/without physical examination and/or self-examination; interval 12–24 months | There were 30% more cancers in the screened groups than in the control groups. Large observational studies support these findings | Assuming 30% overdiagnosis: For every 2000 women invited for screening throughout 10 years, 10 healthy women who would not have had a breast cancer diagnosis if there had not been screening will be diagnosed as cancer patients, and will be treated unnecessarily |
Independent UK Panel | Randomized trials that did not invite the control group at the end of the screening phase (3/11: Malmö I, Canada I and II) | Trials started screening between 1976 and 1980; 132,214 women in age range 40–69; mammography with/without physical examination and/or self-examination; interval 12–24 months | The frequency of overdiagnosis was of the order of 11% from a population perspective, and about 19% from the perspective of a woman invited to screening | Assuming 19% overdiagnosis from the perspective of an individual woman: For every 10,000 UK women invited to screening from age 50 for 20 years, 129 cancers will be overdiagnosed |
USPSTF | A review of eight trials, a metaanalysis of three trials, a systematic review of 13 individual studies, and 25 primary studies estimating overdiagnosis | Details on intervention factors can be found in the metaanalysis, systematic review of randomized trials and observational studies and the 25 primary studies | The relative overdiagnosis estimate was based on the metaanalysis of three trials. The rate of overdiagnosis was estimated at 19% | No absolute estimate provided |
American Cancer Society | Review of observational studies, modeling studies, and trials that did not invite the control group at the end of screening | Details on intervention factors are not reported in the review | The review notes that overdiagnosis estimates range from <5% to >50% with estimates based on modeling studies generally lower than those based on empirical studies | No absolute numbers provided. They conclude that there is good evidence that overdiagnosis does occur but no high-quality evidence on the magnitude of overdiagnosis |
Canadian Taskforce | The USPSTF review, a systematic review and four primary studies estimating overdiagnosis | Details on intervention factors can be found in the included systematic review and the four primary studies | The frequency of overdiagnosis ranges from 0.4% to 52% in the included studies. In the main report of the review, the frequency of overdiagnosis ranges from 30% to 52% | For every 1000 women aged 39 years and older who are screened using mammography, five will have an unnecessary lumpectomy or mastectomy as a result of overdiagnosis |
EUROSCREEN | Literature review of observational studies that provided estimates of breast cancer overdiagnosis in European population-based mammographic screening programs | Studies reporting on programs implemented between 1970 and 2007; there were 13 primary studies reporting 16 estimates of overdiagnosis in seven European countries (the Netherlands, Italy, Norway, Sweden, Denmark, United Kingdom, and Spain) | Unadjusted estimates ranged from 0% to 54%. Reported estimates adjusted for breast cancer risk and lead time were 2.8% in the Netherlands, 4.6% and 1.0% in Italy, 7.0% in Denmark and 10% and 3.3% in England and Wales. The average estimate of the individual estimates was 6.5% of the incidence in the absence of screening | Assuming 6.5% overdiagnosis: For every 1000 women screened biennially from ages 50 to 51 years until ages 68–69 years and followed up until age 79 years, four cases are overdiagnosed |
Review | Study Designs Selected; Included Studies | Intervention (Study Period; Age Groups; Screening Test; Screening Interval; Screening Organization) | Relative Effect | Absolute Effect |
---|---|---|---|---|
Cochrane | Observational studies mentioned in the discussion | Details on the intervention are not reported in the review and can be found in the included studies | The cumulative risk of a false-positive result after 10 mammograms ranges from about 20% to 60% | For every 2000 women invited for screening throughout 10 years, it is likely that more than 200 women will experience important psychological distress for many months because of false-positive findings |
Independent UK Panel | No quantitative assessment of false-positive risk | |||
USPSTF | Observational studies from the United States and unpublished data from the BCSC | | The observational studies reported a 10-year cumulative risk for false-positive mammography results of 61% for annual and 41% for biennial screening | The BCSC provided the absolute number of false-positives per 1000 women screened per age category: |
American Cancer Society | Observational studies from the United States | Details on intervention factors are not reported in the review and can be found in the included studies | The observational studies reported a 10-year cumulative risk for false-positive mammography results of 61% for annual and 41% for biennial screening and for false-positive results leading to a biopsy recommendation of 7% for annual and 5% for biennial screening | No absolute numbers provided |
Canadian Taskforce | The USPSTF review and one additional primary study | Details on intervention factors are not reported in the review and can be found in the included studies | Data from the BCSC, as reported in the USPSTF review, gave a cumulative false-positive risk of 49%–77% after 10 screening rounds. The observational studies reported 49.1% and 20.8% | The absolute number of false-positive results per 1000 women screened for a median of 11 years was reported per age group: |
EUROSCREEN | Systematic review of studies of the cumulative risk of a false-positive result in European screening programs. Four studies were included | Studies published between 1955 and 2001 were incorporated; 390,000 women starting at ages 50–51 and continuing to ages 68–69; mammography; interval of 2 years; population-based screening program in a European country | Pooled estimates were derived from studies that estimated the risk over 10 years (364,991 women). The estimated cumulative risk of a false-positive screening result in women aged 50–69 undergoing 10 biennial screening tests varied from 8% to 21% in the three studies examined (pooled weighted estimate 19.7%). The cumulative risk of an invasive procedure with benign outcome ranged from 1.8% to 6.3% (pooled weighted estimate 2.9%) | Assuming a 20% false-positive recall and 3% false-positive recall with invasive work-up: For every 1000 women screened biennially from ages 50–51 years until ages 68–69 years and followed up until age 79 years, 170 women have at least one recall followed by noninvasive assessment with a negative result, and 30 women have at least one recall followed by invasive procedures yielding a negative result |
North America
In North America, the landscape of guidelines for screening mammography is characterized by recommendations from numerous groups. A variety of independent panels, professional societies, and advocacy groups issue screening recommendations. Several of these conduct an evaluation of the balance of harms and benefits in order to support their recommendations, while others rely on existing evaluations. Canadian provinces offer a defined screening mammography program to women aged 50–69, but practices for younger and older women vary across provinces. No defined screening program exists in the United States, where it is left to individual providers and patients to make decisions that are informed by recommendations. Guidelines issued by the US Preventive Services Task Force (USPSTF) are particularly influential because they inform Medicare coverage decisions and many private insurers follow the same coverage practices as the Centers for Medicare & Medicaid Services. Below we summarize the evidence review conducted by the USPSTF as well as that of the Canadian Task Force on Preventive Health Care (Canadian Task Force), a similar independent panel for Canada. Finally, we discuss the American Cancer Society (ACS) guideline statement as an example of an influential North American organization issuing recommendations related to screening mammography.
US Preventive Services Task Force
The USPSTF is an independent panel authorized by the US Congress and supported by the Agency for Healthcare Research and Quality (AHRQ) to make evidence-based recommendations about clinical preventive services. The USPSTF commissioned a review of the evidence on benefits and harms of screening mammography in preparation for an update to their recommendations on screening mammography issued in 2015. The review included evidence on harms and benefits of mammography for all women age 40 years and older. Evidence on harms and benefits of other screening modalities including breast MRI and digital breast tomosynthesis was also reviewed. The USPSTF in conjunction with AHRQ developed the key questions used to structure the review. The review itself was then conducted by an independent contractor sponsored by AHRQ, the Pacific Northwest Evidence-based Practice Center.
The primary benefit of screening mammography as defined by the draft USPSTF evidence review was reduction in breast cancer mortality. Other benefits considered included reductions in all-cause mortality, advanced breast cancer cases, and treatment-related morbidity. Harms were radiation exposure, pain during procedures, patient anxiety and other psychological responses, false-positive and false-negative test results, overdiagnosis, and overtreatment. Evidence on breast cancer mortality was obtained from randomized controlled trials (RCTs) of screening mammography in women age 40 and over. The evidence review identified eight eligible studies by searching the Cochrane Central Register of Controlled Trials, Cochrane Database of Systematic Reviews, and MEDLINE, as well as by manually reviewing references. Observational studies and systematic reviews were also included, although quantitative evaluation of breast cancer benefits was based on RCTs. A variety of evidence sources were used for evaluating harms. Systematic reviews and metaanalyses were included as well as recently published primary studies. Primary analysis of observational data on screening mammography from the Breast Cancer Surveillance Consortium (BCSC) was conducted to provide information on performance characteristics of screening mammography. Simulation modeling from the Cancer Intervention and Surveillance Modeling Network (CISNET) as well as a new simulation model for radiation exposure were also incorporated.
A metaanalysis of the 8 trials included in the draft USPSTF evidence review estimated a relative risk (RR) of breast cancer mortality of 0.88 (95% CI: 0.73–1.00) for women age 39–49 years. Similar estimates were obtained for women 50–59 and 60–69. For women over age 70, three trials met inclusion criteria, but results of the metaanalysis in this age group had broad confidence intervals indicating substantial uncertainty in the benefit (RR = 0.80, 95% CI: 0.51–1.28). The evidence review summarized the absolute benefit corresponding to these RRs in terms of breast cancer deaths prevented by screening for 10 years per 10,000 women screened. The number of breast cancer deaths prevented was estimated at 4.1 (95% CI: –0.1 to 9.3) for women aged 39–49, 7.7 (95% CI: 1.6 to 17.2) for women aged 50–59, 21.3 (95% CI: 10.7 to 31.7) for women aged 60–69, and 12.5 (95% CI: –17.2 to 32.1) for women aged 70–74. The number needed to invite (NNI) is the number of women who must be invited to participate in screening mammography for 10 years in order to prevent one breast cancer death. For women age 40–49 and 50–59, the NNI was estimated at approximately 2000 women. The draft USPSTF evidence review also included a summary of breast cancer mortality reduction based on observational studies using the results of the EUROSCREEN review (see section: Europe). However, estimates based on observational studies were not incorporated into numerical summaries of breast cancer mortality reduction due to the risk of bias inherent in observational studies.
The evidence review for overdiagnosis found that estimates varied substantially across studies and methodologies. Included studies consisted of a metaanalysis of five trials, a systematic review of observational studies, and 17 individual studies. A metaanalysis of the three trials considered to be least biased estimated overdiagnosis to be 19% (95% CI: 15–23). In observational and modeling studies using varied methodologies, overdiagnosis estimates ranged from <1% to 54%. The risk of false-positive mammography at a single screening round was estimated using data from the BCSC. Estimates ranged from 65 to 121 per 1000 examinations across age groups. Two observational studies of cumulative false-positive risk from the United States were identified that met inclusion criteria. Overall, cumulative false-positive risk after 10 years of annual mammography was estimated at 61%. Risk was higher among women 40–49 with heterogeneously dense breasts (69%) or extremely dense breasts (66%).
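To see how a per-round false-positive risk relates to a cumulative risk over repeated screens, the following minimal sketch may help. It assumes that every screening round is independent and carries the same risk, which is a simplification and not the method used in the observational studies cited by the review; the function name is hypothetical.

```python
def cumulative_false_positive_risk(per_round_risk: float, rounds: int) -> float:
    """Probability of at least one false-positive result in `rounds` screens.

    Assumption (not from the review): rounds are independent and share
    the same per-round risk.
    """
    if not 0.0 <= per_round_risk <= 1.0:
        raise ValueError("per_round_risk must be between 0 and 1")
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    # Complement of having no false-positive in any round.
    return 1.0 - (1.0 - per_round_risk) ** rounds


# Single-round range quoted from the BCSC data: 65 to 121 per 1000 examinations.
for per_1000 in (65, 121):
    risk = cumulative_false_positive_risk(per_1000 / 1000, rounds=10)
    print(f"{per_1000} per 1000 per round -> {risk:.0%} after 10 annual rounds")
```

Under this independence assumption, the single-round range corresponds to a 10-year cumulative risk of roughly 49% to 72%, which brackets the 61% estimated from longitudinal data.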
Overall, the evidence review found a significant benefit of screening mammography coupled with relatively frequent harms, with more modest benefits and more common harms in women under 50 years of age. On the basis of this review, the USPSTF issued new draft guidelines in 2015 supporting biennial mammography for women age 50–74 years. Routine screening for women younger than 50 was not recommended.
Canadian Task Force on Preventive Health Care
The Canadian Task Force is an independent panel that makes recommendations on preventive clinical services, similar to the USPSTF in the United States. They commissioned the Evidence Review and Synthesis Centre to undertake a review of screening mammography to support updated recommendations in 2011. Because the USPSTF had conducted an evidence review using similar methodology in support of their 2009 breast cancer screening guidelines, the Canadian Task Force used the USPSTF evidence review and updated this review with studies published in the intervening period. The USPSTF review was used for evidence up to 2008 and updated with additional data through October 2011. The Canadian Task Force review identified breast cancer mortality as the benefit of screening mammography and expressed their results in terms of the number needed to screen (NNS) defined as the number of women who would need to be screened once every 2 years over about 11 years to prevent one breast cancer death. Results from nine trials were included in estimates of the benefit of screening mammography. The estimated NNS for women 40–49 was 2108, while for women 50–69 it was only 721. On the basis of a review of four primary studies and one prior systematic review, the Canadian Task Force estimated the overdiagnosis rate at 5 per 1000 women screened. Unlike the USPSTF review, evidence reviewed on false-positive risk did not incorporate primary data and was expressed only in terms of the number of false-positives associated with one breast cancer death prevented, not as a cumulative false-positive risk. The false-positive risk was found to be highest in the youngest age group at 690 false-positives per breast cancer death averted compared to only 204 in the 50–69 year age group.
The conclusions of the Canadian Task Force echo those of the USPSTF. The benefit of screening mammography was found to be smaller in younger women and harms were found to be more common. Screening every 2–3 years for women aged 50–74 was recommended. Routine screening was not recommended for women under 50 years of age.
American Cancer Society
In 2015, the ACS updated their breast cancer screening guidelines based on an independent systematic review conducted by the Duke University Evidence Synthesis Group, new analyses of observational data conducted by the BCSC, and data on burden of disease provided by the ACS Surveillance and Health Services Research Program. The benefit of screening mammography for the ACS review was defined as reduction in breast cancer mortality and increase in life expectancy and quality-adjusted life expectancy. Evidence on breast cancer mortality from RCTs, recent observational studies including at least 1000 women, and simulation modeling studies was included. New metaanalyses were not conducted, but pooled estimates of the RR of breast cancer mortality from metaanalyses conducted by the Canadian Task Force, UK Independent Panel, and Cochrane review were considered (see section: Europe). Absolute mortality benefit was expressed as the NNS biennially for 15 years to prevent one breast cancer death. Assuming a 20 or 40% mortality reduction, the NNS was 1770 or 753 for women 40–49, 1087 or 462 for women 50–59, and 835 or 355 for women 60–69. The ACS guideline group concluded that there was evidence that screening mammography leads to increases in life expectancy and quality-adjusted life expectancy but concluded that uncertainty about key parameters related to life expectancy precluded quantification of this benefit.
Harms included in the ACS evidence review were false-positive results, overdiagnosis, and overtreatment. Evidence on false-positive results was derived from two observational studies using data from the BCSC. Both observational and modeling studies of overdiagnosis were reviewed, but the ACS concluded that all existing studies relied upon unverifiable assumptions or were subject to potential biases. They therefore concluded that the risk of overdiagnosis could not be quantified using the available evidence.
On the basis of this evidence, the ACS recommended annual mammography for women aged 45–54 years and biennial mammography for women 55 years and older, as long as they are healthy and have a life expectancy of at least 10 years. Because the absolute risk of breast cancer is similar among women 45–49 and those aged 50–54, the ACS concluded that benefits outweigh harms beginning at age 45 years. Among women under 45, the harms were judged to likely outweigh the benefits.
Europe
Early evidence on the efficacy of mammography screening became available with the first publications of the randomized controlled trials in the early 1980s. Thus, when the Europe Against Cancer (EAC) program was initiated in 1986 in an effort to control cancer in Europe, its Committee of Cancer Experts decided that secondary prevention should be included. Breast, cervical, and colorectal cancers were considered. For breast cancer, a European Network was created in 1989, cofunded by the European Community, to implement mammography screening pilot programs in a number of member states. The target population in the pilots was women aged 50–64, but variation was allowed in the starting and stopping ages. It was expected that these pilots would provide a practical basis for the implementation of nationwide breast cancer screening programs. At the time of implementation of the pilots, population-based screening programs were already established in a limited number of countries, including the United Kingdom, the Netherlands, Finland, and Sweden.
The positive results of the EAC program prompted the European Council to publish a recommendation on cancer screening in 2003. In this recommendation, member states were invited to take common action to implement national cancer screening programs with a population-based approach, according to European quality assurance guidelines where they existed. Following this recommendation, numerous additional programs were established in Western Europe in the 1990s and, more recently, also in Eastern Europe. The first report on cancer screening in the European Union noted that in 2007 breast cancer screening programs were running or being established in at least 26 of the 27 member states. The majority of the programs have a 2-year screening interval and invite at least women in the age group 50–69, as specified in the Council Recommendation. However, despite this broad consensus, the way screening programs are implemented still varies across the EU.
The debate on breast cancer screening in Europe, and elsewhere, was not new but reignited with the publication of the first Cochrane Review in 2001. Below we summarize the outcomes of the Cochrane Review, based on its last update in 2013. We further summarize the reviews performed by the UK Independent Panel and the EUROSCREEN group that were initiated in response to the continuing debate and were published in 2012.
Nordic Cochrane review
The Cochrane Collaboration is an international, independent, not-for-profit organization, funded by a variety of sources including governments, universities, hospital trusts, charities, and personal donations. The first Cochrane review on screening for breast cancer with mammography was performed following a request of the Danish National Board of Health in 1999 and published in 2001. Updates were published in 2006, 2009, 2011, and 2013. The last review in 2013 was declared stable as no new randomized controlled trials on mammography screening had been identified. No further updates are expected.
The authors of the Cochrane review aimed to study the effect of screening for breast cancer with mammography on mortality and morbidity. A broad search strategy was used in PubMed to identify both randomized trials and observational studies, as the latter were considered to provide important new knowledge, for example, in relation to evidence on overdiagnosis and other harms of screening. References were manually searched, and letters, abstracts, gray literature, and unpublished data were retrieved where possible. The outcome measures of the review included mortality from breast cancer, mortality from cancer, all-cause mortality, use of surgical interventions, use of adjuvant therapy, and harms of mammography.
The Cochrane review identified eight eligible randomized clinical trials, but excluded one because it was not adequately randomized. Intention-to-treat analyses were performed for breast cancer mortality, even though the authors judged breast cancer mortality to be an unreliable outcome, biased in favor of screening, and recommended overall mortality as the primary outcome measure. Trials using less reliable randomization methods were evaluated separately. The pooled RR for the three trials with adequate randomization was 0.90 (95% CI: 0.79–1.02) and did not show a significant effect of screening on breast cancer mortality. The four trials classified as having suboptimal randomization had a pooled RR of 0.75 (95% CI: 0.67–0.83), indicating a 25% reduction in breast cancer mortality. The overall estimate for all seven trials combined was 0.81 (95% CI: 0.74–0.87), consistent with a 19% reduction after 13 years. The authors further assessed the impact of screening on all-cause mortality as well as deaths ascribed to any cancer and found no reductions. To express the results in absolute numbers, the authors assumed a 15% reduction in breast cancer mortality after 13 years of follow-up, which means that for every 2000 women invited for screening throughout 10 years, one will avoid dying of breast cancer. Data from the randomized trials were further assessed for surgical interventions and radiotherapy, with significantly more operations performed in the study groups and more women receiving radiotherapy. Little information was found on other adjuvant therapy, and no comparative data were reported on psychological morbidity.
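The relation between the pooled relative risks, the relative reductions, and the absolute figure quoted above can be sketched as follows. This is an illustrative back-calculation only: the function names are hypothetical, and the implied baseline risk is derived here from the review's stated assumptions (a 15% reduction and one death avoided per 2000 women invited); it is not a number reported by the review itself.

```python
def relative_reduction(rr: float) -> float:
    """Relative reduction in breast cancer mortality implied by a relative risk."""
    return 1.0 - rr


def implied_baseline_risk(number_needed_to_invite: float, reduction: float) -> float:
    """Baseline risk of breast cancer death that would make the two inputs
    consistent, using NNI = 1 / (baseline risk * relative reduction)."""
    return 1.0 / (number_needed_to_invite * reduction)


# Pooled relative risks quoted from the Cochrane review.
pooled_rr = {
    "adequate randomization (3 trials)": 0.90,
    "suboptimal randomization (4 trials)": 0.75,
    "all seven trials": 0.81,
}
for label, rr in pooled_rr.items():
    print(f"{label}: RR {rr:.2f} -> {relative_reduction(rr):.0%} relative reduction")

# Assumptions stated by the review authors: 15% reduction, 1 in 2000 invited.
baseline = implied_baseline_risk(number_needed_to_invite=2000, reduction=0.15)
print(f"Implied baseline risk: {baseline * 1000:.1f} deaths per 1000 women")
```

Read this way, the 1-in-2000 figure corresponds to a baseline risk of about 3.3 breast cancer deaths per 1000 women over the period considered, so the absolute benefit depends as much on the assumed baseline risk as on the relative reduction.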
In contrast to outcomes relating to benefit, the harms of screening were summarized only in the discussion section. The review does not state explicitly how the randomized trials and/or observational studies referred to in this section were selected. The outcomes discussed include overdiagnosis and overtreatment, in particular the increase in mastectomies, false-positives, psychological distress and pain. Based on the references selected, the authors conclude that the level of overdiagnosis and overtreatment was about 30% in the trials that did not introduce early screening in the control group and somewhat larger in the trials with suboptimal randomization. These findings were supported in a number of observational studies with overdiagnosis estimates ranging from 18% to 60%. In terms of absolute numbers, the authors estimated that for every 2000 women invited for screening throughout 10 years, 10 women would be overdiagnosed and overtreated, under the assumption that the level of overdiagnosis is 30%. For false-positive results, the cumulative risk after 10 mammograms was found to range from about 20% to 60%, with the highest risk reported for the United States. No mention is made of the estimate used to calculate the absolute numbers, but the authors stated that it is likely that more than 200 women will experience false-positive findings and psychological distress for every 2000 women invited for screening throughout 10 years.
The authors of the Cochrane review concluded that the evidence reported in the review cast doubts on the effectiveness of breast cancer screening and that “the time has come to re-assess whether universal mammography screening should be recommended for any age group.” In their view, women should be made more aware of the harms associated with attending screening, since the benefits are small at best.
United Kingdom
The Independent UK Panel on Breast Cancer Screening (referred to below as the independent UK panel) issued a report in 2012 that was jointly commissioned by Cancer Research UK and the Department of Health (England). The group was convened to review the evidence for the benefits and harms of breast screening in the United Kingdom. Members of the independent UK panel had relevant expertise but had not previously published on breast screening in order to ensure an objective and independent assessment of the evidence. A patient advocate was an integral member of the independent UK panel. The evidence considered included randomized controlled trials of breast screening as well as observational studies, including prospective follow-up and case–control studies. The independent UK panel reviewed the literature, but did not perform a formal systematic review, and heard testimony from experts in the field, from both sides of the debate. The key outputs of the review were an estimate of the effect of breast screening on breast cancer mortality, as the main benefit, and an estimate of the risk of overdiagnosis, as the main harm. Besides these, the independent UK panel considered other relevant issues, including additional harms through invitation, screening, diagnosis, and treatment, as well as women’s perceptions and cost-effectiveness. Although the independent UK panel did not make a systematic appraisal of evidence in all these areas, it did provide comments on each of these issues.
The independent UK panel’s estimate of the quantitative effect of breast screening on breast cancer mortality is based on randomized trials of screening, while acknowledging that these trials are not perfect. The analysis focused on 10 of the 11 randomized trials (excluding Edinburgh) and the metaanalysis conducted in the Cochrane review, using 13 years of follow-up (published in 2011). A relative reduction in breast cancer mortality of 20% (95% CI: 11–27) was estimated for groups invited to screening. The absolute risk reduction was expressed as the number of women needed to be screened for 20 years to prevent one death from breast cancer and was estimated at 180 women. The independent UK panel estimated that for 10,000 women invited to screening from age 50 for 20 years, 43 deaths would be prevented. The independent UK panel looked at observational studies as a possible guide to more contemporary estimates of the benefit of breast cancer screening. They noted that in general these studies showed beneficial effects in the same direction as those seen in the trials, but expressed concern about inadequate control for the potential noncomparability of screened and unscreened women (self-selection bias).
The independent UK panel based its estimate of the risk of overdiagnosis on three randomized trials that did not systematically screen the control group at the end of the screening period. Estimates were on the order of 11% (95% CI: 9–12) from a population perspective, and about 19% (95% CI: 15–23) from the perspective of a woman invited to screening. In absolute terms, the independent UK panel estimated that for every 10,000 UK women invited to screening from age 50 for 20 years, about 681 cancers will be found of which 129 will represent overdiagnosis. Information from observational studies was also considered, but the independent UK panel concluded that these studies could not give reliable estimates of the extent of overdiagnosis. For the specific case of diagnosing DCIS via a screening program, it was noted that diagnoses of DCIS do not solely represent overdiagnosis, but that these diagnoses undoubtedly contribute to the cases of overdiagnosis. The independent UK panel did not mention the cumulative risk of false-positive results among the other harms of breast screening, but noted that a false-positive result can cause psychological distress.
The independent UK panel concluded that breast screening reduces breast cancer mortality, but that some overdiagnosis occurs. Based on the estimates for the United Kingdom, one breast cancer death is prevented for every three overdiagnosed cases identified and treated. Evidence from a focus group further showed that many women feel that accepting the offer of breast screening is worthwhile.
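The panel's headline ratio follows directly from the absolute numbers quoted above. The short sketch below simply redoes that arithmetic; the variable and function names are hypothetical, and the figures are those given in the text for the panel's scenario of 10,000 UK women followed for 20 years from age 50.

```python
# Figures quoted from the independent UK panel's 20-year scenario.
women = 10_000
cancers_detected = 681
overdiagnosed = 129
deaths_prevented = 43


def per_1000(count: float, denominator: int = women) -> float:
    """Rescale a count to a denominator of 1000 women."""
    return count * 1000 / denominator


# Overdiagnosed cases for each breast cancer death prevented.
overdiagnosed_per_death_prevented = overdiagnosed / deaths_prevented

# Share of cancers found in this scenario that represent overdiagnosis.
overdiagnosed_share = overdiagnosed / cancers_detected

print(f"Deaths prevented per 1000 women: {per_1000(deaths_prevented):.1f}")
print(f"Overdiagnosed per 1000 women: {per_1000(overdiagnosed):.1f}")
print(f"Overdiagnosed per death prevented: {overdiagnosed_per_death_prevented:.1f}")
print(f"Overdiagnosed share of detected cancers: {overdiagnosed_share:.0%}")
```

The ratio of 129 to 43 gives the three overdiagnosed cases per death prevented, and 129 of 681 cancers is about 19%, consistent with the panel's estimate from the perspective of a woman invited to screening.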
EUROSCREEN
The effort of the EUROSCREEN group to summarize the accumulated evidence on the impact of population-based breast cancer screening in Europe was launched in the context of the European Screening Network in 2010. EUROSCREEN is a self-initiated cooperative group that includes scientists and professionals experienced in planning and evaluating most of the population-based screening programs running in Europe. In contrast to the other reviews presented in this chapter, EUROSCREEN focused on observational studies to systematically assess the impact of established service screening programs in Europe. The aim was to develop the best current estimate of the impact of service screening on breast cancer mortality, as well as screening harms, in particular the risk of overdiagnosis and false-positive screening results. Outcomes of EUROSCREEN were published in eight peer-reviewed papers, including a summary of main findings in the form of a balance sheet. The series of literature reviews further addressed the methodological standards of evaluation and the importance of using appropriate statistical methodology to design and analyze observational studies of mammography.
EUROSCREEN defined the reduction in breast cancer mortality for women invited versus not invited and/or women screened versus not screened as the primary benefit, and overdiagnosis of breast cancer and false-positive screening tests as the most important harms. A systematic search of PubMed identified observational studies reporting on breast cancer mortality in relation to service screening programs in Europe implemented between 1970 and 2007. In line with European policy recommendations, eligible studies had to include at least part of the age group between 50 and 69. Since the aim of the review was to report on estimates from ongoing population-based screening programs, the studies had to have at least 3 years of overlap with the current regional or national screening program. Based on these criteria, 83 studies were selected and grouped according to study design, that is, trend studies, incidence-based mortality studies, and case–control studies. The trend studies that aimed to quantify the impact on breast cancer mortality were too different to produce a pooled estimate of the effect of screening. Pooled estimates of breast cancer mortality reduction based on incidence-based mortality studies and case–control studies were on the order of 25–31% for women invited and 38–48% for women actually screened. The search strategy for overdiagnosis identified 13 studies that provided 16 explicit estimates from European population-based screening programs. Unadjusted estimates varied widely, ranging from 0% to 54%. Studies were further classified according to the adjustment for breast cancer risk and lead time bias. Adequately adjusted studies reported estimates of overdiagnosis in the range of 1–10%. Literature review and manual search of references for studies of the cumulative risk of a false-positive result in European service programs identified three unique studies. The pooled estimate from these studies indicated a cumulative risk of 20% for women aged 50–69 undergoing 10 biennial screening tests. The cumulative risk of a false-positive result that led to needle biopsy was reported in two studies, with a pooled estimate of 3%.
The evidence from the literature reviews was further summarized in the form of a balance sheet. For every 1000 women screened biennially from ages 50–51 years until ages 68–69 years and followed up until age 79 years, an estimated seven to nine breast cancer deaths are avoided, four cases are overdiagnosed, 170 women have at least one recall followed by noninvasive assessment with negative results, and 30 women have at least one recall followed by invasive procedures yielding a negative result. In conclusion, the pooled experience in European countries showed that the chance of saving a woman’s life is greater than that of overdiagnosis. EUROSCREEN recommended continuing the population-based screening programs currently ongoing in Europe.
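To illustrate why the European reviews reach such different conclusions, the sketch below places the absolute figures quoted in this section on a common denominator of 1000 women. It is an illustration under stated assumptions, not a like-for-like comparison: the class and field names are hypothetical, the EUROSCREEN value of 8 deaths avoided is the midpoint of the reported range of seven to nine, the Cochrane false-positive count of 200 is the lower bound of "more than 200", and the three scenarios differ in duration and in whether women are invited or actually screened.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BalanceSheet:
    review: str
    denominator: int                          # women the counts refer to
    deaths_avoided: float
    overdiagnosed: float
    false_positives: Optional[float] = None   # None if not given as a count

    def per_1000(self, count: float) -> float:
        """Rescale a count to 1000 women."""
        return count * 1000 / self.denominator

    def overdiagnosed_per_death_avoided(self) -> float:
        return self.overdiagnosed / self.deaths_avoided


sheets = [
    # 2000 women invited throughout 10 years; 200 is a lower bound.
    BalanceSheet("Cochrane", 2000, 1, 10, 200),
    # 10,000 UK women from age 50 for 20 years; no false-positive count given.
    BalanceSheet("UK panel", 10_000, 43, 129),
    # 1000 women screened biennially from 50-51 to 68-69; 8 is the midpoint of 7-9.
    BalanceSheet("EUROSCREEN", 1000, 8, 4, 170 + 30),
]

for s in sheets:
    fp = "n/a" if s.false_positives is None else f"{s.per_1000(s.false_positives):.0f}"
    print(
        f"{s.review}: deaths avoided {s.per_1000(s.deaths_avoided):.1f}, "
        f"overdiagnosed {s.per_1000(s.overdiagnosed):.1f}, "
        f"false-positives {fp} per 1000; "
        f"{s.overdiagnosed_per_death_avoided():.1f} overdiagnosed per death avoided"
    )
```

On these inputs the number of overdiagnosed cases per death avoided ranges from about 10 (Cochrane) through 3 (UK panel) to roughly 0.5 (EUROSCREEN), which shows how strongly the balance depends on the underlying evidence and scenario chosen by each review.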