Plain Language Summary
Technical Background
Evidence on Screening From Different Technical Eras
FFDM Versus SFM
Digital Breast Tomosynthesis
What Evidence Is Needed Before We Change a Screening Modality?
Advanced Imaging Modalities
List of Acronyms and Abbreviations
Plain Language Summary
Mammography screening has been adopted in many countries and population health programs because randomized trials showed that it reduces the risk of dying from breast cancer. These randomized trials used screen-film mammography (SFM), in which the X-ray beams are captured on a film cassette; the films are then developed and reviewed on a light-box by the physician. Since then, mammography has evolved from screen-film to digital mammography (DM), in which a specially designed digital “detector” captures the X-rays (replacing the film) and converts the information to a digital image that is displayed on a high-resolution computer monitor and transmitted and stored just like other computer files. The evidence prompting a change to DM screening came from studies comparing cancer detection for SFM and DM (one of these was a randomized trial), which showed roughly similar or higher cancer detection using DM compared to SFM. Technical efficiencies also contributed, as part of a general shift to digital imaging in medical practice. In very recent years, a new technology known as digital breast tomosynthesis (DBT, or tomosynthesis), essentially a quasi-three-dimensional mammogram, has become available. So far, tomosynthesis seems capable of detecting more breast cancers than standard DM, and it could potentially reduce false alarms (known as false recalls or false-positive screens), but studies show mixed findings on this issue. Although some countries have started to use tomosynthesis for screening, and several trials have been or are being done to address evidence gaps around this new technology, there are no recommendations to change to tomosynthesis screening because it is not yet known whether using tomosynthesis adds health benefit above what might be achieved with standard mammography screening.
Since the 1980s, mammography screening has been implemented in many countries, including in population-based programs, following the results of the randomized controlled trials (RCTs) of mammography screening, as described in chapter “Estimates of Screening Benefit: The Randomized Trials of Breast Cancer Screening.” Alongside widespread adoption of mammography screening, there has been continuous and substantial technical development from SFM to full-field digital mammography (FFDM) and, very recently, the introduction of digital breast tomosynthesis (DBT). This technical evolution calls for new evidence on the performance of screening with new mammography technologies, and for clarity about the evidence needed to translate new technologies into screening practice.
This chapter has three main sections, beginning with a brief overview of the technical background and development of the different mammography-based modalities. The second part focuses on the trials and studies that form the basis of the evidence for screening with FFDM versus SFM, and the rapidly emerging evidence on the introduction of DBT in breast screening. The third part pinpoints the evidence needed if and when breast tomosynthesis is adopted routinely into service screening for breast cancer.
Mammography is an X-ray examination of the breast that has been used for decades in the diagnosis of breast disease and as a screening modality. In analog mammography, also called SFM, the X-ray beams are captured on a film cassette. The films are then developed and reviewed on a light-box by the physician. Breast cancer is not one single disease entity; rather, it comprises a wide range of types, growth rates, and growth patterns, which is mirrored in the X-ray image of the breast. Hence, the radiographic appearance of breast cancer ranges from barely detectable, minimal signs to a clearly visible abnormality. Some radiographic patterns of breast cancer are readily detected when relatively small (at an early stage), such as spiculated lesions and tumors presenting with calcifications, whereas other lesions are challenging to detect, such as tumors causing only a nonspecific density or areas with subtle architectural distortion (Fig. 13.1).
Therefore, mammography is one of the most technically challenging areas in radiography since it requires a high spatial resolution, for fine details like microcalcifications, and outstanding soft-tissue contrast to enable visualization of soft-tissue lesions such as tumors. Moreover, a low radiation dose is crucial since the breast is a radiosensitive organ. SFM was hence refined to get high-resolution images of both soft-tissue lesions and calcifications. The randomized trials in mammography screening were all performed with SFM systems and hard copy (film) reading. Still, far from all screening centers have transitioned to DM, due to costs inherent in changing workflow from analog to digital and due to image storage issues.
Full-field digital mammography
Along with the development of digital imaging, flat panel detectors were developed, enabling so-called FFDM, which has been broadly used in diagnostic and screening practice since 2006. In FFDM a specially designed digital detector captures the X-rays (replacing the film) and converts the information to a digital image, which is displayed on a high-resolution computer monitor, and transmitted and stored just like computer files. Before the introduction of DM into practice, it was questioned whether the image quality, and especially the spatial resolution, of DM was sufficient for detection of minimal calcifications. It was concluded, however, that even though the spatial resolution of DM was commonly inferior to that of SFM for technical reasons, the resolution is good enough for breast cancer diagnostics. The digital technique provided a number of other advantages compared to SFM, such as elimination of film-related inefficiencies, improved contrast resolution, quick transfer/teleradiology, image processing, and simplified storage of images in picture archiving and communication systems. It also reduces the need for technical repeats, because the radiographer can immediately see whether the image fulfills the quality standards, so the woman does not need to come back for a new visit due to technical failure. Furthermore, digital images allow for potential application of computer-aided detection and other advanced applications.
Digital Breast Tomosynthesis
The development of tomosynthesis
Tomosynthesis is a relatively new tomographic X-ray technique with the capability to significantly reduce the confounding effect of overlapping normal anatomy. The development of tomosynthesis began in the early 20th century, but it was not until the early 1970s that the idea was realized by a group at Johns Hopkins University, Baltimore, United States, which developed it further by inventing a simple reconstruction method. Because the applications considered for tomosynthesis were then taken over by spiral computed tomography, introduced in the late 1980s, the development of tomosynthesis came to a halt. Along with the digitalization of radiography, more powerful computers, and the development of high-quality flat panel detectors, tomosynthesis reemerged for breast, lung, and skeletal applications. In breast imaging, the study of Niklason et al. started the new era for breast tomosynthesis by describing and evaluating a breast tomosynthesis imaging method based on an FFDM system.
In tomosynthesis, low-dose images of the compressed breast are acquired from different angles (usually 11–25 images) as the X-ray tube moves along a limited arc, typically between 15 and 50 degrees (see Fig. 13.2).
The low-dose images (so-called projection images) are mathematically reconstructed into a stack of slices, usually 1 mm thin, parallel to the detector plane. The slices can be viewed as a cine-loop or by scrolling through single slices. Each slice contains much less overlapping tissue than a standard mammography image, and hence much of the superimposition effect of dense glandular tissue and parenchymal structures (potentially masking tumors) is reduced. The total examination time is about the same as for a mammography examination. The time that the breast needs to be compressed during image acquisition varies, depending mainly on how wide an acquisition angle is used. Scan times for one projection have been reported to range between 8 and 25 s. Any of the traditional mammography views can be used, such as craniocaudal, mediolateral-oblique, and lateral views.
The projection images first sampled need to be reconstructed and processed to obtain good images for review and interpretation. Much research effort has been put into optimization of image reconstruction. Initially, filtered back-projection, the same method as used in CT, was the most common reconstruction method, although in later years iterative reconstruction has been introduced. Iterative reconstruction may be less sensitive to noise and incomplete sampling data but usually takes longer than filtered back-projection. No conclusion has been reached on which method is superior.
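To make the idea of reconstructing slices from projection images more concrete, the following is a minimal toy sketch of shift-and-add, a simple classical tomosynthesis reconstruction. It is not the filtered back-projection or iterative reconstruction discussed above, and it is not the method of any particular system. The parallel-beam geometry, integer pixel shifts, cyclic wrap-around, the phantom, and all function names are assumptions made purely for illustration.

```python
# Toy shift-and-add tomosynthesis sketch (illustrative assumptions throughout).
# Geometry is simplified to a parallel beam, so a structure in plane z is
# displaced on the detector by step * z pixels in the projection with index
# `step`. Shifts wrap around cyclically to keep the example short.


def shift(row, offset):
    """Cyclically shift a 1D row of pixels to the right by `offset`."""
    n = len(row)
    return [row[(i - offset) % n] for i in range(n)]


def project(volume, step):
    """Sum all planes after displacing plane z by step * z pixels."""
    projection = [0.0] * len(volume[0])
    for z, plane in enumerate(volume):
        shifted = shift(plane, step * z)
        projection = [p + s for p, s in zip(projection, shifted)]
    return projection


def reconstruct(projections, steps, n_planes):
    """Shift-and-add: undo the displacement for each plane, then average."""
    slices = []
    for z in range(n_planes):
        total = [0.0] * len(projections[0])
        for projection, step in zip(projections, steps):
            realigned = shift(projection, -step * z)
            total = [t + r for t, r in zip(total, realigned)]
        slices.append([t / len(steps) for t in total])
    return slices


# Hypothetical phantom: 8 planes of 64 pixels with two point-like structures
# at the same detector position (pixel 20) but different depths (planes 2, 5).
N_PLANES, WIDTH = 8, 64
volume = [[0.0] * WIDTH for _ in range(N_PLANES)]
volume[2][20] = 1.0
volume[5][20] = 1.0

steps = list(range(-5, 6))  # 11 projections (illustrative choice)
projections = [project(volume, step) for step in steps]
slices = reconstruct(projections, steps, N_PLANES)

# In the straight-on projection the two structures are superimposed.
central = projections[steps.index(0)]
```

In the straight-on projection the two structures overlap completely at pixel 20. In the reconstructed slice for plane 2, the structure belonging to that plane is recovered at nearly full strength, while the structure from plane 5 is smeared into a faint blur. This is the reduction of superimposition described above, in its simplest form.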
Without any additional radiation exposure, it is possible to derive a so-called synthetic mammogram from the tomosynthesis volume. Some tomosynthesis systems are approved by the US Food and Drug Administration only for clinical use of the combination of tomosynthesis and mammography, which entails almost double the radiation dose if obtained as dual acquisition. Furthermore, there was initially a belief that microcalcification detection would be inferior in tomosynthesis compared to mammography. Results regarding the sensitivity of tomosynthesis for microcalcification are conflicting: some publications suggest that tomosynthesis is inferior to mammography in calcification detection, other studies suggest similar capability for the two modalities, and one study suggests that tomosynthesis may be more specific but slightly less sensitive than DM in characterizing microcalcification. Such differences between study findings on microcalcification may be partially due to the fact that many are observer studies not embedded in routine screening reporting; more realistic estimates for detection of microcalcification are therefore available from the prospective trials (see paragraph on prospective trials of DBT). A synthetic mammogram may help, or could save time, in judging whether a microcalcification cluster is actually present, since individual calcifications from a cluster may appear in different adjacent tomosynthesis slices. In addition, a synthetic mammogram could provide an overview of potential lesions within the 3D volume and help when comparing tomosynthesis to prior mammograms. The combination of two-view tomosynthesis and synthetic (C-view) mammogram has been investigated in the Oslo Tomosynthesis Screening Trial (OTST) with promising initial results.
In that study, using synthesized two-dimensional (2D) images plus DBT data during the interpretation of screening mammograms, the radiologists’ overall performance levels were comparable to when using standard FFDM plus DBT, with cancer detection rates of 7.8 and 7.7 per 1000 screens for FFDM plus DBT and for current synthesized 2D images plus DBT, respectively (false-positive scores, 4.6% and 4.5%, respectively). The radiation dose to the breast was reduced by 45% with the combination of synthesized 2D images and DBT. Availability of synthetic mammograms, if tomosynthesis is to be used in screening, will probably be pivotal to minimize the radiation dose for screening populations, by avoiding dual acquisition of DBT and DM. Evidence from additional studies addressing the issue of synthetic 2D in population screening would be valuable, and at least one prospective population-based trial (STORM 2: screening with tomosynthesis or mammography 2) is anticipated to report its findings in the near future.
Evidence on Screening From Different Technical Eras
FFDM Versus SFM
The implementation of mammography screening is based on the results of the earlier RCTs in which SFM was used; see also chapter “Estimates of Screening Benefit: The Randomized Trials of Breast Cancer Screening,” Nelson et al., in this book, which outlines the evidence from the mammography screening trials. These trials had long-term follow-up and breast cancer–specific mortality as endpoint. The majority of the studies comparing the screening performance of FFDM versus SFM are observational, comparative studies. The only RCT is the Oslo II trial. The main results of these studies and the Oslo trial are described in Table 13.1. It is worth noting that despite higher cancer detection at FFDM in the Oslo trial, there was no significant reduction in interval cancer rates.
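|Study (First Author)||Age Group (Years)||Number of Screening Examinations: SFM||Number of Screening Examinations: FFDM||Recall Rate (%): SFM||Recall Rate (%): FFDM||Cancer Detection Rate (%) (Invasive and DCIS): SFM||Cancer Detection Rate (%) (Invasive and DCIS): FFDM||Positive Predictive Value for Recall (%): SFM||Positive Predictive Value for Recall (%): FFDM|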
|Colorado–Massachusetts (Lewin et al., 2001 )||>40||6736||6736||14.9||11.8 b||0.49||0.40||3.3||3.4|
|Oslo I (Skaane et al., 2005 )||50–59||3683||3683||3.5||4.6 b||0.71||0.54||20.2||11.8|
|Oslo II (Skaane et al., 2007 )||45–69||16,985||6944||2.5||4.2 b||0.38||0.59 b||15.1||13.9|
|DMIST (Pisano et al., 2005 )||47–62||42,555||42,555||8.6||8.6||0.41||0.44||4.7||5.1|
|Helsingborg (Heddson et al., 2007 )||46–74||25,901||9841||1.4||1.0 b||0.31||0.49 b||21.8||47.1 b|
|Florence (Del Turco et al., 2007 )||50–69||14,395||14,385||3.5||4.3 b||0.58||0.72||14.7||15.9|
|Vestfold County (Vigeland et al., 2008 )||50–69||324,763||18,239||4.2||4.1||0.65||0.77||15.1||18.5 b|
|East London (Vinnicombe et al., 2009 )||≥50||31,720||8478||3.4||3.2||0.72||0.68||14.6||14.2|
|Barcelona (Sala et al., 2009 )||50–69||12,958||6074||5.5||4.2 b||0.40||0.40||7.5||9.7|
|Utrecht (Karssemeijer et al., 2009 )||50–75||311,082||56,518 c||1.3||2.1||0.51||0.56||39.5||25.5 b|
|INBSP d (Hambly et al., 2009 )||50–64||153,619||35,204||3.1||4.0 b||0.52||0.63 b||16.7||15.7|
|Flanders (Van Ongeval et al., 2010 )||50–69||23,325||11,355||2.4||2.6||0.64||0.59||24.8||24.0|
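The three outcome measures in the table are arithmetically linked. As a rough sketch, the positive predictive value for recall can be approximated as the cancer detection rate divided by the recall rate. This assumes that every screen-detected cancer comes from a recalled woman, and it reads each pair of values as SFM first, FFDM second, which is an assumption about the column order. The function name is hypothetical. Small deviations from the tabulated values are expected because the inputs are rounded.

```python
def ppv_for_recall(recall_rate_pct, detection_rate_pct):
    """Approximate PPV for recall (%) as detected cancers per recalled woman.

    Assumes all screen-detected cancers arise from recalled women, so
    PPV = detection rate / recall rate (both given as % of screens).
    """
    return 100.0 * detection_rate_pct / recall_rate_pct


# Recall and detection rates taken from the Oslo I and DMIST rows above,
# read as (SFM, FFDM) pairs (assumed column order).
oslo_i_sfm = ppv_for_recall(3.5, 0.71)    # tabulated PPV: 20.2
oslo_i_ffdm = ppv_for_recall(4.6, 0.54)   # tabulated PPV: 11.8
dmist_ffdm = ppv_for_recall(8.6, 0.44)    # tabulated PPV: 5.1
```

The recomputed values agree with the tabulated ones to within rounding. This also shows why a higher recall rate without a matching rise in detection lowers the positive predictive value.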
There are large variations in the results from the nonrandomized studies, which may have several explanations. The study designs vary: they include prospective, paired testing using both techniques and retrospective comparisons of cohorts screened with either FFDM or SFM, and the studies used different screen-reading and recall strategies. Regarding cancer detection, FFDM seems to perform as well as or better than SFM, and in the RCT higher cancer detection was shown for FFDM. Recall rates for FFDM and SFM show large variability across studies, and in several of these studies recall was actually slightly increased by FFDM. It has been suggested that the higher recall rate at FFDM seen in several of the studies may be explained by the higher contrast resolution and increased conspicuity of subtle “minimal sign” lesions and microcalcifications at FFDM, although no studies have explicitly investigated this. Furthermore, differences in reading strategies, including whether single- or double-reading was performed, may explain the large variations in recall rates.
A systematic review with a metaanalysis of eight studies concluded that FFDM yields a slightly higher cancer detection rate than SFM (11 additional cancers per 10,000 screening mammograms; 95% CI 4 to 18), particularly at age 60 years and younger, but found no clear modality difference in recall rates or positive predictive values.
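To put the pooled estimate in more tangible terms, the short sketch below inverts the rate difference into the approximate number of screening mammograms per additional cancer detected. This is plain arithmetic on the figures quoted above, not a result reported by the review, and the function name is hypothetical.

```python
def screens_per_additional_cancer(extra_cancers_per_10000):
    """Invert a rate difference (per 10,000 screens) into screens per extra cancer."""
    return 10000 / extra_cancers_per_10000


# Point estimate and 95% CI bounds quoted in the text (11; 4 to 18).
point_estimate = screens_per_additional_cancer(11)   # roughly 909 screens
lower_bound = screens_per_additional_cancer(18)      # roughly 556 screens
upper_bound = screens_per_additional_cancer(4)       # 2500 screens
```

Read this way, the estimate corresponds to about one additional cancer detected per 900 or so FFDM screens, with a plausible range from roughly 550 to 2500 screens.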
Subgroup analyses in some of the above-reported studies yielded interesting findings. The Digital Mammographic Imaging Screening Trial (DMIST) showed overall equal accuracy for FFDM and SFM, although accuracy was higher for FFDM in women 50 years and younger and in women with heterogeneously or extremely dense breasts. In the Florence study, which was performed within a population-based screening program, similar results were found in younger women.
Digital Breast Tomosynthesis
When DBT units became available, either as prototype or commercial systems, there was great interest in evaluating the technique in clinical settings, as well as extensive technical work on improving the systems. The majority of the initial studies on DBT were performed on enriched populations (ie, with a high proportion of cancers in relation to normal cases, mostly using retrospective mammogram sets), as outlined in a review that captured the literature until 2012. In the overview from Houssami and Skaane, the following summary findings were highlighted (here updated with literature until 2015):
One-view DBT has at least equal or better accuracy than standard (two-view) DM.
Two-view DBT has at least equal or better accuracy than standard (two-view) DM based on comparative accuracy for these modalities.
The addition of DBT to standard two-view mammography (DBT with DM or film-screen mammography for diagnosis or triage of screen-recalled abnormalities) significantly improves accuracy—partly or predominantly through reduced false-positive interpretations.
Comparisons of DBT and DM using enriched reader studies yielded variable estimates for accuracy and/or sensitivity and specificity pairs, which may be due to reader differences or differences in the research methods used in these studies. Overall, improved accuracy from DBT (relative to, or added to, mammography), appears to be due to increased cancer detection or due to a reduction of false-positive recalls or both.
Many observer or reader studies did not find significant differences in interpretive accuracy between DBT and DM: this may be a true reflection of lack of a difference, or more likely a limitation of the “selection” of cases (from DM-defined detection) and also due to underpowered analyses since some studies had modest subject numbers.
Subjective interpretation of cancer conspicuity or lesion visibility (evaluated qualitatively in feature analysis or quantitatively scored) consistently showed that cancers were equally conspicuous or more conspicuous on DBT relative to FFDM.
Using the framework of the literature review from Houssami and Skaane, we updated the evidence summary on studies of DBT (other than the screening trials described in the paragraph “prospective screening trials with DBT”) comprising cancer-enriched studies: the updated findings were generally similar to those discussed in the published overview with more consistent evidence in observer studies that adding DBT improves interpretive accuracy. Furthermore, one-view DBT was shown to have higher diagnostic accuracy compared to two-view DM in an extended study by Svahn et al. We present our updated study-specific summary in Table 13.2 .
|Study||Study Design||Subjects Age (Timeframe of Cases Collected)||Summary of Findings on Accuracy|
|Gilbert et al. (TOMMY) (2015) a||Multicentre retrospective reader study of enriched mammogram set collected prospectively from women recalled for assessment of DM screen-detected finding in the UK national program or women having screening because of family history (7060 screens: 1160 BC) read by 26 radiologists||Mean 56 (range 29–85) (2011–13)||b AUC from ROC analysis:|
|b AUC from ROC analysis (data restricted to screens with breast density ≥50%):|
|Alakhras et al. (2015) c||Retrospective reader study of enriched mammogram set (50 cases: 27 BC) read by 26 radiologists||NR||b AUC from ROC analysis (all readers combined):|
|d Jackknife FROC analysis:|
|Morel et al. (2014) e||Retrospective reader study of enriched mammogram set from women recalled to assessment of DM screen-detected finding or symptomatic women having work-up imaging (341 cases, 354 lesions: 103 BC), 7 readers||Mean NR (range 35–73) years (2010–11)||b AUC from ROC analysis:|
|Thibault et al. (2013) f||Retrospective reader study of enriched mammogram set from women with image-detected (DM or ultrasound) findings or symptoms (130 cases, 55 BC), 7 readers||Mean NR (≥40) years (NR)||b AUC from ROC analysis showed no statistical difference for various modalities (ordered from highest to lowest AUC):|
|Rafferty et al. (2013) g (Rafferty, 2014) h||Retrospective reader studies of enriched mammogram sets collected prospectively from women having screening or diagnostic (pre-biopsy) mammography:||NR (2006–07)||b AUC from (pooled) ROC analysis:|
|Wallis et al. (2012) i||Test set (130 subjects: 40 BC, 24 benign lesions, 66 normal) observer study; read by two groups of 10 experienced readers *Prototype DBT system||Mean 56.3 (range 40–80) with BIRADS density 2–4 (2008–09)||b AUC from ROC analysis:|
|Michell et al. (2012) j||Prospective study of women recalled (738 consecutive recalls, both FFDM and DBT were performed: 204 BC) to assessment based on positive film-screen mammography at assessment, and double-read by experienced breast radiologists||NR (2009–10)||b AUC from ROC analysis:|
|Skaane et al. (2012) k||Clinical series of 129 subjects (27 BC) with symptoms or recalled for screen-detected abnormality or having surveillance; read by experienced breast radiologists (1 with limited experience). DBT reported 2–4 weeks after (and blinded to) standard assessment||Mean 57 (range 30–87) years (NR)||FFDM (with standard work-up) vs DBT (blinded to result of assessment): DBT concordant with assessment (identified all 25 BC): DBT-only recalled 4 other cases (2 of these had BC)→8% incremental detection with DBT|
|Svahn et al. (2012) l||Clinical series of 185 subjects, 89 with BC, observer study; read by 5 experienced breast radiologists; constrained cases selected with subtle screen-detected abnormalities or diagnostic (symptomatic) cases. (Extension of the study by Svahn 2010)||Mean 60 (range 42–79) years (2006–08)||Jackknife alternative FROC d analysis (mean figure of merit)|
|Gur et al. (2012) m||Test set (228 breasts/114 mammograms: 48 BC, 6 high-risk and 30 benign lesions, 144 normal) observer study; read by 10 tomosynthesis-trained radiologists. (Tested version of synthetically reconstructed 2D-images (2Dsynthetic))||Mean 51 (range 36–77) years (2008–09)||Average sensitivity |
|Average false-positive (FP) recall|
|Bernardi et al. (2012) n||Prospective (simulation) study integrating DBT to triage screening recalls (158 consecutively recalled screens: 21 BC); 7 radiologists provided opinion on DBT prior to assessment||Mean 51.8 (range 35–77) years (2011)||Screens recalled on FFDM vs whether would recall with added DBT|
|Noroozian et al. (2012) o||Test set (67 subjects with breast masses: 30 BC, 37 benign lesions) observer study; read by 4 radiologists experienced in breast imaging; reads separated in time and by random allocation||Mean NR (range 34–88) years (cases collected 2006–08)||b AUC from ROC analysis comparing digital spot compression vs DBT of breast containing lesion:|
|Svane et al. (2011) p||Clinical series (n = 144: 76 BC) selected with suspicious lesions on mammogram; read by 2 breast radiologists with limited tomosynthesis training *Prototype DBT system||Mean 56.8 (range 40–85) years; (2007–09)||b AUC from ROC:|
|Sensitivity: 2D mammography 93.4% vs one-view tomosynthesis 89.5% (no significant difference)|
|Specificity: 2D mammography 80.9% vs one-view tomosynthesis 86.0% (no significant difference)|
|Tagliafico et al. (2011) q||Prospective study of abnormalities recalled from screening for mammographic work-up (52 consecutive recalls: 9 BC); 2 experienced breast radiologists independently provided opinion on spot compression views and DBT||Mean 51 years (2010)||AUC comparing digital spot compression vs DBT of the breast containing FFDM-recalled abnormality:|
|Gennaro et al. (2010) r||Clinical/diagnostic series (200 subjects: 63 BC) selected with equivocal or suspicious lesions on mammogram and/or ultrasound; interpreted by 6 breast radiologists with experience in conventional mammography *Prototype DBT system||Age (NR) (2007–08)||b AUC from ROC analysis|
|Mean sensitivity and specificity for FFDM vs one-view DBT did not significantly differ:|
|Svahn et al. (2010) s||Test set (50 subjects) observer study; read by 5 experienced breast radiologists; constrained cases selected with subtle screen-detected abnormalities or diagnostic (symptomatic) cases *Prototype DBT system||NR (NR)||Jackknife alternative FROC d analysis (mean figure of merit)|
|Combined DM + DBT higher summary accuracy vs two-view DM (p < 0.05); no difference found for other comparisons|
|Teertstra (2010) t||Clinical series of 513 subjects referred for work-up of abnormal screen or with symptoms, and followed-up for 2 years for incident cancer (112 incident BC); blinded reading of DBT and FFDM by one radiologist *Prototype DBT system||Mean 52 (range 29–92) years (2006–07)||FFDM vs DBT: 8 of 112 BC missed by each method; sensitivity 93% for each of FFDM and DBT|
|Similar specificity: FFDM 86.1% vs DBT 84.4%|
|Gur et al. (2009) u (Gur et al., 2011) v||Test set (125 subjects: 35 BC, 90 no cancer) observer study, comparing 4 reading modalities; read by 8 experienced radiologists. (Gur et al., 2011, was based on the same study-set but used different analytic methods)||NR (NR)||Overall (all readers) sensitivity and specificity pairs:|
|(Same study-set with FROC d analysis reported in 2011: combined FFDM + DBT improved accuracy index by an average 16% (95% CI: 7–26%; p < 0.01) over FFDM alone)|
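Several entries in the table report sensitivity or incremental detection derived from simple counts. As a minimal sketch, the percentages quoted for the Teertstra (2010) and Skaane et al. (2012) rows can be reproduced from the counts given in those rows. The function names are hypothetical, and nothing beyond the counts in the table is assumed.

```python
def sensitivity_pct(cancers_detected, cancers_total):
    """Sensitivity (%) = detected cancers / all cancers present."""
    return 100.0 * cancers_detected / cancers_total


def incremental_detection_pct(extra_cancers, cancers_found_by_reference):
    """Extra cancers found by the new test, as % of those found by the reference."""
    return 100.0 * extra_cancers / cancers_found_by_reference


# Teertstra (2010): 8 of 112 cancers missed by each method.
teertstra_sensitivity = sensitivity_pct(112 - 8, 112)

# Skaane et al. (2012): standard assessment identified 25 cancers;
# DBT alone recalled 4 further cases, 2 of which had cancer.
skaane_incremental = incremental_detection_pct(2, 25)
```

These give approximately 93% sensitivity and 8% incremental detection, matching the figures in the table.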