Raw data
Time period being audited and the total number of examinations during that time
Number of screening and number of diagnostic examinations and separate audit for each of these two groups
Number of BI-RADS Category 0 assessments
Number of BI-RADS Category 4 and 5 assessments [MQSA mandated]
Biopsy results for fine needle, core biopsy, and open surgical biopsy
Cancer staging: size of the tumor, histological type, nodal status, and grading
All cases of known false-negative mammograms have to be analyzed, and mammograms obtained prior to the diagnosis of cancer should be reviewed [MQSA mandated]
Derived data
True positives
False positives
Positive predictive value [PPV1, PPV2, PPV3]
Cancer detection rate for screening examinations
Percentage of minimal cancers [DCIS or invasive cancers 1 cm or less]
Node-negative cancers
Abnormal interpretation rates
Box 10.2. Complete Mammography Audit
Additional data to be collected for a complete mammography audit
Risk factors
Patient's age
Breast cancer history: personal and family
Hormone replacement therapy
Previous biopsy-proven atypia or lobular carcinoma in situ
Baseline, routine follow-up, or short-interval follow-up examination
Mammographic assessment
BI-RADS Category 1 (negative) and BI-RADS Category 2 (benign findings)
Short-interval follow-up: BI-RADS Category 3
Cancer data
Mammographic findings: mass, calcifications, indirect signs of cancer, no mammographic signs of cancer
Palpable or not
Derived data to be calculated from the more complete mammography audit
True negatives, false negatives
Sensitivity
Specificity
Cancer detection rate
Prevalent vs. incident cancer detection rates for screening
Cancer detection rate for diagnostic examinations
Rates for various age groups
Percentage of nonpalpable cancers calculated separately for screening and diagnostic examinations
Percentage of minimal cancers calculated separately for screening and diagnostic examinations
Percentage of node-negative cancers calculated separately for screening and diagnostic examinations
Abnormal interpretation rate for diagnostic examinations
Mammography Audit Definitions [4]
It is important to understand the definitions of the types of breast imaging studies and the parameters used in a mammography audit. These are outlined next as they appear in the BI-RADS™ atlas [4]:
A screening examination is defined as an examination performed on an asymptomatic woman to detect early, clinically unsuspected cancer. The screening group also includes special subgroups, namely women with augmented breasts, who need additional views optimized to assess the breast tissue, and women with a personal history of breast cancer.
A diagnostic mammographic examination is performed when there are clinical signs and symptoms suggesting breast cancer, or when a woman has had an abnormal screening examination.
A tissue diagnosis is a pathologic diagnosis rendered after any type of biopsy, percutaneous or open surgical, with or without image guidance and/or localization.
A positive screening examination is one for which a recall is initiated or a tissue diagnosis is recommended. Note that the MQSA final rules count only examinations with a recommendation for tissue diagnosis as positive screening examinations.
A positive diagnostic examination is one that requires a tissue diagnosis.
A negative screening examination is one with a negative or benign assessment (BI-RADS Category 1 or 2).
A negative diagnostic examination is one with a negative, benign, or probably benign assessment (BI-RADS Category 1, 2, or 3).
Cancer diagnosis refers to ductal carcinoma in situ or any type of primary invasive breast carcinoma; metastatic carcinoma is not included.
True positive (TP): a tissue diagnosis of cancer within one year of a positive examination (BI-RADS Category 0, 4, or 5 for a screening study; BI-RADS Category 4 or 5 for a diagnostic study).
True negative (TN): no tissue diagnosis of cancer within one year of a negative examination (BI-RADS Category 1 or 2 for screening; BI-RADS Category 1, 2, or 3 for diagnostic).
False negative (FN): a tissue diagnosis of cancer within one year of a negative examination (BI-RADS Category 1 or 2 for screening; BI-RADS Category 1, 2, or 3 for diagnostic).
False positive (FP) has three definitions:
FP1: No known tissue diagnosis of cancer within one year of a positive screening examination (BI-RADS Category 0, 4, or 5)
FP2: No known tissue diagnosis of cancer within one year after recommendation for biopsy or surgical consultation resulting from a positive examination (BI-RADS Category 4 or 5)
FP3: A benign tissue diagnosis within one year after recommendation for biopsy or surgical consultation resulting from a positive examination (BI-RADS Category 4 or 5)
Positive Predictive Value (PPV)
PPV1: The percentage of all positive screening examinations (BI-RADS Category 0, 4, or 5) with a tissue diagnosis of cancer within one year. It is unusual, yet possible, to assign a Category 4 or 5 on an initial screening assessment.
PPV2: The percentage of all positive screening or diagnostic examinations recommended for biopsy or surgical consultation (BI-RADS Category 4 or 5) that had a tissue diagnosis of cancer within one year.
PPV3: The percentage of all known biopsies performed as a result of a positive screening or diagnostic examination (BI-RADS Category 4 or 5) that resulted in a tissue diagnosis of cancer within one year.
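The three PPV definitions differ only in their denominators. A minimal sketch, using hypothetical counts (not data from the atlas or any study), shows how each would be computed in an audit:

```python
# Illustrative sketch with hypothetical counts: computing PPV1, PPV2,
# and PPV3 for one audit year. PPV = TP / (TP + FP), as a percentage.

def ppv(true_positives, false_positives):
    """Positive predictive value as a percentage."""
    total = true_positives + false_positives
    return 100.0 * true_positives / total if total else 0.0

# Hypothetical one-year audit counts.
positive_screens = 500         # BI-RADS 0, 4, or 5 at screening
cancers_from_screens = 25      # tissue diagnosis of cancer within one year
biopsy_recommendations = 80    # BI-RADS 4 or 5 after workup
cancers_from_biopsy_recs = 24
biopsies_performed = 75        # known biopsies actually done
cancers_at_biopsy = 24

ppv1 = ppv(cancers_from_screens, positive_screens - cancers_from_screens)
ppv2 = ppv(cancers_from_biopsy_recs,
           biopsy_recommendations - cancers_from_biopsy_recs)
ppv3 = ppv(cancers_at_biopsy, biopsies_performed - cancers_at_biopsy)

print(f"PPV1 = {ppv1:.1f}%, PPV2 = {ppv2:.1f}%, PPV3 = {ppv3:.1f}%")
# → PPV1 = 5.0%, PPV2 = 30.0%, PPV3 = 32.0%
```

Note how PPV1 is much lower than PPV2 and PPV3: its denominator includes every recalled woman, not just those recommended for biopsy.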
Sensitivity is the probability of detecting a cancer when cancer exists: the number of cancers diagnosed after identification at mammography within one year of the imaging examination, divided by all cancers present in the population in the same time period. Sensitivity = TP/(TP + FN)
Specificity is the probability of interpreting a mammogram as negative when cancer does not exist: the number of true-negative mammograms in a population divided by all actual negative cases in that population. Specificity = TN/(TN + FP)
Cancer detection rate: The number of cancers correctly detected at screening mammography per 1,000 patients. If calculated for diagnostic mammography, it should be reported separately from screening mammography.
Abnormal interpretation rate: The rate of examinations that are positive; for screening examinations this includes BI-RADS Category 0, 4, and 5 assessments, and for diagnostic mammography BI-RADS Category 4 or 5. For the most part, the abnormal interpretation rate is the same as the recall rate; the only rare exception is when a BI-RADS 4 or 5 assessment is given on a screening mammogram. Even with obviously suspicious findings, additional imaging is generally needed to determine the extent of disease and to plan the type of image guidance for biopsy.
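The four derived measures defined above follow directly from the TP/FP/TN/FN counts. A minimal sketch, again with hypothetical counts rather than published data:

```python
# Illustrative sketch: deriving sensitivity, specificity, cancer detection
# rate, and abnormal interpretation rate from hypothetical screening counts.

def audit_metrics(tp, fp, tn, fn, total_exams):
    return {
        "sensitivity_pct": 100.0 * tp / (tp + fn),            # TP/(TP+FN)
        "specificity_pct": 100.0 * tn / (tn + fp),            # TN/(TN+FP)
        "cdr_per_1000": 1000.0 * tp / total_exams,            # per 1,000 exams
        "abnormal_interpretation_pct": 100.0 * (tp + fp) / total_exams,
    }

# Hypothetical one-year screening audit of 10,000 examinations.
m = audit_metrics(tp=40, fp=960, tn=8990, fn=10, total_exams=10000)
print(m)
# → sensitivity 80.0 %, CDR 4.0 per 1,000, abnormal interpretation rate 10.0 %
```

Screening and diagnostic examinations would be tallied and reported separately, as the text emphasizes.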
MQSA-Mandated Mammography Audit
MQSA requires that each facility designate a lead interpreting physician who is responsible for reviewing medical audit outcomes yearly. Results must be analyzed, and individual radiologists and the facility must be notified. Audit data must be maintained for at least 24 months, and longer if required by state regulatory bodies. A system should be in place to collect and review outcome data on all mammograms performed, and follow-up on all positive mammograms is required. The facility must attempt to obtain pathology results on all mammograms with a recommendation for biopsy and correlate the biopsy results with the mammographic findings. Outcome data analysis is required for individual physicians as well as for the facility. A computerized tracking and analysis system is acceptable and desirable, but not required. The FDA requires only determining whether the biopsy result is benign or malignant. Any case with a benign or negative assessment followed by a breast cancer diagnosis within a year is considered a false negative and should be analyzed.
The MQSA basic audit is likely to be expanded in the near future. The United States Congress commissioned the Institute of Medicine (IOM) to produce a report on enhancing the quality of breast imaging practice [10]. The IOM report concluded that the current requirements are inadequate for measuring or improving the quality of mammographic interpretation [10].
IOM Recommendations to Improve Interpretative Performance [10]
The Institute of Medicine, in its report on improving breast imaging quality standards, has recommended carrying out studies to determine what additional approaches would improve the quality of mammography interpretation, since the currently available data are not sufficient to justify regulatory changes. Suggested studies include those that would demonstrate the efficacy of continuing medical education specifically dedicated to improving interpretive skills, the effects of reader volume on interpretive performance, and the impact of double reading and computer-aided detection on interpretive performance over time, at different levels of experience, and in different practice settings. The IOM recommended that funding for such studies be granted by the National Cancer Institute.
An outline of the recommendations appears in Box 10.3. The summary of these recommendations follows:
Box 10.3. Summary of Recommendations to Improve Breast Imaging Quality
1. Revise and standardize the required medical audit component of MQSA
2. Facilitate a voluntary advanced medical audit with feedback
3. Designate specialized Breast Imaging Centers of Excellence and undertake demonstration projects and evaluations within them
4. Further study the effects of CME, reader volume, double reading, and CAD
5. Revise MQSA regulations, inspections, and enforcement
6. Modify regulations to clarify their intent and address current technology
7. Streamline inspections and strengthen enforcement for patient protection
8. Ensure an adequate workforce for breast cancer screening and diagnosis
9. Collect and analyze data on the mammography workforce and service capacity
10. Devise strategies to recruit and retain highly skilled breast imaging professionals
11. Make more effective use of breast imaging specialists
12. Improve breast imaging quality beyond mammography by mandating accreditation for nonmammographic breast imaging methods that are routinely used for breast cancer detection and diagnosis, such as ultrasound and magnetic resonance imaging (MRI)
Include PPV2, cancer detection rate, and abnormal interpretation rate in the required basic medical audit.
In addition to tracking BI-RADS 4 and 5 assessments, all women for whom additional imaging has been recommended (BI-RADS 0: incomplete assessment, needs additional imaging) should also be tracked.
All performance measures should be measured separately for screening and diagnostic mammography.
Each interpreting physician should be allowed to combine audit data from all facilities at which he or she interprets.
Facilities should be encouraged to participate in a voluntary enhanced mammography audit that would collect data on patient characteristics and tumor staging information from pathology reports. This should be tied into a central data and statistical coordinating center that would collect data from interpreting physicians and provide feedback for quality assurance and improvement. Implementation of such an audit would need to be incentivized through pay for performance: the Centers for Medicare & Medicaid Services (CMS) and other payors would provide higher reimbursement rates to those meeting performance criteria set by a group of experts and patient advocates and periodically updated. Exempting such facilities from FDA inspection of medical audit data would be an additional incentive.
Given that the current MQSA-required audit is bare bones, it is desirable for each breast imaging facility to perform, at a minimum, the BI-RADS basic audit. Unlike in the USA, in countries where organized screening is in place, a more stringent audit is mandated by government regulatory bodies. Audit results should be examined for the facility as a whole as well as for the individual radiologists interpreting mammograms. Several commercially available software programs continually accumulate data and produce metrics at defined intervals. The lead interpreting radiologist should monitor the metrics of his or her colleagues and initiate remedial measures if performance metrics fall significantly outside the expected benchmarks (Table 10.1).
Table 10.1
Mammography interpretative performance benchmarks for screening mammography
Measure | Criteria indicating low performance
---|---
Sensitivity | <75 %
Specificity | <88 % or >95 %
Recall rate | <5 % or >12 %
PPV2 | <20 % or >40 %
Cancer detection rate | <2.5 per 1,000 screens
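The monitoring step described above can be sketched as a simple range check against the Table 10.1 thresholds. The metric names and the sample radiologist's values below are hypothetical:

```python
# Illustrative sketch: flag a radiologist whose audit metrics fall outside
# the acceptable ranges implied by the Table 10.1 low-performance criteria.

BENCHMARKS = {
    # metric: (acceptable_low, acceptable_high); None means unbounded
    "sensitivity_pct": (75.0, None),
    "specificity_pct": (88.0, 95.0),
    "recall_rate_pct": (5.0, 12.0),
    "ppv2_pct": (20.0, 40.0),
    "cdr_per_1000": (2.5, None),
}

def flag_low_performance(metrics):
    """Return the names of metrics that fall outside their acceptable range."""
    flags = []
    for name, value in metrics.items():
        low, high = BENCHMARKS[name]
        if (low is not None and value < low) or (high is not None and value > high):
            flags.append(name)
    return flags

# Hypothetical metrics for one radiologist.
radiologist = {"sensitivity_pct": 82.0, "specificity_pct": 86.0,
               "recall_rate_pct": 13.5, "ppv2_pct": 28.0, "cdr_per_1000": 3.1}
print(flag_low_performance(radiologist))
# → ['specificity_pct', 'recall_rate_pct']
```

A flagged metric would prompt review and remedial measures by the lead interpreting radiologist, as the text describes.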
Audits are meaningful when performed separately for diagnostic and screening mammographic examinations because of the expected variation in outcomes [11, 12]. In an analysis of 51,805 mammograms in which screening and diagnostic examinations were audited separately, expected outcomes for various mixes were calculated from a known mix of 79 % screening and 21 % diagnostic examinations in the study group. For a 90/10 % screening/diagnostic mix compared with a 50/50 % mix, the expected rate of abnormal findings was 6 % vs. 11 %, the rate of positive biopsy findings 38 % vs. 42 %, the cancer detection rate 10 per 10,000 vs. 30 per 10,000, the invasive cancer size 14.4 vs. 16.0 mm, the rate of nodal metastasis 8 % vs. 11 %, and the rate of stage 0 and stage 1 cancers 87 % vs. 82 %. Among diagnostic mammographic examinations, higher values for all of these measures are expected in those with palpable findings [11]. Extrapolation from known outcomes is suggested when audit data for screening and diagnostic examinations are combined. As this study showed, the mix of screening and diagnostic examinations, as well as the indication for a diagnostic examination, influences the outcomes [11].
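The extrapolation idea in the study above amounts to a mix-weighted average: a combined audit's expected rate is the screening rate and diagnostic rate weighted by their share of the caseload. A minimal sketch with hypothetical per-group rates (not the published figures):

```python
# Illustrative sketch: expected combined audit rate for a given
# screening/diagnostic mix, as a mix-weighted average of per-group rates.

def blended_rate(screening_rate, diagnostic_rate, screening_fraction):
    """Expected combined rate for a given fraction of screening exams."""
    return (screening_rate * screening_fraction
            + diagnostic_rate * (1.0 - screening_fraction))

# Hypothetical cancer detection rates per 1,000 examinations.
cdr_screening, cdr_diagnostic = 4.0, 30.0

for mix in (0.90, 0.79, 0.50):
    combined = blended_rate(cdr_screening, cdr_diagnostic, mix)
    print(f"{mix:.0%} screening: {combined:.1f} cancers per 1,000 exams")
```

This makes concrete why the same facility would report very different combined numbers at a 90/10 mix than at a 50/50 mix, even with identical per-group performance.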
Mammographic Interpretation, Interpretive Accuracy, and Benchmarks
Benchmarks used to determine interpretive performance may be derived from expert panels or from published large samples of data from clinical practice. The introduction and implementation of MQSA has had the intended effect of improving the technical quality of mammographic examinations; however, there has not been a corresponding improvement in the interpretive quality of mammograms, as judged by sensitivity and specificity [10].
Minimally acceptable criteria for interpretive performance in screening and diagnostic mammography have been published [11–16]. One of these studies defined low performance in interpreting screening mammograms as a sensitivity of less than 75 %, a specificity less than 88 % or greater than 95 %, a recall rate less than 5 % or greater than 12 %, a PPV2 less than 20 % or greater than 40 %, or a cancer detection rate of less than 2.5 per 1,000 interpretations (Table 10.1). If underperforming physicians moved into the acceptable range through additional training, detection of an additional 14 cancers per 100,000 women screened and a reduction of 880 false-positive examinations per 100,000 women screened would be expected [12]. Radiologists interpreting moderate (1,001–2,000) and high (>2,000) volumes had higher sensitivity [12].