Preoperative diagnosis and operative planning for patients with thyroid nodules have improved over the last decade. The Bethesda criteria for cytopathologic classification of thyroid nodule aspirates have enhanced communication between pathologists and clinicians. Multiple genetic tests, including molecular markers and the Afirma gene expression classifier, have been developed and validated. These tests, along with clinical and radiologic information, are most useful in the setting of indeterminate cytology. The development of an updated diagnostic and treatment algorithm incorporating all available tests will help standardize the management of patients with nodular thyroid disease and reduce variation and inefficiencies in care.
Key points
- The Bethesda system for reporting thyroid cytopathology provides a standardized method of reporting results from fine-needle aspiration of thyroid nodules.
- The Bethesda system for reporting thyroid cytopathology should be applied universally to improve communication between pathologists and clinicians.
- For patients with indeterminate cytology, the Afirma gene expression classifier and tests for genetic mutations may provide helpful diagnostic and prognostic information with which to optimize treatment plans.
Introduction: nature of the problem
Thyroid nodules are common. Reports in the literature suggest a prevalence of 4% to 76% in the general population, depending on the mode of detection and the population studied. Autopsy studies have reported similar prevalence rates. Increased use of ultrasound technology has led to increased detection of thyroid lesions over the last 30 years.
Most thyroid nodules are benign, with only 5% to 10% representing a malignant tumor. Evaluation for thyroid malignancy is necessary after the discovery of many thyroid nodules to determine appropriate management recommendations. Nodules greater than 1 to 1.5 cm in size or associated with abnormal lymph node(s) on ultrasonography should receive an ultrasound-guided fine-needle aspiration (FNA) to obtain a sample for cytologic evaluation of malignancy.
FNA is most useful for guiding management decisions when results clearly indicate benign or malignant disease. However, FNA is not a perfect diagnostic tool. Interpretation of cytopathologic samples can vary between pathologists and across institutions, leading to controversy over management. Moreover, approximately 10% to 25% of FNA evaluations report “indeterminate” results, whereby neither benign nor malignant disease can be declared. Consequently, thyroid lobectomy is needed to obtain tissue for an accurate diagnosis. However, more than half of these patients are found to have benign disease on formal pathologic evaluation, so their surgery seems unnecessary in retrospect. For those with malignancy found on surgical pathology, a second operation is usually recommended to improve surveillance for recurrence and to permit additional treatment when necessary. The need for a second operation can be troubling for patients and challenging for surgeons.
Improved characterization of thyroid nodules is necessary to guide appropriate patients to surgery and reduce unnecessary surgeries. A universal cytopathologic classification system and several diagnostic tests have been developed in the last decade to facilitate the decision-making process.
Cytopathologic classification
Bethesda Classification System
Background
Before 2007, multiple classification systems existed for describing the results of FNA of thyroid nodules. Discordance between these classification systems led to inconsistent reporting of FNA results, creating confusion among clinicians and limiting the effectiveness of the test. At the National Cancer Institute conference in the fall of 2007, a leading group of pathologists and clinicians proposed a 6-tiered classification scheme, known as The Bethesda System for Reporting Thyroid Cytopathology (“the Bethesda system”). The goal of the system was to provide a consistent means of reporting clinically relevant information so that physicians could best advise patients (Table 1).
Category | Diagnosis | Includes | Risk of Malignancy (%) | Recommended Action |
---|---|---|---|---|
I | Nondiagnostic/unsatisfactory | Result of limited cellularity, lack of follicular cells, or poor fixation and preservation of sample | 1–4 | Repeat FNA with ultrasound guidance |
II | Benign | Nodular goiter; hyperplastic or adenomatous nodule; chronic lymphocytic thyroiditis | 0–3 | Clinical follow-up |
III | Follicular lesion of undetermined significance/atypia of undetermined significance | Cases that cannot be classified as either benign or a follicular neoplasm | 5–15 | Repeat FNA; correlate with clinical and radiologic findings. Consider molecular marker testing |
IV | Follicular neoplasm/suspicious for follicular neoplasm | Includes nonpapillary follicular lesions and Hurthle cell lesions or neoplasms | 15–30 | Consider clinical and radiologic findings to decide between diagnostic lobectomy and near-total thyroidectomy |
V | Suspicious for malignancy | Suspicious for papillary carcinoma, medullary carcinoma, lymphoma, metastatic disease. May be due to presence of necrosis in specimen | 60–75 | Surgical lobectomy ± intraoperative frozen section to determine extent of surgery, or near-total thyroidectomy for definitive diagnosis and treatment ᵃ |
VI | Malignant | Evidence of papillary carcinoma and its variants, medullary carcinoma, lymphoma, anaplastic carcinoma, metastases | 97–99 | Near-total thyroidectomy for definitive management ᵇ |
ᵃ Lateral neck ultrasonography should be performed preoperatively to examine for suspicious nodes, with appropriate treatment determined pending results.
ᵇ Ultrasonography of the entire thyroid gland and the cervical lymph node compartments should be performed preoperatively to examine for suspicious nodes and possible malignancy in the contralateral lobe. Appropriate treatment is determined pending results.
The classification system
A description of each category follows, along with the clinical recommendations put forth by the Bethesda group. Similar information is available in Table 1 for ease of review. The chart is designed to be printed and posted in the surgical office setting, and can be especially helpful in an academic setting to raise awareness of the standard of care and encourage appropriate treatment decisions.
Class I: nondiagnostic or unsatisfactory
FNA samples may have blood obscuring the specimen, a thick smear, smears that are improperly dried, or an insufficient quantity of cells. The malignancy risk in nondiagnostic or unsatisfactory samples is 1% to 4%. Repeated aspiration of the nodule with ultrasound guidance should lead to a diagnostic result in 50% to 88% of cases; for nodules that remain nondiagnostic or unsatisfactory after repeat aspiration, excisional biopsy leads to a malignant result in 10%. Recommendation: Repeat FNA under ultrasound guidance is suggested.
Class II: benign
Thyroid FNAs are benign in 60% to 70% of cases. The false-negative rate of a benign result is 0% to 3%. Patients with a benign FNA should be followed clinically for 6 to 18 months, and repeat ultrasound, FNA, or both performed if clinical changes are noted. Recommendation: Ultrasonographic surveillance every 6 to 18 months to assess stability. Change warrants repeat FNA.
Class III: atypia of undetermined significance/follicular lesion of undetermined significance
This category encompasses lesions that are not readily categorized as benign, malignant, or suspicious. This result occurs in 3% to 6% of FNAs, and should prompt a repeat FNA. Repeat FNA yields a more definitive categorization in 80% of aspirates, but 20% remain classified as atypia of undetermined significance (AUS)/follicular lesion of undetermined significance (FLUS). Because not all patients with AUS/FLUS undergo surgical resection, the rate of malignancy in this category is difficult to determine. However, in AUS/FLUS specimens with concerning features on ultrasonography or physical examination, 20% to 25% are found to harbor malignancy. The risk of malignancy for all AUS/FLUS specimens is likely lower. Recommendation: Repeat FNA under ultrasound guidance with consideration of molecular marker testing.
Class IV: follicular neoplasm/suspicious for follicular neoplasm
This category includes lesions that are concerning for follicular carcinoma, which cannot be diagnosed on FNA. Approximately 15% to 30% of these FNAs will be malignant on pathologic evaluation after surgical resection, diagnosed as either follicular carcinomas or follicular variants of papillary thyroid carcinomas. FNAs with a predominance of Hurthle cells may be reported as suspicious for Hurthle cell neoplasm, and 15% to 45% of these are malignant. Surgical resection is needed to make a definitive diagnosis following categorization of a sample as follicular neoplasm (FN)/suspicious for follicular neoplasm (SFN). Recommendation: Diagnostic lobectomy or total thyroidectomy, depending on the clinical scenario, is appropriate.
Class V: suspicious for malignancy
Aspirates that show some characteristics of malignancy but are not definitively malignant are grouped into this category. Surgical resection is needed to diagnose malignancy, which is found in 60% to 75% of cases. The most common diagnosis is papillary carcinoma, follicular variant. Recommendation: Surgical resection is indicated. Lobectomy with intraoperative frozen section or total thyroidectomy is appropriate.
Class VI: malignant
This classification describes samples that are definitively malignant. It accounts for 3% to 7% of FNAs, and most of these samples are papillary thyroid carcinomas. Recommendation: Surgical resection is indicated.
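The six categories above can be held as a small lookup structure, for example when building a reporting aid. The following is a minimal sketch only: the class name `BethesdaCategory`, the dictionary `BETHESDA_CATEGORIES`, and the function `summarize` are hypothetical names chosen for illustration, the risk ranges are copied from Table 1, and the recommendation strings are abbreviated from that table. It is not a clinical decision tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BethesdaCategory:
    """One row of Table 1 (hypothetical container, for illustration only)."""
    numeral: str
    diagnosis: str
    risk_low_pct: int    # lower bound of reported malignancy risk, in percent
    risk_high_pct: int   # upper bound of reported malignancy risk, in percent
    recommended_action: str  # abbreviated from Table 1


# Values transcribed from Table 1; keys are the category numbers 1 to 6.
BETHESDA_CATEGORIES = {
    1: BethesdaCategory("I", "Nondiagnostic/unsatisfactory", 1, 4,
                        "Repeat FNA with ultrasound guidance"),
    2: BethesdaCategory("II", "Benign", 0, 3,
                        "Clinical follow-up"),
    3: BethesdaCategory("III", "AUS/FLUS", 5, 15,
                        "Repeat FNA; consider molecular marker testing"),
    4: BethesdaCategory("IV", "Follicular neoplasm/suspicious for follicular neoplasm", 15, 30,
                        "Diagnostic lobectomy or near-total thyroidectomy"),
    5: BethesdaCategory("V", "Suspicious for malignancy", 60, 75,
                        "Lobectomy with or without frozen section, or near-total thyroidectomy"),
    6: BethesdaCategory("VI", "Malignant", 97, 99,
                        "Near-total thyroidectomy"),
}


def summarize(category: int) -> str:
    """Return a one-line summary for a Bethesda category number (1 to 6)."""
    entry = BETHESDA_CATEGORIES[category]
    return (f"Bethesda {entry.numeral} ({entry.diagnosis}): "
            f"risk of malignancy {entry.risk_low_pct}-{entry.risk_high_pct}%; "
            f"{entry.recommended_action}")


print(summarize(3))
```

Keeping the risk bounds as separate integer fields, rather than as a text range, makes it straightforward to compare an institution's observed malignancy rate in a category against the published range.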
Accuracy and efficacy of the Bethesda system
One stated goal of the Bethesda system was to reduce confusion between pathologists and clinicians, and provide a standardized approach to patient care. A meta-analysis of the literature on the accuracy of the Bethesda system found correlations between each diagnostic category and the risk of malignant disease, indicating that the diagnostic scheme is valid. In one study of patients with indeterminate lesions, implementation of the Bethesda criteria resulted in lower rates of malignancy in thyroidectomy specimens, suggesting that the Bethesda system has improved diagnostic accuracy. However, in that study the calculated rate of malignancy in specimens classified as benign was higher, at 3.7%. This figure exceeded the rate suggested as appropriate by the Bethesda group, but did fall within the American Thyroid Association guideline recommendations.
The effect of the Bethesda criteria on reducing unnecessary surgery is unclear: the number of patients receiving surgery after FNA varied greatly between institutions reviewed in the meta-analysis. However, a 2011 single-institution review found a lower rate of surgical resection after implementation of the Bethesda criteria. The decline was attributed to fewer resections among patients with benign findings.
A novel category introduced by the Bethesda system was AUS/FLUS. Despite the goal of reporting this result in only 3% to 6% of cases, a meta-analysis found this category used in 9.6% of cases, with an associated 15.9% malignancy rate. Sullivan and colleagues found heterogeneous reporting rates of AUS/FLUS between institutions, with some institutions reporting this categorization in 29% of aspirates. This variation may be due to the inability to precisely define morphologic criteria for atypia. Methods for limiting the overuse of this classification are under consideration. One suggestion is to use the ratio of AUS/FLUS diagnoses to malignant diagnoses as a performance metric, with a goal range of 1 to 3.
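The proposed performance metric is a simple ratio, and a short sketch may make it concrete. The function names `aus_to_malignant_ratio` and `within_goal` are hypothetical, and the case counts in the example are invented purely for illustration (they are not data from any cited study); only the goal range of 1 to 3 comes from the text above.

```python
def aus_to_malignant_ratio(aus_flus_count: int, malignant_count: int) -> float:
    """Ratio of AUS/FLUS diagnoses to malignant diagnoses for one laboratory."""
    if malignant_count <= 0:
        raise ValueError("malignant_count must be positive to compute the ratio")
    return aus_flus_count / malignant_count


def within_goal(ratio: float, low: float = 1.0, high: float = 3.0) -> bool:
    """True when the ratio falls inside the suggested goal range of 1 to 3."""
    return low <= ratio <= high


# Hypothetical laboratory A: 1000 aspirates, 96 AUS/FLUS, 50 malignant.
ratio_a = aus_to_malignant_ratio(96, 50)
print(round(ratio_a, 2), within_goal(ratio_a))

# Hypothetical laboratory B: 1000 aspirates, 290 AUS/FLUS, 50 malignant.
ratio_b = aus_to_malignant_ratio(290, 50)
print(round(ratio_b, 2), within_goal(ratio_b))
```

Under these invented counts, a laboratory reporting AUS/FLUS in 9.6% of aspirates would fall inside the goal range, whereas one reporting it in 29% would fall well outside it, assuming the same malignant rate in both.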
Although the Bethesda group recommended performing repeat FNA after an AUS/FLUS classification, management following an AUS/FLUS report is also variable: some patients receive repeat FNA, whereas others proceed to surgical resection. A 2010 review of thyroidectomy rates for indeterminate lesions at a single institution before and after implementation of the Bethesda criteria demonstrated no change in the rate of thyroidectomy. This variation in practice may be due to initial hesitancy to follow a less aggressive (nonsurgical) approach despite the Bethesda system recommendations, or may result from differing published guidelines by national societies. For example, the 2009 management guidelines published by the American Thyroid Association recommend pursuing molecular markers (see later discussion) for aspirates classified as indeterminate. Of note, Heller and colleagues performed a cost-effectiveness analysis of repeat FNA for AUS/FLUS lesions, and found repeat FNA to be less costly and more effective than diagnostic lobectomy.
Overall, the Bethesda classification has improved communication between providers and across institutions that adopted its use. Moving forward, establishment of additional criteria to help resolve indeterminate results is under way to further encourage the selective use of surgery as a diagnostic modality when not required for treatment purposes.