Interpretation of Clinical Research Results as They Apply to Daily Practice

INTRODUCTION

Investigators have been evaluating the role of radiation therapy in the management of gynecologic cancers for more than 100 years. However, despite an extensive body of literature and, in some cases, decades of focused study, many key questions and controversies remain unresolved.

Clinical application of the gynecologic radiation oncology literature can be particularly challenging for several reasons, including:

Inconsistent methods that have been used to describe and classify tumors. Variations, ambiguities, and misapplications of tumor descriptors have often made it difficult to generalize the results of studies to other settings.
Complex, rapidly evolving treatment methods. Variations in method and technique may limit the relevance of results to different clinical settings.
Cancers that either are rare or have low event rates. Small numbers of patients limit the statistical power of many retrospective studies and make it difficult to design and complete adequately powered prospective trials.
Widely varying disease presentations within individual studies. Studies that have very heterogeneous study populations can yield overall conclusions that overestimate or underestimate the benefit of specific treatments to distinct subsets.
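The power problem described in the third item above can be made concrete with a standard sample-size approximation. The sketch below uses the normal-approximation formula for comparing two independent proportions with a two-sided test. The event rates, the one-third relative reduction, and the 5% significance and 80% power settings are hypothetical choices for illustration, not figures from any trial.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm(p_control: float, p_experimental: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients per arm needed to compare two event proportions.

    Normal-approximation formula for a two-sided test of two independent
    proportions, without continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_experimental) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_experimental * (1 - p_experimental))) ** 2
    return ceil(numerator / (p_control - p_experimental) ** 2)


# Hypothetical scenarios: the same one-third relative reduction in events
# requires far more patients when the baseline event rate is low.
for baseline in (0.30, 0.15, 0.06):
    reduced = baseline * 2 / 3
    print(f"{baseline:.0%} -> {reduced:.0%}: {n_per_arm(baseline, reduced)} per arm")
```

Under these assumptions, the required enrollment grows more than sixfold as the baseline event rate falls from 30% to 6%, which is why rare cancers and low-risk populations so often yield underpowered studies.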

No matter how high the quality of the underlying study, relative risks, survival rates, and complication probabilities can never tell us with certainty which treatment will most benefit any individual patient. Nevertheless, a thorough understanding of the literature, including the strengths and weaknesses of relevant studies, can increase the likelihood that clinicians will choose an effective approach and reduce the probability that they will select an ineffective or potentially harmful treatment.

Of course, a thorough discussion of problems in clinical data interpretation is beyond the scope of this book. Specific strengths and weaknesses of key studies are discussed in context in the site-specific chapters (Chapters 10 through 14). The purpose of this chapter is to highlight a few of the most important factors that should be considered when reading and evaluating the validity of the gynecologic oncology literature. Ambiguities caused by inconsistent nomenclature and poorly defined end points can occur in all types of clinical research; in the next two sections, these sources of error are discussed with specific examples from the gynecologic oncology literature. Subsequent sections will discuss some common sources of error in specific types of clinical research.

AMBIGUITIES CAUSED BY INCONSISTENT NOMENCLATURE

The relevance of any study’s results to specific daily practice situations hinges upon clinicians’ ability to relate the characteristics of study patients to those of patients entering their clinics. To make this possible, publications must describe all relevant patient, tumor, and treatment characteristics in unambiguous terms that can be easily understood and applied after the results are disseminated. Failure to do this can lead to misunderstanding and misapplication of the study’s results. In this section, we discuss some of the ways in which ambiguous nomenclature has compromised the generalizability of gynecologic oncology studies.

Inconsistencies in the Use and Interpretation of Staging Systems

Most studies of gynecologic cancer treatment use FIGO stage as a criterion for eligibility. However, although FIGO stage categories are usually at least roughly correlated with outcome, they are generally very imprecise measures of recurrence risk. Inconsistent application of staging systems and evolving category descriptions can also be important sources of miscommunication. The following factors have limited the utility of FIGO staging systems as guides for determining prognosis and treatment.

Failure to Consider Important Prognostic Factors

Most FIGO staging systems consider only a few of the important prognostic variables. As a result, stage categories often include definable subgroups with widely varying prognoses. For example, stage IB2 can describe a 4.1-cm cervical cancer with negative nodes or a massive 8-cm cancer with gross paraaortic lymphadenopathy. Stage alone rarely provides sufficient information about the composition of study populations.

Ambiguous Category Descriptions

In some cases, stage categories depend on highly subjective assessments of disease extent. Wide variations in clinicians’ thresholds for deciding that a patient has, for example, parametrial or pelvic wall involvement from cervical cancers (Chapter 10) or submucosal involvement of vaginal cancers (Chapter 11) can lead to very large differences in the baseline risk profiles of study patients labeled with the corresponding stages. These variations confound comparisons between similarly staged patients treated in different settings.

Misapplication of FIGO Rules for Staging

Although FIGO has published specific rules for the staging of gynecologic cancers, these are rarely reproduced when the staging systems are quoted in secondary sources. Ignorance of FIGO rules frequently leads to erroneous stage assignments. For example, although FIGO rules do not allow most results of CT, MRI, or PET to influence the stage of cervical or vaginal cancers, this admonition is often ignored. As a result, patients who have extrapelvic metastases diagnosed by these methods may be inappropriately assigned an advanced stage; this error influences the apparent survival rates of patients in every stage. Misapplication of surgical staging criteria to nonsurgical patients is another common source of error (Chapter 10).
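The survival distortion described above is often called the Will Rogers phenomenon: moving patients with an intermediate prognosis from a lower to a higher stage raises the apparent survival of both stages while leaving overall survival unchanged. The sketch below uses invented patient counts and survival proportions purely to show the arithmetic.

```python
def pooled_survival(subgroups):
    """Patient-weighted mean survival for a list of (n_patients, survival) pairs."""
    total = sum(n for n, _ in subgroups)
    return sum(n * s for n, s in subgroups) / total


# Hypothetical cohorts (counts and 5-year survival proportions are invented).
early_no_mets = (80, 0.85)      # early-stage disease, no extrapelvic spread
early_occult_mets = (20, 0.40)  # extrapelvic metastases seen only on CT/MRI/PET
advanced = (50, 0.20)           # clinically advanced disease

# FIGO rules followed: imaging-only findings do not change the stage.
figo = {"early": [early_no_mets, early_occult_mets], "advanced": [advanced]}
# Rules misapplied: imaging-detected metastases move patients to the advanced stage.
upstaged = {"early": [early_no_mets], "advanced": [advanced, early_occult_mets]}

for label, staging in (("FIGO rules", figo), ("imaging upstaging", upstaged)):
    rates = {stage: round(pooled_survival(groups), 3) for stage, groups in staging.items()}
    print(label, rates)
```

With these numbers, early-stage survival rises from 76% to 85% and advanced-stage survival from 20% to about 26%, although no individual patient's outcome has changed.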

Variations in the Extent of Surgical Staging

Although current endometrial and vulvar staging systems are based in part on the findings of lymphadenectomy, many patients have only limited node dissections or even none at all. This variability causes uncertainties about the composition of study populations and is an important source of stage migration. Adding to this is the recent advent of sentinel node mapping, which allows the detection of very small metastases that are typically undetectable with standard histologic processing methods. Although the surgical staging systems for endometrial and vulvar cancer are technically not applicable to patients treated with initial radiation or chemotherapy, stage categories are routinely applied to such patients even though key features used in the staging system cannot be evaluated without initial surgery.

Problems Caused by Evolving Staging Systems

Modifications of the FIGO staging systems have, over time, been important causes of ambiguity and stage migration. Major changes in the endometrial (Table 13.1), vulvar (Table 12.1), and cervical (Chapter 10) staging systems have caused historical discontinuities in the language used to describe cancers. For example, since 1993, there have been two major revisions of the vulvar staging system (Table 12.1). Although most studies traverse at least one of these transitions, investigators frequently fail to explain how they have ensured that stage assignments were applied consistently throughout their study. Also, unless clinicians are aware of the timing and nature of changes in the methods used to classify patients, they can easily draw erroneous conclusions from older studies that described patients using a previous staging method.

For all of these reasons, FIGO stage alone is never a sufficient method for characterizing the makeup of study populations. Detailed descriptions of the morphologic and histologic features that influence treatment outcomes can greatly improve clinicians’ confidence in the comparability of staging methods and therefore in the applicability of study results to their patients. The generalizability of studies that describe cancers in terms of tumor stage without clearly stating how, when, and with what rules stage was assigned and without providing other measures of disease extent should be considered suspect.

Inconsistent Interpretation of Histologic Findings

All gynecologic cancers are classified according to their histologic appearance. For cancers that are treated with initial surgery, histologic measures of disease extent, including tumor size, depth of stromal infiltration, lymph-vascular space invasion, surgical margin status, and other features, are used to guide recommendations regarding adjuvant treatment. Although clinicians often assume that the terms used to define these features are universally understood and consistently applied, there is considerable evidence that pathologists vary widely in their interpretations of findings. This can hamper accurate application of study findings in several ways.

Evolving Histologic Criteria

Pathologists’ criteria for assigning grade and for diagnosing other measures of biologic aggressiveness evolve as research refines their understanding of clinicopathologic correlations. Because the recognition and adoption of these shifting definitions tend to be gradual, the meaning of key terminology is easily obscured. For example, for at least two decades before FIGO’s 1988 revised recommendations for grading of endometrial cancer (Table 13.2), academic pathologists had already been changing their grading criteria, shifting toward relatively lower-grade assignments for endometrioid cancers and higher-grade assignments for aggressive histologic variants. Hendrickson et al.,¹ in their 1982 reexamination of histologic material from 361 patients diagnosed with endometrial cancer between 1959 and 1978, found that 99 (27%) no longer even met their criteria for the diagnosis of cancer; many other lesions were reclassified as lower or higher grade than was specified in the original pathology reports. It is likely that reexamination of most early prospective studies would reveal similar findings. Even after revised methods were codified by FIGO in 1988, pathologists adopted the FIGO recommendations gradually and often inconsistently.

Shifts in nomenclature compromise historical comparisons in ways that can be very difficult to detect. When terms are applied inconsistently, multivariate analyses may not be valid and can easily lead to erroneous conclusions, particularly if evolving treatment strategies parallel changes in histologic criteria.²

Retrospective reviews that are based on vintage pathology reports and obsolete language may be impossible to interpret in contemporary terms. Investigators should always carefully consider whether older terminology will be correctly understood in a modern context; if not, histologic material should, if possible, be reexamined and classified using contemporary criteria.

Inconsistencies in Contemporary Application of Histologic Terminology

Variations in pathologists’ use of key terminology can lead to misinterpretation of research results and inappropriate treatment in several ways:

Histologic criteria are frequently used to determine eligibility and to estimate event rates for patients entered on multiinstitutional trials. However, these estimates will be inaccurate if principal investigators base their calculations on diagnostic criteria that are not followed by pathologists in contributing institutions. The degree to which this can affect otherwise high-quality trials is illustrated by the PORTEC studies of endometrial cancer (Chapter 13).³,⁴ Histologic grade, a determinant of eligibility for these trials, was obtained from the original reports generated by local community pathologists. After the trials were completed and initial results reported, expert review of the material revealed that local pathologists had overestimated the cancer grades in 40% of cases. As a result, the study populations had much lower-than-expected baseline risks for recurrence. Unknown diagnostic errors undoubtedly affect many other studies that have not had central pathology review. Investigators should always state whether expert pathologists reviewed the histologic diagnoses underpinning their study. If not, the generalizability of the results should be viewed with skepticism.
The results of research studies, even when based on expert pathology review, may be applied incorrectly by practicing clinicians if local decisions depend on the reports of less expert pathologists. Expert review of diagnoses made by general pathologists has revealed frequent discrepancies and even major errors in diagnosis.⁵ Close multidisciplinary collaboration and appropriate consultation with expert pathologists may improve diagnostic precision and increase the likelihood that study results will be replicated in other settings (Chapter 3).
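The consequence of grade overcalling for baseline risk is a simple weighted average. The sketch below takes the 40% overestimation rate reported for the PORTEC material but otherwise uses hypothetical planning figures (a 15% recurrence risk for correctly graded eligible patients, 5% for overgraded patients, and 700 control-arm patients); none of these numbers are PORTEC data.

```python
def expected_baseline_risk(planned_risk: float, low_risk: float,
                           misgraded_fraction: float) -> float:
    """Event rate expected when a fraction of enrolled patients were overgraded.

    Overgraded patients are assumed to carry the risk of their true (lower)
    grade; correctly graded patients carry the risk assumed in trial planning.
    """
    return (1 - misgraded_fraction) * planned_risk + misgraded_fraction * low_risk


# Hypothetical planning figures (not PORTEC data).
planned, actual = 0.15, expected_baseline_risk(0.15, 0.05, 0.40)
n_patients = 700  # hypothetical control-arm enrollment
print(f"planned risk {planned:.0%}, expected events {planned * n_patients:.0f}")
print(f"actual risk {actual:.0%}, expected events {actual * n_patients:.0f}")
```

Under these assumptions the expected control-arm risk falls from 15% to 11%, and the expected number of events from about 105 to about 77, leaving the trial with less power to detect a given relative benefit than its designers intended.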

Other Sources of Ambiguity

Diagnostic Methods

Wherever possible, authors should describe the criteria used to reach clinical diagnoses. For example, what diagnostic methods were used to identify patients with lymph node involvement? If tomographic imaging methods were used, what size or other criteria were used to make the diagnosis of probable involvement? How did the authors handle equivocal findings? How many cases were confirmed histologically and how were these selected?
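These questions matter because the size criterion largely determines who is labeled node-positive. The sketch below calls a node involved when its short-axis diameter meets a threshold and compares the calls with histology; the ten nodes and the 8, 10, and 15 mm thresholds are hypothetical. The result describes only the histologically verified nodes, so the way those cases were selected also shapes the apparent accuracy.

```python
# Hypothetical histologically verified nodes: (short-axis diameter in mm, metastatic?)
verified_nodes = [
    (4, False), (6, False), (7, True), (8, False), (9, True),
    (11, False), (12, True), (14, True), (18, True), (22, True),
]


def sensitivity_specificity(nodes, threshold_mm):
    """Call a node 'involved' when its short axis is >= threshold_mm."""
    tp = sum(1 for size, pos in nodes if pos and size >= threshold_mm)
    fn = sum(1 for size, pos in nodes if pos and size < threshold_mm)
    tn = sum(1 for size, pos in nodes if not pos and size < threshold_mm)
    fp = sum(1 for size, pos in nodes if not pos and size >= threshold_mm)
    return tp / (tp + fn), tn / (tn + fp)


for threshold in (8, 10, 15):
    sens, spec = sensitivity_specificity(verified_nodes, threshold)
    print(f">= {threshold} mm: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this invented series, raising the threshold from 8 to 15 mm trades sensitivity (0.83 to 0.33) for specificity (0.50 to 1.00), so two studies using different thresholds would describe different "node-positive" populations.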

Categorization of Continuous Variables

In some cases, the handling of continuous variables (e.g., laboratory results, tumor size, patient age, BMI) can have a major effect on the final result and conclusions. If numerical variables have a continuous relationship to the study end point, they should be summarized and analyzed in that form. In practice, however, they are rarely analyzed this way. A variety of methods have been used to categorize continuous variables.⁶ Investigators commonly choose the median or quartiles to define study groups. These methods may reveal correlations but typically reduce the power of variables to define risk. For example, cervical tumor diameter is commonly treated as a dichotomous variable. Virtually any cutoff point will demonstrate a positive correlation between tumor size and disease recurrence. However, this method obscures additional prognostic information obtained through more detailed subdivision according to size.⁷ Another approach is to focus on patients whose values are considered clinically significant by the investigator. However, other investigators and readers may not agree with the authors’ determination of clinical significance.
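The loss of information from dichotomization can be illustrated with an invented risk model. The sketch below assumes, purely for illustration, that recurrence risk rises smoothly (logistically) with tumor diameter across a hypothetical cohort of 100 tumors between 1 and 8 cm, then summarizes the modeled risk within groups defined by a median split and by quartiles.

```python
from math import exp
from statistics import mean, median, quantiles

# Hypothetical cohort: tumor diameters (cm) spread evenly from 1 to 8 cm.
diameters = [1 + 7 * i / 99 for i in range(100)]


def recurrence_risk(diameter_cm: float) -> float:
    """Invented logistic model in which risk rises smoothly with diameter."""
    return 1 / (1 + exp(-(diameter_cm - 5)))


def group_risks(values, cutoffs):
    """Mean modeled risk within each interval (lo, hi] defined by sorted cutoffs."""
    edges = [float("-inf"), *cutoffs, float("inf")]
    return [
        round(mean(recurrence_risk(v) for v in values if lo < v <= hi), 2)
        for lo, hi in zip(edges, edges[1:])
    ]


print("median split:", group_risks(diameters, [median(diameters)]))
print("quartiles:   ", group_risks(diameters, quantiles(diameters, n=4)))
```

The median split reports just two average risks, while the quartile groups show that risk keeps rising across the whole size range, which is the kind of gradient a single cutoff conceals.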

For large studies, statistical methods should be used to define categories with cutoff points that most completely describe the relationship between the study variable and end points. For smaller studies, cutoff points that have been validated using larger data sets of patients with similar diagnoses can be used and referenced.
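One widely used statistical approach to choosing a cutoff is to scan every candidate value and keep the one that maximizes the association with the end point, here a Pearson chi-square statistic for the resulting 2 × 2 table (sometimes called a maximally selected or "optimal" cutpoint). The sketch below applies this to an invented cohort of 15 patients. It is one technique among several, and because many cutoffs are tested, the nominal p-value of the winning cutoff overstates its significance; a cutoff found this way needs a corrected test or validation in an independent data set.

```python
# Hypothetical cohort: (tumor diameter in cm, recurred?)
cohort = [
    (1.5, False), (2.0, False), (2.5, False), (3.0, False), (3.0, True),
    (3.5, False), (4.0, False), (4.5, True), (5.0, False), (5.5, True),
    (6.0, True), (6.5, False), (7.0, True), (7.5, True), (8.0, True),
]


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def best_cutoff(data):
    """Try each observed value as a cutoff (size > cutoff = high group)."""
    results = []
    for cut in sorted({size for size, _ in data})[:-1]:  # keep high group non-empty
        high = [rec for size, rec in data if size > cut]
        low = [rec for size, rec in data if size <= cut]
        stat = chi_square_2x2(sum(high), len(high) - sum(high),
                              sum(low), len(low) - sum(low))
        results.append((stat, cut))
    return max(results)


stat, cut = best_cutoff(cohort)
print(f"largest chi-square {stat:.2f} at diameter > {cut} cm")
```

In this invented cohort the scan selects 4.0 cm, but a 5.0-cm cutoff yields an almost identical statistic, which illustrates how unstable data-driven cutoffs can be in small studies.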

Summary

The language of clinical staging, diagnostic imaging, and histologic description forms the basis for all communications between researchers and practitioners. The quality of those communications and the applicability of trial results depend critically upon the precision of that language. Adequate description of clinical variables also allows readers to evaluate the degree to which selection bias has influenced the findings of a study.

For these reasons, clinical reports should always attempt to provide:

Tabulated descriptions of the clinical characteristics of the overall study group and of subgroups compared in the study.
The version of the staging system used (with a citation).
The source of clinical findings. Who performed the exams and how were the data recorded? How were conflicting descriptions resolved?
Whether and how stage was reassigned if some patients were initially staged using an outdated staging system.
A detailed description of the diagnostic methods used to evaluate important clinical variables such as lymph node involvement, tumor size and extent, and others.
A specific description of the source of histologic diagnoses. Was the diagnosis obtained from multiple sources, from a single pathologist or group of pathologists that reviewed the material at diagnosis, or from a reexamination of the material performed specifically for the authors’ study? In many cases, inclusion of a gynecologic pathologist as coauthor can be particularly helpful.
If the authors chose to categorize continuous variables (e.g., tumor size), how were the cut points chosen and validated?
A description of missing data elements and how they were handled in the final analysis.
A detailed discussion of possible inconsistencies in the nomenclature used in the study. This is particularly important when historical comparison groups are used.

Authors and readers of the clinical literature should never underestimate the importance of clear terminology and expert application of diagnostic criteria. Improvements in the clarity of diagnostic terms and greater attention to the specialized expertise of diagnosticians could undoubtedly yield major improvements in the generalizability of trial results and the outcome of patients.

EVALUATING THE NATURE AND VALIDITY OF TRIAL END POINTS

The NCI Dictionary of Cancer Terms defines a clinical trial end point as “an event or outcome that can be measured objectively to determine whether the intervention being studied is beneficial.” Trial results may be misinterpreted if:

The study end point(s) are not clearly defined.
A large number of patients are lost to follow-up before the end point is reached.
The end point is not measured objectively.
The relationship between the end point and treatment benefit is misinterpreted.

A number of factors lead investigators to select specific trial end points. Although overall survival is often considered the most important end point for gynecologic cancer trials, other end points may shed greater light on the cause of failure. In some cases, “surrogate” end points that have shorter event horizons than overall survival are used to determine whether a treatment is beneficial; although surrogate end points can give an answer more quickly, they may not be true indicators of benefit, particularly if their relationship to the primary end point has not been adequately validated. End points commonly used to evaluate gynecologic cancer treatments and possible pitfalls that arise from their use include the following:

Overall Survival

This end point, which requires only an assessment of patients’ survival, is often considered to be the best measure of treatment effectiveness. However, for most potentially curable gynecologic cancers, 5 years or more is needed to obtain mature survival data. This is a long time to wait for an answer; in low-resource settings, it is also a long time to maintain continual follow-up of study patients.

The ability to accurately and reliably record deaths also varies greatly with the trial setting; socioeconomic standards, migration patterns, customary practices regarding end-of-life care, national methods of record keeping, access to death records, and other factors affect access to survival data. For example, in Norway, which has low rates of emigration and a robust tumor registry system that is tied to national identification numbers, patients are very rarely lost to follow-up before their death. On the other hand, in many developing countries, 30% or more of patients may be lost to follow-up in the first year after treatment. In the United States, the death of most citizens and legal residents is eventually recorded in the Social Security Death Index. However, there is typically a 4- to 6-month delay before death information appears in the Index, a factor that can have a profound effect on preliminary data analyses. Also, patients who lack a Social Security number (undocumented residents and individuals with temporary visas) may be difficult or impossible to track down after they stop coming to the hospital for treatment. At the very least, cases that are censored before reaching the study end point decrease the power to detect differences; if these losses are nonrandom, they can lead investigators to erroneous conclusions.
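Losses to follow-up and delayed death reporting enter survival analyses through censoring. The sketch below implements the Kaplan-Meier product-limit estimator and applies it to a hypothetical cohort of 10 patients analyzed 24 months after enrollment, assuming a 5-month lag before deaths are recorded (within the 4- to 6-month range mentioned above). Deaths that have occurred but are not yet recorded are treated as censoring at the time the patient was last known alive.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  -- follow-up in months for each patient
    events -- True if death was observed, False if censored at that time
    Returns a list of (time, survival probability) at each observed death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e)
        removed = sum(1 for tt, _ in data if tt == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censorings at t leave the risk set
        i += removed
    return curve


REPORTING_LAG = 5   # hypothetical months before a death appears in the index
DATA_CUTOFF = 24    # hypothetical analysis date, months from enrollment

# Hypothetical true follow-up (months) and vital status at the data cutoff.
true_times = [4, 7, 10, 13, 16, 20, 21, 23, 24, 24]
true_deaths = [True, False, True, True, False, True, True, True, False, False]

# Deaths in the last REPORTING_LAG months are not yet recorded, so those
# patients appear censored (alive) when last known alive, approximated
# here by the true date of death.
recorded = [dead and t <= DATA_CUTOFF - REPORTING_LAG
            for t, dead in zip(true_times, true_deaths)]

print("complete death data:", kaplan_meier(true_times, true_deaths)[-1])
print("lagged death data:  ", kaplan_meier(true_times, recorded)[-1])
```

With complete death data, estimated survival after the last death is 27%; with lagged data, the three recent deaths disappear and the estimate after 13 months remains at 67.5%, illustrating how reporting delays can make preliminary analyses look better than the truth.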

Relapse-free Survival (RFS) and Disease-free Survival (DFS)

According to the NCI Dictionary of Cancer Terms, RFS and DFS may be used interchangeably to describe the length of time after primary treatment that a patient survives without any signs or symptoms of cancer. Any cancer recurrence and death from any cause are scored as events.
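The RFS definition above can be written as a small rule for deriving each patient's time and event indicator, alongside overall survival for comparison. The record layout and the two example patients below are hypothetical; times are months from the end of primary treatment. The successfully salvaged patient shows why salvage therapy cannot change RFS: her event is fixed at the time of recurrence even though she remains alive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    """Hypothetical follow-up record; times are months from end of primary treatment."""
    last_contact: float
    recurrence: Optional[float] = None  # time recurrence was detected, if any
    death: Optional[float] = None       # time of death from any cause, if any


def rfs(p: Patient):
    """(time, event): the first of recurrence or death from any cause."""
    candidates = [t for t in (p.recurrence, p.death) if t is not None]
    return (min(candidates), True) if candidates else (p.last_contact, False)


def overall_survival(p: Patient):
    """(time, event): death from any cause only."""
    return (p.death, True) if p.death is not None else (p.last_contact, False)


# A salvaged patient: recurrence at 14 months, successfully retreated, alive at 60.
salvaged = Patient(last_contact=60, recurrence=14)
# A patient who died of an unrelated cause without recurrence.
unrelated_death = Patient(last_contact=30, death=30)

for name, p in (("salvaged", salvaged), ("unrelated death", unrelated_death)):
    print(name, "RFS:", rfs(p), "OS:", overall_survival(p))
```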

RFS has been very popular in gynecologic oncology trials, primarily because it has a shorter event horizon than overall survival. A major disadvantage of this end point is its dependence on the method used to detect recurrence. Studies that use more frequent assessments and more sensitive detection methods will tend to report relatively short RFS. Also, equally effective treatments (in terms of overall survival) may have recurrences detected at different times, either because the pattern of recurrence differs between compared treatments (e.g., pelvic RT vs. chemotherapy) or because one treatment (e.g., adjuvant chemotherapy) delays the appearance of residual disease but does not prevent recurrence.
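The dependence of RFS on surveillance intensity can be shown with a deterministic sketch. It assumes, hypothetically, that recurrences are found only at scheduled visits and that each visit detects any recurrence already present; the five true recurrence times are invented.

```python
from math import ceil


def detected_at(true_recurrence: float, visit_interval: float) -> float:
    """Time of the first scheduled assessment at or after the true recurrence.

    Assumes (hypothetically) that recurrences are found only at scheduled
    visits and that every visit detects any recurrence already present.
    """
    return ceil(true_recurrence / visit_interval) * visit_interval


# Hypothetical true (biologic) recurrence times, months after treatment.
true_times = [4.5, 7.0, 10.2, 13.8, 19.5]

for interval in (3, 6, 12):
    observed = [detected_at(t, interval) for t in true_times]
    print(f"every {interval:>2} months:", observed)
```

The underlying biology is identical under all three schedules, yet the observed recurrence times, and therefore RFS, are shortest with 3-monthly assessment.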

It is also important to recognize that successful salvage treatments do not influence RFS rates. In rare cases, this can lead to erroneous conclusions about the efficacy of treatment. See the discussion of GOG-122 (Chapter 13, Fig. 13.4) for a possible example.
