4 Introduction to Medical Biometry
Introduction
Contrary to a commonly held belief, medical biometry does not only pertain to the evaluation of data, but above all, has to do with the methodological aspects of research. These include the planning of trials, their interpretation, and methodological criticism of the study results. In recent years it has increasingly been concerned with meta-analyses of trial outcomes, a necessary component in evidence-based medicine. These tasks have more to do with theoretical reasoning about problems related to medicine than merely with mathematical or statistical procedures.
In this contribution emphasis is placed on questions concerning the methodology of therapeutic research in oncology. A specific focus is given to those aspects that play a role in the evaluation of efficacy of conventional as well as complementary therapies.
The Setting of Therapeutic Research in Oncology
The general epistemological principles that govern medical research also apply to the field of oncology. However, clinical trials in oncology take place in a very special research environment which in turn affects the details of their planning and execution.
• Cancer is a serious, progressive disease that leads to death if left untreated.
• Cancer is a common disease.
• In many instances, the available therapies are of limited efficacy.
• Most of the conventional therapies have many side effects.
• Most of the established therapies are expensive.
• Disease prognosis depends on many factors, especially the initial therapy (operation) and the quality of diagnosis.
Special Features of Research in Cancer Treatment
The circumstances outlined above have a multitude of consequences that characterize the current situation of therapeutic research in oncology:
• A large number of clinical trials are undertaken. The total number of trials published to date is in the tens of thousands. As can be seen in trial registries, the number of clinical trials actively recruiting patients is on the order of several thousand (http://cancertrials.nci.nih.gov, http://controlled-trials.com, www.studien.de).
• Oncologists often tend to move in a scientific environment and, for the most part, possess a lot of experience with clinical trials. This is the result of the enormous quantity of studies, many of which are multicenter trials involving numerous oncologists from large clinical centers. Even those oncologists not involved in these studies are required to keep up to date on the latest findings to ensure adequate patient care.
• Oncologists, at least those working in a conventional clinical setting, are mostly well acquainted with the methodological principles of well-designed trials. This simplifies the job for the biostatistical consultant, since clinicians with a strong methodological background will more easily accept the (often inconvenient) methodological requirements imposed by a rigorous study design.
• Drug research is extremely expensive and is dictated by large pharmaceutical companies. The oncologist W.M. Gallmeier stated in 1994: “In the realm of clinical therapeutic research, industry determines what is done. It is industry that dictates the trends and themes, that chooses who is qualified to cooperate and which clinics are deemed adequate to participate. Questions that are not product-oriented can usually not be financed.” (37)
• Sponsors as well as patients and the public are interested in positive results. This carries an increased risk of publication bias or even manipulation of data.
• One should be skeptical about positive results derived from studies that did not involve careful monitoring and an external center of biometrics. It has been repeatedly noticed that, among several studies addressing the same question, those involving an external biometrical center produced far less spectacular results than the remaining studies. A rather tragic example in recent years was the treatment of breast cancer with high-dose chemotherapy.
• The great majority of comprehensive randomized trials yield null results with respect to survival time. This is simply because there are few true advances in therapy that result in a prolonged survival time. Again, the history of therapy of metastatic breast cancer offers a salient example (1, 25, 26, 36, 50).
• If there is no firm evidence that a drug extends survival or improves quality of life, approval by regulatory agencies may be given on the basis of antitumor activity (tumor response). O’Shaughnessy described the policy of the FDA in the following way: “The main goal of cancer treatment is the prolongation of life, but the proof that a new drug results in a reduction in tumor growth and an improvement in quality of life in patients can support the approval of a new substance.” (62)
• In oncology there is always the danger of construing dogmas and of reaching epistemological deadlocks. This danger is present throughout medicine in general, and is not limited to therapy alone, but can be especially pronounced in life-threatening diseases. For a methodological analysis of this phenomenon see reference 6. To put it simply, physicians who are convinced of the efficacy of a certain therapy will not be able or willing to withhold this from their patient, whether their belief is scientifically founded or not. A comparison with nontreated controls can no longer be undertaken, and the belief becomes a dogma.
• Currently there are numerous widespread interventions in oncology lacking any (sufficient) evaluation. This holds true for the many untraditional therapeutic approaches in oncology (59, 71, 76, 78), but also, to a lesser extent, for conventional treatment options. Chemotherapy for metastatic ovarian or breast carcinoma was never compared with a no-treatment control. In both cases, it seems impossible to recover the lacking evidence (compare to preceding point).
• Many patients are prepared to accept considerable, even serious, side effects of treatment for a tiny promise of (hypothetical) benefit. Eight percent of patients in a group with operable breast cancer, when queried, opted for high-dose chemotherapy in the event of metastasis, even assuming that this treatment would extend survival by merely one month (55, 70).
• In clinical trials, even small effects are of interest. On the individual level, a small benefit is of interest because of the life-threatening situation. On the population level, small effects may be important because of the large incidence and prevalence of cancer.
• In many situations, studies are necessarily large but possible (even if expensive). The necessity follows from the fact that one cannot realistically expect major effects in most instances.
• Studies can normally only be conducted in large institutions. Only these are generally able to provide the adequate apparatus and personnel.
• Conventional nonrandomized studies are of very limited value. This at least holds true, as we will see, for simplistic historical comparisons, in which changes in the prognostic starting position can never be excluded.
• Blinded studies are almost impossible to conduct. The attempt to study therapeutic efficacy on quality of life is problematic. Given the characteristic side-effects and routes of administration of most cancer therapies, blinding is often impossible. However, subjective impressions (quality of life ratings) may obviously be biased if the study is not blinded.
• Clinical trials are rarely designed as equivalence studies. Of course, studies aimed at showing equivalence (or, at least, non-inferiority) of treatments do exist. This is especially the case when it is clear from the very beginning that the test therapy is not expected to show a distinct advantage with respect to efficacy and that one is even prepared to accept a small disadvantage as long as this is compensated by some other aspects (such as fewer side effects or other benefits for the patient). Nonetheless, it is relatively rare that such questions are addressed in oncology. The main interest is prolonging the patient’s life.
• Interim analyses are almost always necessary: Since advantages and disadvantages of the therapies are of vital importance to the patients involved in clinical trials, it is generally not ethical to postpone any evaluations of treatment efficacy until all patients have been recruited.
• In general, one-sided testing for treatment effects is inadequate. One-sided testing in the usual superiority studies comparing a new therapy to a standard regimen has consequences: If the therapy to be tested appears to be inferior, one is no longer allowed to clarify and publish whether this inferiority is treatment-related or merely a random effect. One-sided testing is generally only acceptable if inferiority of the tested therapy is biologically impossible.
In the context of what will be discussed below, and specifically the last two points just mentioned, the following should always be kept in mind: No matter how plausible the active principle of a cancer therapy may be, how strong the immediately observed antitumor effects are, and how harmless the side effects may appear, one should never exclude a priori the possibility that the therapy may actually shorten the patient’s life.
Examples of therapeutic procedures that shorten survival times according to meta-analyses of randomized studies include adjuvant therapy of non-small cell lung cancer using alkylating agents and postoperative radiation therapy of non-small cell lung cancer (61, 66).
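The remark above that studies must be large when no major effects can be expected can be made concrete with a rough calculation. The following minimal sketch uses Schoenfeld's approximation for the number of deaths needed in a two-arm trial with 1:1 randomization and a two-sided log-rank test. The formula, the function name `events_needed`, and the chosen hazard ratios, significance level, and power are illustrative assumptions and are not taken from the text.

```python
import math
from statistics import NormalDist


def events_needed(hazard_ratio, alpha=0.05, power=0.80):
    """Approximate number of deaths required (Schoenfeld's formula).

    Assumes 1:1 allocation and a two-sided test at level alpha.
    """
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard_ratio must be positive and different from 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # quantile for the two-sided level
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2)


if __name__ == "__main__":
    # Hypothetical hazard ratios, from a large effect (0.5) to a small one (0.9).
    for hr in (0.5, 0.75, 0.9):
        print(f"hazard ratio {hr}: about {events_needed(hr)} deaths needed")
```

Under these assumptions a small effect requires many times more deaths than a large one, and the number of patients to be recruited is larger still, because not every patient dies during follow-up.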
Problems of Conducting Studies in Complementary Oncology
If one takes a closer look at the difficulties we have just described, it becomes evident why well-designed trials of complementary cancer treatments are infrequent. Often, proponents of complementary therapies are concerned about the “experimental situation” created by clinical trials and, in particular, by randomized allocation of patients to the treatment groups. Sometimes, they may be reluctant to do the study in the first place, fearing that a well-founded null or negative result might harm their interests. Another point to consider is that research in complementary oncology mostly does not take place within traditional university research centers. Therefore, research culture may be less developed in comparison to conventional medicine, in which high-quality publications play a decisive role in career advancement. Finally, it should also be taken into consideration that certain nonmedical procedures are not under any regulatory pressures.
On the other hand, there are multiple reasons for the dearth of well-designed studies that proponents of complementary procedures cannot be held accountable for. According to the experience of the author, it may be impossible to implement a study, even with the best intentions of all participants involved. Four major reasons are given here (for a more detailed discussion see reference 2):
• Research environment and infrastructure. Research outside of university settings usually does not have access to established collaborations with methodological centers. Also, few if any students are available to keep the trial going by working for little or no pay. Many institutions applying complementary medicine are companies that have to count the work and time dedicated by members of the staff to a clinical trial as expenses. Additionally, the available equipment may be insufficient to meet the standards required by a protocol of a multicenter trial.
• Access to research funding. Sponsors of the respective therapies are often small companies that are unable to come up with the necessary funding for carefully conducted studies. Access to public grant funding is difficult. Regulators of studies often harbor negative feelings toward these procedures, or they have doubts (sometimes well founded) that the trial will be conducted successfully at the applicant’s institution.
• Patient numbers. Well-designed studies are large, in particular when it comes to complementary therapies where expected treatment effects may be small. Patient numbers required for such trials can greatly surpass the capacities of the mostly humble and small institutions of complementary oncology.
• Fragmentation of therapeutic approaches. The therapeutic approaches in complementary oncology are extremely heterogeneous. Since doctors usually swear by their treatment, even if it is only marginally different from another approach, it will be difficult to unite a group of investigators from different institutions who are prepared to apply identical therapies. This makes multicenter trials difficult. However, heterogeneity of treatment is a problem for monocenter trials, as well, namely for the relevance of the study results: for if a treatment evaluated at one institution is only one of many variants used in other places, then a result will say little about these variants. From the point of view of an external sponsor, the study may not seem to be of much value.
Primary Outcome Measures in Therapeutic Trials in Oncology
A study is nothing else but structured observation and experience. Clinical studies aim at obtaining data on efficacy and tolerability of therapies under transparent, predetermined, circumstances. We will limit our focus to the aspect of efficacy.
The goal of therapies in oncology is ultimately the prolongation of survival time and the improvement in quality of life of patients. These are the most important primary endpoints in therapeutic studies, gathered through the use of adequate tools. Apart from that, there are other outcome measures that are used in therapeutic studies in oncology, of which the following are the most common:
• tumor response rate and duration of tumor response
• immune response (rate)
• disease-free interval/survival
• progression-free interval/survival
• local recurrence-free interval/survival.
The two categories “disease-free interval” and “disease-free survival” differ in one point: the latter accounts not only for tumor recurrences, but also for deaths, both of which are counted as failures. The same is true for the last two outcome measures listed above.
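The difference between the two kinds of endpoint can be sketched as a difference in how events are coded for analysis. In the following minimal sketch, the record layout (`Patient`) and both function names are hypothetical. Treating a death without prior recurrence as a censored observation for the disease-free interval is an assumption, a common convention that the text does not spell out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Patient:
    """Hypothetical follow-up record; all times are months since start of treatment."""
    last_follow_up: float
    recurrence_at: Optional[float] = None  # None if no recurrence was observed
    death_at: Optional[float] = None       # None if alive at last follow-up


def disease_free_interval(p: Patient) -> Tuple[float, bool]:
    """Return (time, failure). Only a recurrence counts as a failure;
    death without recurrence is treated as censoring (assumption)."""
    if p.recurrence_at is not None:
        return p.recurrence_at, True
    if p.death_at is not None:
        return p.death_at, False
    return p.last_follow_up, False


def disease_free_survival(p: Patient) -> Tuple[float, bool]:
    """Return (time, failure). Recurrence or death, whichever comes first, is a failure."""
    events = [t for t in (p.recurrence_at, p.death_at) if t is not None]
    if events:
        return min(events), True
    return p.last_follow_up, False


if __name__ == "__main__":
    # A patient who died after 14 months without a documented recurrence.
    died_without_recurrence = Patient(last_follow_up=14.0, death_at=14.0)
    print(disease_free_interval(died_without_recurrence))  # censored observation
    print(disease_free_survival(died_without_recurrence))  # counted as a failure
```

The same patient thus contributes a censored observation to one endpoint and a failure to the other, which is why the two measures can give different pictures of the same trial.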
Which endpoint is most relevant in a therapeutic trial depends on the type of cancer, the stage, and the type of therapy and questions posed by the study. It should always be realized, however, that increase in quality of life and survival time are ultimately of predominant importance to the patient, and all other outcome measures are secondary efficacy parameters.
Some of the methodological problems of the efficacy criteria will now be discussed.
Survival Time and Quality of Life
While survival time does not present any specific problems, the outcome measure “quality of life” does prove problematic, starting with its definition. The difficulties of defining quality of life are similar to those encountered when defining “intelligence”: most people have only intuitive and nebulous ideas as to its meaning, and are unable to give a precise definition. As is the case with “intelligence,” one is forced to content oneself with operational definitions, in which quality of life is defined by the tools and instruments used to measure it.
In Table 4.1, some of these tools are listed. Two types should be differentiated:
• One-dimensional scales, based on clinical observations that may be precise and comprehensible, but only serve to show one aspect of a patient’s complex well-being.
• Multidimensional instruments in the form of questionnaires that attempt to capture the diverse aspects (such as symptoms of disease, physical functioning, moods etc.) of a patient’s condition.
Table 4.2 lists the main methodological problems of the measurement of quality of life. Thus, in the case of multidimensional instruments it may be difficult to determine which items should be included, and which scales and weights should be used for the items. It is, for instance, not clear whether it is appropriate to aggregate the components “hope,” “pain,” and “family well-being” in a single number and, if so, whether they should be given the same weight.
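How much the choice of weights can matter is easily seen in a toy calculation. The following sketch aggregates three item scores into one number under two different weighting schemes. All patients, scores, weights, and names are invented for illustration and do not correspond to any validated instrument.

```python
# Hypothetical item scores on a 0-10 scale; higher always means better
# (for "pain", a higher score means less pain). All numbers are invented.
patients = {
    "A": {"hope": 8, "pain": 3, "family_well_being": 7},
    "B": {"hope": 4, "pain": 9, "family_well_being": 4},
}


def aggregate(scores, weights):
    """Weighted mean of the item scores."""
    total_weight = sum(weights.values())
    return sum(scores[item] * w for item, w in weights.items()) / total_weight


equal_weights = {"hope": 1, "pain": 1, "family_well_being": 1}
pain_heavy_weights = {"hope": 1, "pain": 3, "family_well_being": 1}

for label, weights in (("equal", equal_weights), ("pain-heavy", pain_heavy_weights)):
    # Rank the patients from highest to lowest aggregate score.
    ranking = sorted(patients, key=lambda name: aggregate(patients[name], weights), reverse=True)
    print(label, ranking)
```

With equal weights patient A has the higher aggregate score; when pain is weighted three times as heavily, patient B does. Which patient "has the better quality of life" therefore depends on a weighting decision that the data themselves cannot settle.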
Table 4.1 Tools for measuring quality of life
Clinical observations
• Karnofsky Index (Performance Index)
• Toxicity (according to WHO criteria)
• TWIST (Time WIthout Symptoms of disease and subjective Toxic effects of treatment)
Self-report by the patient based on questionnaires
• Spitzer Index, composed of five items: “activity, daily living, health, support, outlook” rating from 0–2 depending on a fixed set of answers
• Functional Living Index: Cancer (linear analogue scales)
• Breast Cancer Chemotherapy Questionnaire
• Cancer Rehabilitation Evaluation System
• Ferrans and Powers Quality of Life Index: Cancer Version
• Southwest Oncology Group Quality of Life Questionnaire
• Personal Functioning Index
• EORTC QLQ-C30 (EORTC = European Organization for Research and Treatment of Cancer) (74)
A further methodological problem is the timing of measurement, which does not necessarily reflect the actual dynamics of quality of life. Patients’ responses will be strongly influenced by their most recent impressions, which is why study outcomes depend largely on the timing of questionnaires. In a chemotherapeutic study, for example, it makes a difference whether the questioning is done at the start or at the end of a cycle (1).
The last three points in Table 4.2 concern possible biases when measuring quality of life. The most pertinent bias occurs when the patient answers, completely unaware of doing so, according to what he assumes the doctor wants to hear. This has nothing to do with placebo effects, and it should not be confused with improvements in quality-of-life ratings due to the expectation of long-term cure. (Note that the latter effect also poses a methodological problem, in particular in studies involving a no-treatment control group.) These biases are impossible to evade unless the study is blinded, but blinded studies, as we have mentioned before, are rarely performed in oncology.
Table 4.2 Main methodological problems in the measurement of quality of life
• Choice of variables
• Scaling of variables
• Weighting and aggregation of variables
• Validity and reliability of instruments
• Timing of inquiry into health condition
• Influence of physician on patient response behavior
• Bias as a result of dropping out (death) of patients with particularly low quality-of-life measurements
• Bias as a result of nonresponse
The deaths of patients can distort the picture of how quality of life develops over time, and can lead to biased comparisons between the groups of a clinical trial. Cancer patients who die usually had very low quality-of-life ratings prior to their death. Their death (and removal from their treatment group) will, therefore, lead to an immediate improvement of the average score of their group, which is surely an unwanted consequence that gives an inaccurate picture of the results of a trial.
In a study of 587 patients with advanced lung cancer, the average quality of life was shown to improve as a result of deaths and the ensuing exclusion of these patients from the trial (41).
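The mechanism can be reproduced with a handful of invented numbers. In the following sketch, the patient labels, the scores, and the function name `group_mean` are all hypothetical. No surviving patient changes their score between the two visits, yet the group average rises once the patients with the lowest scores have died.

```python
from statistics import mean

# Hypothetical quality-of-life scores (0-100, higher is better) at two visits.
# None marks a patient who died before the second visit. All numbers are invented.
first_visit = {"P1": 20, "P2": 30, "P3": 60, "P4": 70, "P5": 80}
second_visit = {"P1": None, "P2": None, "P3": 60, "P4": 70, "P5": 80}


def group_mean(scores):
    """Mean over patients who still provide a score, as a naive analysis would compute it."""
    return mean(v for v in scores.values() if v is not None)


mean_before = group_mean(first_visit)
mean_after = group_mean(second_visit)

# Check that no surviving patient's score has changed between the visits.
unchanged = all(
    second_visit[p] == first_visit[p]
    for p in second_visit
    if second_visit[p] is not None
)

print(mean_before, mean_after)  # the group average rises from 52 to 70
print(unchanged)                # True: the rise is produced entirely by the deaths
```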
It is not hard to imagine how failure to respond to a question, or the entire questionnaire, will lead to bias in the evaluation, since compliance or ability to respond will likely depend on the state of the patient’s health.
These points are meant to be only a brief introduction into the problems of quality-of-life measurement. For a more detailed discussion, including methods developed for eliminating or mitigating the sources of bias, the reader is referred to the following articles: Oncology 5, No. 4, 1990; Controlled Clinical Trials 18, No. 4, 1997; Statistics in Medicine 17, Nos. 5–7, 1998.
The Problem of Secondary Efficacy Parameters
Tumor response will be discussed as an example of a secondary efficacy parameter (surrogate variable). It serves as the main outcome measure in phase II trials, which try to evaluate the antitumor activity of a therapy and most of which consist of only one treatment group. When discussing the benefits of cancer therapy, this particular outcome measure is not very useful; it represents a soft variable. Observed response rates depend on the type and quality of diagnosis, and show considerable observer variability. They are also not safe from attempts at embellishment. In comparative studies, the assessment of tumor response should therefore be blinded, i.e., the judgment should be made without knowing the patient’s treatment assignment. This is especially important in those cases where response rate forms the basis for deciding whether the therapy should be used as a routine treatment.
The response rate depends on a number of other influencing factors (34). For one and the same type of treatment, the published response rates often show an enormous variability (1, 50). Therefore, nonrandomized comparisons, especially comparisons with rates that have been published, are of little value.
From the methodological point of view, the following question is of central importance: “Is it possible to causally relate the different response rates of cancer therapies to differences in efficacy regarding survival or quality of life?”
In neither instance does this hold true. It is clear that response rate by itself cannot predict quality of life, since potential reduction in pain symptoms due to tumor regression must be weighed against therapeutic side effects.
Ever since the 1980s, oncologists have attempted to correlate differences in response rates to improved survival time. Their line of argument was simple: chemotherapies can induce regression of malignancies, and without exception studies clearly show that responders live longer than non-responders. These two findings were used as definite proof for the life-prolonging effects of certain therapies. This deduction was certainly also responsible for some types of cancer never being tested in randomized comparative trials with untreated controls.
Doubt concerning this argumentation probably developed due to two disturbing observations. Firstly, there were several studies in which the chemotherapy group was compared with a non-treated control group without any effect on survival time. Secondly, when looking at innumerable randomized comparative trials of aggressive treatment regimens (e.g., comparing two different dose intensities of the same cytostatic drug, or combination chemotherapy vs. monotherapy), it became evident that the aggressive regimens almost always correlated with a higher response rate, without any visible effect on survival time (36).
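The inference from "responders live longer than non-responders" to a life-prolonging effect can be probed with a small simulation. The sketch below rests entirely on assumptions that are not taken from the text: survival times and times to response are drawn independently from exponential distributions with invented means, so that by construction a response has no influence whatsoever on survival. The only link between the two is that a response can be recorded only while the patient is still alive.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is reproducible

MEAN_SURVIVAL = 12.0         # months; hypothetical
MEAN_TIME_TO_RESPONSE = 6.0  # months; hypothetical

responders, non_responders = [], []
for _ in range(20_000):
    survival = random.expovariate(1 / MEAN_SURVIVAL)
    # Time at which a response would be observed, drawn independently of survival:
    # responding does not change how long the simulated patient lives.
    response_time = random.expovariate(1 / MEAN_TIME_TO_RESPONSE)
    # A response can only be recorded if the patient is still alive at that time.
    if response_time < survival:
        responders.append(survival)
    else:
        non_responders.append(survival)

print(f"mean survival of responders:     {mean(responders):.1f} months")
print(f"mean survival of non-responders: {mean(non_responders):.1f} months")
```

In this artificial setting the responders live markedly longer on average, simply because patients who die early have no opportunity to respond. A survival difference between responders and non-responders is therefore, taken by itself, compatible with a therapy that has no effect on survival at all.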