5
Sarah Brown
The designs described within this chapter specifically address the question of treatment selection, that is, randomisation to multiple experimental treatment arms is incorporated. It is, however, also possible to consider treatment selection using single-arm or randomised phase II designs described in Chapters 3 and 4. In this respect the aim is to show that each experimental treatment has sufficient activity (and tolerability, if appropriate) before performing treatment selection. Treatment selection from those experimental arms found to be sufficiently active (and tolerable if appropriate) may then take place, for example, using selection designs such as those described by Sargent and Goldberg (2001) or Simon et al. (1985) (see Section 5.2.1 for further details). These designs select the most active treatment with a pre-specified probability of correct selection, according to the difference in activity observed between the experimental arms. Such an approach, combining these selection designs with other phase II designs, ensures that the treatments considered for selection have already passed pre-specified minimum activity criteria (and possibly tolerability criteria), prior to selection. Steinberg and Venzon provide an example of such an approach, as described in Section 5.2.2 (Steinberg and Venzon 2002). The efficiency of such an approach, as compared with the alternative treatment selection designs described within this chapter, should be considered in further detail on a trial-specific basis.
The designs within this chapter are organised as follows. First, designs including a control arm are described in Section 5.1, organised by design category and by outcome measure distribution. Second, in Section 5.2, designs that do not include a control arm are presented, again by design category and by outcome measure. Treatment selection designs that incorporate both activity and toxicity are presented separately in Section 6.4.
No references identified.
No references identified.
Whitehead and Jaki (2009)
Whitehead and Jaki propose one- and two-stage designs for phase II trials based on ordered category outcomes, when the aim of the trial is to select a single treatment to take forward to phase III evaluation. The design is randomised to incorporate a formal comparison with a control arm, and hypothesis testing is based on the Mann–Whitney statistic. The treatment identified with the smallest p-value indicating a treatment effect is selected as the treatment to take forward for further investigation. Details of sample size and critical value calculation are provided, and R code is noted as being available from the authors to allow implementation. The worthwhile treatment effect, and the small positive treatment effect that is not worth further investigation, must be specified.
No references identified.
No references identified.
Jung (2008)
Jung proposes a randomised controlled extension to Simon’s optimal and minimax designs (Simon 1989), considering a binary outcome measure and incorporating early termination for lack of activity. The experimental arms are compared with the control arm at the end of stage 1 and treatments may be dropped for lack of activity. More than one experimental arm may therefore be taken forward to stage 2. If no treatments show improved activity over the control arm at the end of stage 1 the trial may be terminated for lack of activity. At the end of stage 2, all arms that pass the stage 2 cut-off boundaries compared to control are deemed worthy of further investigation. The selection design is an extension to the design described comparing a single experimental arm with a control. In the selection design the family-wise error rate, the probability of erroneously accepting an inactive treatment, is controlled. Programs to identify designs are available upon request from the author.
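The kind of two-stage comparison with control described above can be illustrated by exact enumeration over binomial outcomes. The following is a minimal sketch and not Jung's published procedure: it assumes a single experimental arm with equal allocation, decision rules based on the difference in the number of responders between the experimental and control arms, and hypothetical names and values for the stage sizes (`n1`, `n2`) and cut-offs (`a1`, `a`).

```python
from math import comb


def binom_pmf(n, p):
    """Probability mass function of Binomial(n, p), indexed by the count."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def diff_pmf(n, p_exp, p_ctl):
    """Distribution of D = X - Y, X ~ Bin(n, p_exp), Y ~ Bin(n, p_ctl).

    Returns a dict mapping each difference d in [-n, n] to its probability.
    """
    px, py = binom_pmf(n, p_exp), binom_pmf(n, p_ctl)
    out = {d: 0.0 for d in range(-n, n + 1)}
    for x, fx in enumerate(px):
        for y, fy in enumerate(py):
            out[x - y] += fx * fy
    return out


def two_stage_characteristics(n1, n2, a1, a, p_exp, p_ctl):
    """Exact operating characteristics for one experimental arm versus control.

    Assumed (hypothetical) decision rules:
      stage 1: continue only if X1 - Y1 >= a1, otherwise stop for lack of activity
      stage 2: accept the arm if (X1 + X2) - (Y1 + Y2) >= a
    Returns (probability of early termination, probability of acceptance).
    """
    d1 = diff_pmf(n1, p_exp, p_ctl)
    d2 = diff_pmf(n2, p_exp, p_ctl)
    early_stop = sum(pr for d, pr in d1.items() if d < a1)
    accept = 0.0
    for s1, pr1 in d1.items():
        if s1 < a1:
            continue  # arm dropped at stage 1
        accept += pr1 * sum(pr2 for s2, pr2 in d2.items() if s1 + s2 >= a)
    return early_stop, accept


# Illustrative values only: 20 patients per arm in each stage.
pet_null, alpha = two_stage_characteristics(20, 20, 1, 6, 0.2, 0.2)
pet_alt, power = two_stage_characteristics(20, 20, 1, 6, 0.4, 0.2)
print(f"equal rates: early stop = {pet_null:.3f}, acceptance = {alpha:.3f}")
print(f"improved rate: early stop = {pet_alt:.3f}, acceptance = {power:.3f}")
```

With several experimental arms sharing one control arm, the family-wise error rate depends on the joint distribution across arms, which this single-arm sketch does not compute.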
Jung and George (2009)
Jung and George propose methods of comparing treatment arms in a randomised phase II trial, where the intention is either to select one treatment from many for further evaluation or to determine whether a single treatment is worthy of evaluation compared to a control. The phase II design is based on a k-armed trial (with or without a control arm for selection) with each arm designed for independent evaluation following Simon’s two-stage design (Simon 1989), or similar, based on historical control data, that is, no comparison is made with the control arm at this stage. Different designs (i.e. the same two-stage design but with different operating characteristics) may be used for different arms in the independent evaluation if deemed necessary. A treatment must be accepted via the independent evaluation before it can be considered for selection, at which point comparisons may be made with the control arm. p-Values are calculated to represent the probability that the difference between the arms being compared is at least some pre-defined minimal accepted difference, given the actual difference observed. The outcome measure used to select the better treatment is the same outcome measure used for evaluation of each arm independently, for example, tumour response. No software is detailed; however, detail is given which should allow implementation, and sufficient examples are also provided. The initial two-stage design can be calculated using software available for Simon’s two-stage design.
Levy et al. (2006)
Levy et al. propose a randomised two-stage futility design incorporating treatment selection at the end of the first stage. At the end of the first stage the ‘best’ treatment is selected based on the treatment with the highest/lowest (‘best’) mean outcome, that is, no comparison with control is made here. Sample size for the first stage is calculated to give at least 80% probability of correct selection.
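To make the probability of correct selection concrete, the following minimal sketch covers the simplest case only and is not the calculation given by Levy et al. It assumes two experimental arms, a normally distributed outcome with known common standard deviation `sigma`, equal arm sizes, selection of the arm with the higher observed mean, and a true difference `delta` in favour of the better arm; all function and parameter names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def prob_correct_selection(n_per_arm, delta, sigma):
    """P(the truly better arm has the higher observed mean).

    The difference in sample means is Normal(delta, 2 * sigma^2 / n), so the
    better arm is selected with probability Phi(delta * sqrt(n / 2) / sigma).
    """
    return NormalDist().cdf(delta * sqrt(n_per_arm / 2) / sigma)


def min_n_per_arm(delta, sigma, target=0.80):
    """Smallest per-arm sample size giving at least `target` selection probability."""
    z = NormalDist().inv_cdf(target)
    return ceil(2 * (z * sigma / delta) ** 2)


# Illustrative values only: a true difference of 0.3 standard deviations.
n = min_n_per_arm(delta=0.3, sigma=1.0, target=0.80)
print(n, round(prob_correct_selection(n, 0.3, 1.0), 4))
```

When lower values are better, the same calculation applies with the sign of the difference reversed. More than two experimental arms would require the joint distribution of the arm means, typically evaluated by numerical integration or simulation.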
Patients then continue to be randomised between control and the selected treatment, and data from the first stage is incorporated into the second-stage futility analysis, incorporating a bias correction. The null hypothesis is that the selected treatment reduces the mean response by at least x% compared to control; the alternative hypothesis is that the selected treatment reduces the mean response by less than x% compared to control (reflecting a futility design). Sample size and power calculation details are provided in appendices.
Shun et al. (2008)
Shun et al. propose a phase II/III or two-stage treatment selection design where a single treatment is selected from two at the end of the first stage. Randomisation incorporates a control arm, with the intention of formal comparison at the end of the second stage only, that is, no formal comparison for treatment selection. Treatment selection is based on the experimental treatment with the highest/lowest (‘best’) mean outcome. A normal approximation approach is proposed to avoid complex numerical integration requirements. The design assumes that the treatment effects of the experimental treatments are not the same. The practical approach to timing of interim analysis addresses the need to perform this early in order to avoid type I error inflation and the need to perform this late enough such that there is a high probability of correctly selecting the better treatment. No software is noted as being available; however, detail is provided to allow implementation and a detailed example is given. The authors note that this design can be extended to binary and time-to-event outcome measures if the correlation between the final and interim test statistics is known.
Sun et al. (2009)
Sun and colleagues propose a randomised two-stage design based on Zee’s single-arm multi-stage design with multinomial outcome measure (Zee et al. 1999), adjusting the rules such that a sufficiently high response rate or a sufficiently low early progressive disease rate should warrant further investigation of a treatment. Optimal and minimax designs are proposed following the methodology of Simon (1989), incorporating comparison with a control arm. Differences in response and progressive disease rates between control and experimental arms are compared. The authors note that the intention of the phase II trial is to screen for potential efficacy as opposed to identifying statistically significant differences compared with control. Patients are randomised between multiple experimental treatments and a control arm. At the end of the first stage only those treatments that pass the stopping boundaries for both response and progressive disease are continued to the second stage. If there is clear evidence that one treatment is better than the other, selection may take place at the end of the first stage. If, at the end of the second stage, there is no clear evidence that one experimental treatment is better than the other both arms may be considered for further evaluation. Detail is given regarding how to implement the designs in practice, and software is noted as being available by contacting the first author to allow identification of designs. The authors also note that the design may be extended to studies monitoring safety and efficacy simultaneously.
Whitehead and Jaki (2009)
Whitehead and Jaki propose one- and two-stage designs for phase II trials based on ordered category outcomes, when the aim of the trial is to select a single treatment to take forward to phase III evaluation. The design is randomised to incorporate a formal comparison with a control arm, and hypothesis testing is based on the Mann–Whitney statistic.
In the two-stage design, treatment selection takes place at the end of stage 1 whereby the treatment with the smallest p-value indicating a treatment effect is selected as the treatment to take forward to stage 2. In stage 2, patients are randomised between the selected treatment and control only. The final analysis at the end of stage 2 is based on all data available on patients in the control arm and the selected treatment arm. Details of sample size and critical value calculation are provided, and R code is noted as being available from the authors to allow implementation. The worthwhile treatment effect, and the small positive treatment effect that is not worth further investigation, must be specified.
No references identified.
No references identified.
No references identified.
Cheung (2009)
Cheung describes an adaptive multi-arm, multi-stage selection design incorporating a control arm and considering a normally distributed outcome measure. Two multi-stage procedures are proposed: an extension of the sequential probability ratio test (SPRT) with a maximum sample size and a truncated sequential elimination procedure (ELIM). The SPRT method allows early selection of a treatment when there is evidence of increased activity compared to control, whereas the ELIM procedure also allows early termination of arms for lack of activity. The proposed procedures are compared with single-arm trials and the ELIM procedure is recommended over these, incorporating sample size reassessment at interim analyses. Cohort sizes between interim assessments may range from 1 to 10 with little impact on the design’s operating characteristics. Sample size formulae are provided, which must be implemented in order to identify the trial design.
No references identified.
No references identified.
No references identified.
No references identified.
No references identified.
No references identified.
The designs outlined within this section incorporate the same primary outcome measure for phase II assessment as that used for phase III. Although this may be seen as a seamless phase II/III approach, in effect it reflects a phase III trial with an early interim analysis on the primary outcome measure. In this setting, consideration should be given to the most appropriate outcome measure to use for both the phase II and phase III primary outcome. It is rare that efficacy in the phase III setting could be claimed on the basis of, for example, a binary outcome; rather, a time-to-event outcome is usually required in phase III trials.
Bauer et al. (1998)
Bauer and colleagues outline a simulation program for an adaptive two-stage design with application to phase II/III and dose finding. Two outcomes may be considered: one primary variable on which formal hypothesis testing is performed, and a second on which adaptations at the end of the first stage may be based. The outcomes may be binary or continuous, or a combination. The same primary outcome measure is used at each analysis. Simulation is required to identify the best design according to various operating characteristics and the performance of different designs. A program is detailed (the focus of the manuscript) to allow implementation, which is noted as being available on request from the authors. At the end of the first stage the stage 1 hypothesis is tested, generating a p-value p1. At the end of the second stage the stage 2 hypothesis is tested using only data obtained from patients in stage 2, generating a p-value p2. The overall hypothesis is then tested combining p1 and p2 using Fisher’s combination test (Fisher 1932). Application is given to phase II/III, with treatment selection at the end of stage 1: if the p-value provides significant evidence that at least one of the treatments is superior, then the treatment with the ‘best’ outcome is considered in phase III.
The trial may also terminate early for efficacy at the end of stage 1 if the p-value is significant at the stage 2 significance level.
Bauer and Kieser (1999)
Bauer and Kieser detail a design that incorporates formal comparison of each of the experimental arms with the control at the end of phase II (as well as testing whether any of the treatments are superior to control). The same primary outcome measure is used in both phases II and III. A fixed sample size is used for phase II; however, the phase III sample size can be updated adaptively at the end of phase II. Stopping at the end of phase II is permissible for either lack of efficacy or early evidence of efficacy. The design also allows more than one treatment to be taken forward to phase III. At the end of phase II the sample size may be re-estimated and the test statistics to use at phase III are determined, according to the number of treatments taken forward and the hypotheses to be tested. The decision criterion at the end of phase III is based on Fisher’s combination test (Fisher 1932) whereby the p-values from both phases are combined (as opposed to combining data from all patients). Simulation is required as detailed in Bauer et al. (1998), as above. Examples are given in the dose-finding setting and the authors note that the main advantage of this design is its flexibility and its control of the family-wise error rate. The design is similar to that detailed above (Bauer et al. 1998) with the exception that the current paper gives more detail relating to multiple comparisons between experimental treatments and control arm. When considering either of these two designs, it is advised that both papers be considered together since the software detailed in Bauer et al., above, is required to identify the design proposed here.
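Both of the designs above combine the stage-wise p-values, rather than pooling patient data, using Fisher’s combination test. A minimal sketch of that combination step follows, assuming two independent one-sided p-values `p1` and `p2` and a hypothetical function name. It deliberately ignores the early stopping bounds and the adjusted critical values that a full adaptive design would also need.

```python
from math import exp, log


def fisher_combination(p1, p2):
    """Combine two independent p-values with Fisher's method.

    Under the null hypothesis, -2 * (ln p1 + ln p2) follows a chi-square
    distribution with 4 degrees of freedom, whose upper tail probability has
    the closed form exp(-x / 2) * (1 + x / 2).
    Returns (test statistic, combined p-value).
    """
    if not (0.0 < p1 <= 1.0 and 0.0 < p2 <= 1.0):
        raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * (log(p1) + log(p2))
    combined = exp(-stat / 2.0) * (1.0 + stat / 2.0)
    return stat, combined


# Illustrative values only: stage 1 and stage 2 p-values.
stat, p_combined = fisher_combination(0.10, 0.04)
print(f"statistic = {stat:.3f}, combined p-value = {p_combined:.4f}")
```

Because the statistic depends on the data only through the product of the two p-values, the combined p-value can equivalently be written as p1 p2 (1 - ln(p1 p2)).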
Stallard and Todd (2003)
Stallard and Todd propose a design whereby patients from phase II are incorporated in the phase III analysis, and treatment selection at the end of phase II is based on the treatment with the largest test statistic using efficient scores and Fisher’s information. A formal comparison is made between the selected treatment and control, and the trial may be terminated early for lack of efficacy or superiority at this stage. The type I error in the final phase III analysis is adjusted for the treatment selection in phase II. Overall sample size and phase II sample size are computed according to group-sequential phase III designs such as those described by Whitehead (1997). A computer program is noted as being available from the authors to calculate power for stopping boundaries, according to pre-specified group sizes. The authors note that the design is useful when one treatment is likely to be much better than the others at phase II, as opposed to taking multiple treatments to phase III. Consideration should also be given to the timing of the first interim analysis (i.e. phase II assessment). Too early and there is too little information, too late and there are too many patients enrolled and thus potentially wasted resources.
Kelly et al. (2005)
Kelly and colleagues propose an adaptation to the design proposed by Stallard and Todd (detailed above), such that more than one treatment may be selected at multiple stages within the phase II part of the trial. Treatments are evaluated for selection using Fisher’s information and an efficient score statistic which may be applied to continuous, binary and failure time data. p-Values are calculated at each stage for comparison of the best treatment with control. Only treatments within a pre-specified margin of the efficient score statistic of the best treatment are continued to the next stage, and all other treatments are dropped.
Patients are randomised between control and each of the treatments under investigation at each stage. The trial may stop for efficacy or lack of efficacy at each stage. The example given is based on the use of the triangular test described by Whitehead (1997), which uses expected Fisher’s information to calculate operating characteristics.
Wang and Cui (2007)
Wang and Cui outline a design whereby patients are randomised to each of the experimental treatments under investigation and a control arm, using response-adaptive randomisation (the paper is written in the context of dose selection but could be applied to treatment selection). The allocation ratios are calculated based on distance conditional powers (i.e. the probability that the event rate for the treatment under investigation is larger than some pre-specified fixed rate, based on the observed data and the fact that some patients will not yet have had their outcome observed). The treatment to which most patients have been randomised is deemed the most efficacious at the end of the recruitment period. This selected treatment is then formally compared with the control treatment, forming the phase III comparison. This design uses binary outcome measures such as treatment response, for both the phase II treatment selection and the phase III formal comparison; although it is noted that continuous outcomes may be used. Simulation is required to investigate the design parameters, with sample size calculated based on the phase III comparison. The design may be implemented with the development of programs based on formulae provided.
Bretz et al. (2006)
Treatment selection designs
5.1 Including a control arm
5.1.1 One-stage designs
5.1.1.1 Binary outcome measure
5.1.1.2 Continuous outcome measure
5.1.1.3 Multinomial outcome measure
5.1.1.4 Time-to-event outcome measure
5.1.1.5 Ratio of times to progression
5.1.2 Two-stage designs
5.1.2.1 Binary outcome measure
5.1.2.2 Continuous outcome measure
5.1.2.3 Multinomial outcome measure
5.1.2.4 Time-to-event outcome measure
5.1.2.5 Ratio of times to progression
5.1.3 Multi-stage designs
5.1.3.1 Binary outcome measure
5.1.3.2 Continuous outcome measure
5.1.3.3 Multinomial outcome measure
5.1.3.4 Time-to-event outcome measure
5.1.3.5 Ratio of times to progression
5.1.4 Continuous monitoring designs
5.1.5 Decision-theoretic designs
5.1.6 Three-outcome designs
5.1.7 Phase II/III designs – same primary outcome measure at phase II and phase III
5.1.7.1 Binary outcome measure
5.1.7.2 Continuous outcome measure


