The fragility index of randomized controlled trials in advanced/metastatic renal cell cancer

Highlights

• The median FI of RCTs in a/mRCC was 12.5, implying that a change of outcome in only 13 patients would reverse the statistical significance of a trial.

• P-values were negatively associated with the FI.

• The FI should be reported as a supplement to the P-value to evaluate the robustness of trial outcomes.

Abstract

Purpose

The fragility index (FI) has been applied as a supplement to P-values, which on their own are not comprehensive, to assess the robustness of randomized controlled trials (RCTs). The objective of this study was to evaluate the statistical robustness of RCTs of advanced/metastatic renal cell cancer (a/mRCC) using the FI.

Materials and methods

RCTs related to a/mRCC published in the 4 highest-impact general medical journals and the 25 highest-impact urological journals between January 1, 2000, and December 31, 2023, were identified from the PubMed database. The FI was calculated using Fisher’s exact test. Spearman’s correlation analysis was conducted to assess potential correlates of the FI.

Results

Sixteen eligible RCTs were included, with a median total sample size of 654.5 (IQR, 461–847) and a median number of patients lost to follow-up of 14 (IQR, 3–23). The median FI was 12.5 (IQR, 8.5–27), suggesting that a switch in outcome in only 13 patients would have reversed the significance of the trials. The number of patients lost to follow-up was greater than or equal to the FI in 7 (44%) RCTs. P-values were negatively associated with the FI, whereas the number of patients lost to follow-up and the number of patients enrolled were not significantly associated with the FI.

Conclusion

Not all RCTs of a/mRCC are as statistically robust as previously considered, and their results should therefore be interpreted carefully. We suggest that the FI be reported in urological RCTs as a supplement to the P-value, so that readers can take the fragility of the outcomes into account when drawing conclusions.

1 Introduction

Renal cell carcinoma (RCC) is one of the 3 most common urological malignancies, representing nearly 90% of all malignant renal tumors [ ]. Statistics from a global cancer survey in 2020 showed that there were 431,288 new cases of kidney cancer and 179,368 deaths from it [ ]. Whereas surgery, embolization, surveillance and ablative therapy are used in localized RCC, systemic treatments, including surgery combined with adjuvant therapy, targeted therapy or immunotherapy, are recommended for advanced/metastatic renal cell cancer (a/mRCC) [ ]. Nevertheless, no consensus has yet been reached [ ]. The choice of treatment protocol is typically determined by the clinician or directed by evidence-based medicine (EBM). EBM highlights the importance of the outcomes of randomized controlled trials (RCTs), which are considered the gold standard for comparing clinical interventions, in steering therapeutic decisions.

RCTs, which are assumed to provide highly reliable evidence if proper methods are applied, are aimed at evaluating the safety and efficacy of interventions. P < 0.05 has conventionally been used to decide whether there is a difference between interventions in RCTs [ ]. Clinicians regularly depend on the reported P-values to interpret outcomes and to confirm the significance of RCTs. Nevertheless, the threshold P-value has been heavily criticized as an overly simplistic way of establishing statistical significance [ ]. The conclusion of a trial may be reversed by a minor difference in the P-value when the values fall just on either side of the threshold, for example, 0.049 and 0.051. Moreover, any P-value below the threshold is called “statistically significant” regardless of its size. Consequently, there are limitations to relying on P-values alone to determine differences in trial outcomes.

For these reasons, the fragility index (FI) has been proposed in recent years to compensate for the limitations of P-values and to evaluate the robustness of the statistical significance of RCTs [ ]. The FI is defined as the minimum number of patients whose outcome must be switched (from a nonevent to an event) to convert a statistically significant result into a nonsignificant one ( Fig. 1 ). The smaller the FI, the more fragile the statistical result. The FI has recently been used extensively in medical RCTs, including trials of treatments for various tumors. However, the median FI varies among them. For example, the reported median FI is 5 in hepatocellular carcinoma (HCC) [ ], −2 in cholangiocarcinoma [ ], 2 in rectal cancer [ ] and 28 in common solid tumors (including lung, breast, prostate and colon cancer and melanoma) [ ].

To our knowledge, no study to date has analyzed the fragility of statistical differences in RCTs concerned with RCC. The purpose of our study is to evaluate the statistical robustness of RCTs for the treatment of a/mRCC using the FI.

2 Materials and methods

2.1 Search strategy

In addition to 4 high-impact general medical journals, namely The Lancet, The New England Journal of Medicine (NEJM), The British Medical Journal (BMJ), and The Journal of the American Medical Association (JAMA), we used InCites Journal Citation Reports to identify the 25 highest-impact urological journals. RCTs related to a/mRCC published in these journals from January 1, 2000, to December 31, 2023, were systematically searched in PubMed using the MeSH terms for “randomized controlled trials” and “renal/kidney cell cancer/carcinoma” along with synonymous keywords, with the publication language restricted to English.

2.2 Eligibility criteria

The inclusion criteria were as follows: (1) RCTs with 2 parallel arms or a two-by-two factorial design (randomized 1:1) related to a/mRCC, (2) at least one primary or secondary endpoint that was dichotomous and statistically significant ( P < 0.05 or a 95% confidence interval that excluded the null value), and (3) human participants. The exclusion criteria covered case reports, meta-analyses, systematic reviews, cadaveric studies, animal studies, in vitro studies, and nondichotomous data.

2.3 Study selection and data extraction

The title and abstract of each article were screened by 2 independent reviewers to confirm that the studies initially included met the eligibility criteria. We excluded diagnostic studies, cost-effectiveness studies, studies not concerning a/mRCC, studies that did not report specific numbers of patients in the 2 arms, and studies investigating baseline factors for outcomes. All included studies compared treatments. The full text of the initially included studies was subsequently reviewed for final inclusion. Any disagreements between the 2 reviewers were resolved by a third reviewer.

We gathered the following variables for each trial that ultimately met the inclusion criteria: authors, publication year, journal name, number of patients enrolled, arms and number of patients per arm, number of patients lost to follow-up, recalculated P-value, efficacy outcomes, and whether outcomes were primary or secondary.

2.4 Application of the fragility index

When more than one outcome reported in a study was eligible for calculation of the FI, we gave priority to the primary outcome. If the primary outcome was not eligible, we considered the most significant secondary outcome according to the Grading of Recommendations Assessment, Development and Evaluation (GRADE) guideline [ ]. Trials in which the number of patients lost to follow-up was greater than or equal to the FI were classified as the statistical fragility group, and the remaining trials as the statistical robustness group. We calculated the FI for each trial from a two-by-two contingency table following the methodology presented by Walsh et al. [ ]. The P-value for each outcome was first recalculated using a 2-sided Fisher’s exact test. We then iteratively added one event to the arm with fewer events while subtracting one nonevent from the same arm, so that the total number of patients remained unchanged. The FI was the number of additional events at which the P-value by Fisher’s exact test first changed from < 0.05 to ≥ 0.05. If the recalculated P-value was already greater than or equal to 0.05 without any change in events, the FI was reported as zero.
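The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names `fisher_exact_two_sided` and `fragility_index` are hypothetical, the 2-sided Fisher’s exact P-value is computed by summing all tables no more probable than the observed one (a common convention), and the example counts are invented for demonstration.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one
    # (small relative tolerance guards against floating-point ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def fragility_index(events_1: int, n_1: int, events_2: int, n_2: int,
                    alpha: float = 0.05) -> int:
    """Number of nonevent-to-event switches in the arm with fewer events
    needed before the Fisher's exact P-value reaches alpha or more."""
    if events_1 <= events_2:
        low_e, low_n, high_e, high_n = events_1, n_1, events_2, n_2
    else:
        low_e, low_n, high_e, high_n = events_2, n_2, events_1, n_1

    for added in range(low_n - low_e + 1):
        e = low_e + added  # total patients in the arm stay constant
        p = fisher_exact_two_sided(e, low_n - e, high_e, high_n - high_e)
        if p >= alpha:
            return added  # 0 means the recalculated P was already >= alpha
    raise ValueError("significance was not lost after switching every nonevent")


# Hypothetical trial: 8/10 events in one arm, 1/6 in the other.
print(fragility_index(8, 10, 1, 6))  # 1
```

In this invented example the recalculated P-value is about 0.035, and a single additional event in the smaller-event arm raises it to about 0.118, so the FI is 1.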

2.5 Statistical analysis

SPSS software (version 23; IBM, Armonk, NY, USA) and Microsoft Excel 2016 (Redmond, WA, USA) were used for data analysis. Agreement between reviewers was determined using the Cohen kappa coefficient. Kappa values were interpreted according to the guidelines previously published by Landis and Koch [ ]: values of 0.00 to 0.20, 0.21 to 0.40, 0.41 to 0.60, 0.61 to 0.80, and 0.81 to 1.00 indicate slight, fair, moderate, substantial, and almost perfect agreement, respectively. We used descriptive statistics to summarize the FI of the RCTs and reported interquartile ranges (IQRs) to describe variability and dispersion. Spearman correlation analyses were conducted to investigate the relationship between the FI and trial characteristics, including trial sample size, number of patients lost to follow-up, and P-value. All significance tests were 2-tailed, with a P-value less than 0.05 considered significant.
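As an illustration of the correlation analysis named above, the following sketch computes Spearman’s rank correlation coefficient as the Pearson correlation of average ranks. The study itself used SPSS; this Python version, the function names `average_ranks` and `spearman_rho`, and the sample vectors are assumptions made for demonstration only, and no significance test is included.

```python
from math import sqrt
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: smaller P-values paired with larger FI values.
p_values = [0.044, 0.022, 0.009, 0.003, 0.0005]
fi_values = [1, 4, 10, 14, 20]
print(spearman_rho(p_values, fi_values))  # perfectly inverse ordering
```

Working on ranks rather than raw values is what makes the coefficient suitable for skewed quantities such as P-values and FI.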

3 Results

3.1 Study selection

The literature search identified 1158 potentially eligible studies. After excluding duplicate articles and screening titles, abstracts, and journals, 73 articles were eligible for full-text review. Finally, 16 RCTs related to a/mRCC were included after full-text review ( Fig. 2 ). Agreement between the 2 reviewers on eligibility was substantial (kappa = 0.67).
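For readers unfamiliar with the agreement statistic, the sketch below shows how Cohen’s kappa is obtained from a 2 × 2 table of two reviewers’ include/exclude decisions and how a value is mapped to the Landis and Koch bands. The function names and the screening counts are hypothetical (the article does not report the underlying cross-tabulation); only the value 0.67 comes from the text.

```python
def cohen_kappa(both_yes: int, a_only: int, b_only: int, both_no: int) -> float:
    """Cohen's kappa for two raters making yes/no decisions on the same items."""
    n = both_yes + a_only + b_only + both_no
    if n == 0:
        raise ValueError("no rated items")
    p_observed = (both_yes + both_no) / n
    # Chance agreement from each rater's marginal yes/no proportions.
    p_yes = ((both_yes + a_only) / n) * ((both_yes + b_only) / n)
    p_no = ((b_only + both_no) / n) * ((a_only + both_no) / n)
    p_expected = p_yes + p_no
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch agreement bands."""
    if kappa < 0:
        # Not listed in the article's summary; Landis and Koch call this "poor".
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Hypothetical screening counts: 20 joint inclusions, 15 joint exclusions,
# and 15 disagreements out of 50 articles.
print(round(cohen_kappa(20, 5, 10, 15), 2))  # 0.4
print(landis_koch_label(0.67))               # substantial
```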

3.2 Trial characteristics and outcomes

The characteristics of all included studies are summarized in Table 1 (details in Supplementary Table 1 ). The median total number of patients was 654.5 (IQR, 461–847). All included RCTs explicitly reported the number of patients lost to follow-up, with a median of 14 (IQR, 3–23). Among the outcomes analyzed in the included studies, 13 were primary outcomes (81%) [ ], and 3 were secondary outcomes (19%) [ ]. Of the 19 reported P-values (including co-primary endpoints in 2 studies), 5 (26%) were less than 0.05 but greater than or equal to 0.01 [ , , , , ], 4 (21%) were less than 0.01 but greater than or equal to 0.001 [ , , , ], and 10 (53%) were less than 0.001 [ , , , , , , ]. The treatment comparisons could be categorized as single agents of the same category (12.5%) (including mTOR inhibitors and tyrosine kinase inhibitors [TKI]), single agents of different categories (43.75%) (3 of which were comparisons with placebo), and 2-drug combinations vs. single agents (43.75%) (4 comparisons of immune checkpoint inhibitors [ICI]+TKI vs. TKI and 3 comparisons of ICI+ICI vs. TKI). The outcomes reported in the trials (19 in total, including co-primary endpoints in 2 studies) were as follows: overall survival (OS), objective response rate (ORR), progression-free survival (PFS), disease-free survival (DFS), and the proportion of patients achieving an objective response.

Table 1

Characteristics and fragility analysis of included trials.

Authors	Year	Journal	Patients enrolled	Efficacy outcomes	Lost to follow-up	P-value	FI
Bernard et al. [ ]	2007	N Engl J Med	903	PFS	17	<0.001	20
Bernard et al. [ ]	2007	The Lancet	649	PFS	21	<0.001	34
Robert et al. [ ]	2007	N Engl J Med	750	PFS	35	<0.001	54
Powles et al. [ ]	2016	Eur Urol	49	PFS	2	<0.001	8
Toni et al. [ ]	2016	The Lancet Oncology	658	PFS	25	0.005	11
Robert et al. [ ]	2017	N Engl J Med	821	OS	18	0.011	9
Robert et al. [ ]	2018	N Engl J Med	847	OS	3	<0.001	38
				ORR		<0.001	36
				PFS		NA	NA
Robert et al. [ ]	2019	N Engl J Med	560	PFS	13	0.022	4
Robert et al. [ ]	2019	Lancet Oncol	847	OS	3	0.002	18
				PFS		0.022	5
				The proportion achieving objective response		<0.001	25
Brian et al. [ ]	2019	The Lancet	362	PFS	4	0.044	1
Brian et al.	2019	The Lancet	915			P > 0.05
Thomas et al. [ ]	2020	Lancet Oncol	861	OS	34	0.009	10
Brian et al. [ ]	2020	Lancet Oncol	350	PFS	2	<0.001	10
T.K. et al. [ ]	2021	N Engl J Med	651	PFS	12	<0.001	20
T.K. et al. [ ]	2021	N Engl J Med	994	DFS	31	0.003	14
Robert et al. [ ]	2022	Lancet Oncol	651	PFS	15	<0.001	37
Grünwald et al. [ ]	2023	Eur Urol	49	ORR	1	0.038	1
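The summary statistics quoted in the text can be checked against Table 1 with a short script. The sketch below makes two assumptions that are not stated in the article: each trial is represented by its first listed outcome (with 362 patients for the 2019 Lancet trial), and quartiles are taken as the medians of the lower and upper halves of the sorted data. The names `TRIALS` and `median_and_quartiles` are illustrative.

```python
from statistics import median

# (patients enrolled, lost to follow-up, FI) for the first listed outcome
# of each of the 16 trials, in the order they appear in Table 1.
TRIALS = [
    (903, 17, 20), (649, 21, 34), (750, 35, 54), (49, 2, 8),
    (658, 25, 11), (821, 18, 9), (847, 3, 38), (560, 13, 4),
    (847, 3, 18), (362, 4, 1), (861, 34, 10), (350, 2, 10),
    (651, 12, 20), (994, 31, 14), (651, 15, 37), (49, 1, 1),
]


def median_and_quartiles(values):
    """Median plus quartiles taken as medians of the lower and upper halves
    (an assumed convention; statistical packages offer several)."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return median(s), median(lower), median(upper)


enrolled = [t[0] for t in TRIALS]
lost = [t[1] for t in TRIALS]
fi = [t[2] for t in TRIALS]

print(median_and_quartiles(enrolled))  # (654.5, 461.0, 847.0)
print(median_and_quartiles(lost))      # (14.0, 3.0, 23.0)
print(median_and_quartiles(fi))        # (12.5, 8.5, 27.0)

# Trials in which losses to follow-up are at least as large as the FI.
fragile = sum(1 for _, n_lost, n_fi in TRIALS if n_lost >= n_fi)
print(fragile, f"{fragile / len(TRIALS):.0%}")  # 7 44%
```

Under these assumptions the script reproduces the reported medians and IQRs for sample size, losses to follow-up and FI, as well as the 7 of 16 trials (44%) in which the number of patients lost to follow-up was greater than or equal to the FI.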


Tags: Urologic Oncology Seminars and Original Investigations Volume 43 Issue 5

May 11, 2025 | Posted by drzezo in ONCOLOGY | Comments Off
