Volume 1(5); Pages: 379-385, 2021 | DOI: 10.21873/cdp.10050
BIRTE J. WOLFF1 and JOHANNES E. WOLFF2
1Swedish Covenant Hospital Department of Gynecology, Chicago, IL, U.S.A.
2AbbVie Inc, North Chicago, IL, U.S.A.
Correspondence to: Dr. med. habil. Johannes E. Wolff, MD, Ph.D., 1709 Napa Suwe Ln, Wauconda, IL 60084, U.S.A. Tel: +1 2242216093, Fax: +1 8479380660, e-mail: firstname.lastname@example.org
Received August 6, 2021 | Revised September 2, 2021 | Accepted September 6, 2021
Background/Aim: Diarrhea is among the most common adverse events in early oncology clinical trials, and drug causality may be difficult to determine. Materials and Methods: This is a systematic literature review of placebo arms of randomized cancer trials. Results: Diarrhea was reported in 95 of 127 placebo monotherapy cohorts. Publications involving healthy volunteers and cancer prevention studies reported lower frequencies than those with cancer patients. The average reported frequency of diarrhea grade 1 or higher among studies in cancer patients was 15%. The maximal reported frequencies for grades 1, 2, 3, 4, 5 were 56, 24, 6, 2, and 0%, respectively. Conclusion: When diarrhea frequencies higher than these are observed in treatment arms of clinical trials, drug causality is likely.
The first human studies in early oncology are typically dose escalation trials without placebo control arms, conducted with cancer patients in whom multiple lines of treatment have failed. Adverse events observed in these trials may not reflect toxicity of the agent under investigation.
Diarrhea is a common symptom; in the general population it is often caused by infections or food intolerance. The symptom may also be a drug side effect, and multiple mechanisms have been reported (1). The mediators of increased fluid volume or motility may be initiated by direct mucosal injury (2), dysbiosis (3), autoimmune mechanisms such as colitis (4), or cytokine release syndrome. The understanding of these mechanisms will guide the choices of supportive care. However, the first step is to determine if the symptom is actually caused by the drug. The frequency and severity of diarrhea in data aggregations of multiple patients may allow a causality assessment, but this can only be achieved if baseline data are available on the incidence of diarrhea in the patient population without the presence of the investigational drug.
The National Cancer Institute’s Common Terminology Criteria for Adverse Events (CTCAE) version 5 defines diarrhea as a disorder characterized by an increase in frequency and/or loose or watery bowel movements, and grades the symptom as: grade 1: “Increase of <4 stools per day over baseline; mild increase in ostomy output compared to baseline”, grade 2: “Increase of 4-6 stools per day over baseline; moderate increase in ostomy output compared to baseline; limiting instrumental activity of daily living (ADL)”, grade 3: “Increase of ≥7 stools per day over baseline; hospitalization indicated; severe increase in ostomy output compared to baseline; limiting self care ADL”, grade 4: “Life-threatening consequences; urgent intervention indicated”, and grade 5: death. The “change over baseline” in this definition is an unusual component of the CTCAE grading, as it includes a component of causality assessment: temporality (5, 6). The concept and numeric limits have not changed in the various versions since their creation in 1983 (7), but there were minor changes of the specific wording: Version 4.03 lacked the reference to ADL in grade 2, and included the term “incontinence” in grade 3. Version 3 included IV fluids in grade 2, and the term “hospitalization” in grade 3. Version 2 included the term “nocturnal stool” in grade 2 and did not have a grade 5 definition for any adverse event (AE).
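The stool-count part of the CTCAE version 5 grading can be illustrated with a minimal Python sketch. It covers only the numeric criterion (fewer than 4, 4-6, and 7 or more additional stools per day over baseline); the ostomy, ADL, and hospitalization criteria and grades 4 and 5 depend on clinical judgment and are not modeled. The function name and the choice to return 0 when there is no increase are assumptions made for illustration.

```python
def ctcae_v5_grade_from_stool_increase(increase_over_baseline: int) -> int:
    """Map the increase in stools per day over baseline to CTCAE v5 grade 0-3.

    Illustrative only: grades 4 and 5 and the non-numeric criteria
    (ostomy output, ADL limitation, hospitalization) are not covered.
    """
    if increase_over_baseline <= 0:
        return 0  # assumption: no increase over baseline is treated as "no event"
    if increase_over_baseline < 4:
        return 1  # increase of <4 stools per day over baseline
    if increase_over_baseline <= 6:
        return 2  # increase of 4-6 stools per day over baseline
    return 3      # increase of >=7 stools per day over baseline


if __name__ == "__main__":
    for increase in (0, 3, 4, 6, 7):
        print(increase, ctcae_v5_grade_from_stool_increase(increase))
```

Because the grade depends on the change from each patient's baseline rather than on an absolute stool count, the same absolute count can map to different grades in different patients.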
The best source of data for the expected frequency of AEs in a cancer population without the investigational agent is placebo data from clinical trials. This analysis is part of a larger project, analyzing various AEs, and improving the methodology. Using the example of headache, a previous analysis showed a counterintuitive relation of more headache reported in healthier patients (described by ECOG status) (8). In contrast, for anemia, no influence of ECOG status was detectable (9). This analysis of anemia data used imputation of missing values when some but not all were reported within one patient cohort. The imputation was based upon a model, which assumed normally distributed hemoglobin. For the analysis of diarrhea presented here, the leading hypothesis was that diarrhea reporting was related to different demographic variables than the two other AEs, and that benchmarks for clinical trials would therefore need to be defined in a different way. For this, the data collection was expanded, and the methods of imputation of missing values were developed further, also testing other models.
The selection of included publications was built upon previous meta-analyses covering 2000-2018 (10), 2018-Nov 2020 (8), and 2020-March 2021 (9). In addition, for this analysis, a thorough search for the year 2017 was added using the same method. The new search identified 451 titles. The title review identified 85 studies, and the two most common reasons for exclusion were that the study was not an oncology study or was not randomized. Abstract review excluded a further 24 studies; the most common reason was that placebo was not given as monotherapy but in combination with an oncology drug. The remaining 61 were included in a full paper review. The most common reason for exclusion at this point was that the study data had already been included in the database. Only 20 of these publications could be entered into the database. When entering the data, absolute numbers were converted to percentages when necessary. For synonyms such as neutropenia and neutrophil count, one term was selected, and all published values were entered under this term. The complete method description and the items listed in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (11) are available from the corresponding author upon request, and include the search algorithm, the complete list of included articles, and the list of synonyms.
Diarrhea frequency and severity were typically not reported for all grades separately (grade 1, grade 2, 3, 4, 5), but instead as combined grades (grade 1 and 2, grade 3 and higher). This resulted in 11 distinct variables (columns of data): grades 0 to 5 as individual grades, grade 1 or higher as one number and equivalents up to 4 or higher, grade 1 and 2 combined, and grades 3 and 4 combined. However, the raw data had far more missing values than data entries. Some of the missing values could be logically deduced. For instance, when all of the grades 1-5 were provided individually, then grade 0 and all the combined grades (such as grade 1 or higher) could be calculated. This was implemented as an automated step in SPSS and named the “logical imputation”. It was further refined from what existed in the previous analyses (8, 9), now also taking into account the total number of AEs. If there were no AE-related deaths in the study, then diarrhea grade 5 = 0%. Nonetheless, similar to previous experiences, this step still left most values as missing data. Following the successful model in anemia, we also attempted a normal distribution model postulating an underlying quantitative diarrhea variable (9). However, the goodness of fit remained suboptimal, and eventually the outcome failed the final validation control. Instead, linear regression models were used to fill in missing variables. Correlations between the different variables were calculated, and those pairs of variables with R>0.8 and p<10^(-10) (Pearson correlation) were used as the basis for linear regression models. Quality controls included a manual review of every single number for 1% of the lines, as well as automated steps checking that the sums of variables were consistent. Discrepancies were evaluated, including a check of the source data and a review of how the imputation algorithms functioned in the given patient cohort. They included correctable discrepancies such as errors in data entry, acceptable discrepancies such as summing errors resulting from rounding, and uncorrectable discrepancies, when the original publication contained impossible values, such as sums of percentages adding to values other than 100%. Validation occurred by comparing aggregated AE frequencies with other variables, and comparing these findings between raw data and imputed data.
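The logical imputation step was implemented by the authors in SPSS; the Python sketch below only illustrates the idea under stated assumptions. The column names (`g0` to `g5`, `g1_plus` to `g4_plus`, `g1_2`, `g3_4`) and the function name are hypothetical, all values are percentages of one patient cohort, and only two of the rules described above are shown: grade 5 is set to 0% when no AE-related deaths occurred, and grade 0 plus the combined grades are derived once grades 1-5 are all known.

```python
from typing import Dict, Optional


def logical_imputation(cohort: Dict[str, Optional[float]],
                       ae_related_deaths: Optional[bool] = None
                       ) -> Dict[str, Optional[float]]:
    """Fill in values that follow logically from the reported ones.

    Keys are hypothetical column names; values are percentages or None.
    Reported values are never overwritten.
    """
    filled = dict(cohort)

    # Rule 1: no AE-related deaths in the study implies grade 5 diarrhea = 0%.
    if filled.get("g5") is None and ae_related_deaths is False:
        filled["g5"] = 0.0

    grades = [filled.get(f"g{i}") for i in range(1, 6)]
    if any(value is None for value in grades):
        return filled  # not enough information for rule 2

    # Rule 2: with grades 1-5 all known, derive grade 0 and the combined grades.
    derived = {
        "g0": 100.0 - sum(grades),
        "g1_2": grades[0] + grades[1],
        "g3_4": grades[2] + grades[3],
    }
    for k in range(1, 5):
        derived[f"g{k}_plus"] = sum(grades[k - 1:])  # grade k or higher

    for key, value in derived.items():
        if filled.get(key) is None:
            filled[key] = round(value, 4)
    return filled


if __name__ == "__main__":
    example = {"g1": 10.0, "g2": 4.0, "g3": 1.0, "g4": 0.0, "g5": None}
    print(logical_imputation(example, ae_related_deaths=False))
```

A cohort reporting only some of the grades is returned unchanged apart from rule 1, which reflects the observation that this step alone left most values missing.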
The influence of demographic variables was assessed in various ways as described previously (8, 9). In this, diarrhea was first described as a binary variable (yes/no), with “yes” if any value higher than 0% was reported. Then quantitative diarrhea grades were combined as grade 1 and higher, and expressed as % for each published patient cohort. At least two methods were used for each pair of variables. For quantitative variables, Pearson correlation (% diarrhea versus demographic variable) and the SPSS algorithm “compare means” (quantitative variable among patient cohorts with or without diarrhea reported) were used. For categorical variables, ANOVA and Chi-square tests were used. Additionally, visual impression was used in scatter plots, histograms, and box plots. These exploratory analyses were performed for each of the four cut-offs (grade 1 and higher, grade 2 and higher, etc.). All analyses and p-values were exploratory. Statistical analyses were done using SPSS (Statistical Package for the Social Sciences, version 23.0, IBM, Armonk, NY, USA).
The 116 included publications described 127 placebo treated patient cohorts including 24,485 patients. The core of the analysis was built on the placebo monotherapy cohorts that provided diarrhea data. These included 88 publications reporting 95 patient cohorts with 18,493 patients. For the definition of benchmarks, studies with healthy volunteers and cancer prevention studies were excluded, resulting in 85 publications reporting 91 patient cohorts with 18,230 patients. The majority of the studies were phase 3 studies (68%) evaluating an oral drug (76%). The average median age was 57 years. Further details of the demographics are provided in Table I.
The most commonly recorded CTCAE diarrhea frequency among the 127 placebo monotherapy cohorts was grade 1 or higher (63 cohorts, average of reported values: 14.4%, range=0-41.5%). Grade 5 was the second most commonly reported grade (57 cohorts: all values among placebo arms = 0%). Grades 3 and 4 were reported as separate values in 39 cohorts each, followed by grade 1 and 2 combined, grade 3 and 4 combined or grade 3 or higher [25 each], grade 1, grade 2, grade 2 or higher, and grade 0. Logical imputation to fill in missing values based upon other values reported for the same patient cohort increased the number of data available for analysis to 96 cohorts for grade 1 and higher. Regression model-based imputations finally provided between 81 (grade 1 or 2) and 98 (grade 0 and grade 1 or higher) numeric values for analysis.
Correlations of the various diarrhea grades, which were the basis for model imputations, were analyzed among the logically imputed values of both placebo and treatment arms. The various categories of grades correlated with each other to various degrees. The highest correlation was found between grade 3 and grade 3 combined with 4 (R=0.999, p=2.4×10^(-193), Pearson correlation), and the lowest correlation between grade 4 and grade 0 (R=-0.046, p=0.62). There were two groups of variables, separating grades 1 and 2 (group 1) from grades 3 and 4 (group 2). Correlations within each group (such as correlating grade 1 to 2) were typically characterized by R>0.5 and p<0.01 (Figure 1), while correlations between the groups (such as correlating grade 1 to 3) were not (Figure 2). This allowed meaningful regression-based models only within each group. The correlations used for regression models were: grade 3 to grade 3 or higher (R=0.999, p=8.6×10^(-184)), grade 1 and 2 combined to grade 0 (R=-0.968, p=7.5×10^(-98)), and grade 1 to grade 0 (R=-0.893, p=1.0×10^(-20)). Scatter plots further allowed restricting the models to the data ranges with higher predictability (Figure 1).
Figure 1. Frequency of diarrhea reporting in published randomized clinical cancer trials by grade. Open circles: treatment arms, filled circles: placebo arms. The reported frequency of diarrhea grades 1 and 2 correlated closely with the frequency of grade 0. Those were excluded from the regression-based model used for imputation.
Figure 2. Frequency of diarrhea in randomized clinical trials. Open circles: treatment arms, filled circles: placebo arms. Although the correlation between grade 1 and 2 versus grade 3 and 4 was mathematically detectable (R=0.51), it was not close enough to be used for imputation of missing variables.
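The regression-based imputation described above can be sketched in a few lines of Python. This is not the authors' SPSS procedure: the function names and column names are hypothetical, the threshold is applied to the absolute value of R (an assumption, since the correlations used include strongly negative ones), the p-value criterion is omitted, and the restriction to data ranges with higher predictability is not modeled.

```python
from math import sqrt
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares y = slope * x + intercept, plus Pearson R."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("no variance in one of the variables")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / sqrt(sxx * syy)


def impute_by_regression(rows: List[Row], predictor: str, target: str,
                         min_abs_r: float = 0.8) -> List[Row]:
    """Fill missing `target` values from `predictor` if |R| exceeds the threshold."""
    pairs = [(r[predictor], r[target]) for r in rows
             if r.get(predictor) is not None and r.get(target) is not None]
    result = [dict(r) for r in rows]
    if len(pairs) < 3:
        return result  # too few complete cohorts to fit a model
    slope, intercept, r_value = fit_line([p[0] for p in pairs],
                                         [p[1] for p in pairs])
    if abs(r_value) <= min_abs_r:
        return result  # correlation too weak: leave values missing
    for row in result:
        if row.get(target) is None and row.get(predictor) is not None:
            row[target] = slope * row[predictor] + intercept
    return result


if __name__ == "__main__":
    # Hypothetical cohorts: grade 0 (%) and grade 1 and 2 combined (%).
    data = [{"g0": 90.0, "g1_2": 9.0}, {"g0": 80.0, "g1_2": 19.0},
            {"g0": 70.0, "g1_2": 29.0}, {"g0": 60.0, "g1_2": 39.0},
            {"g0": 75.0, "g1_2": None}]
    print(impute_by_regression(data, "g0", "g1_2")[-1])
```

With a weakly correlated pair, such as grade 1 and 2 versus grade 3 and 4 in Figure 2, the threshold check leaves the missing values untouched.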
The influence of demographic covariates (Table I) on the reported frequency of diarrhea was evaluated for both raw data and imputed data, for all grades and reported combinations of grades. However, there was only one robust finding: placebo arms of studies with healthy volunteers and of cancer prevention studies reported lower frequencies of diarrhea. This was detectable in ANOVA for grade 1 and higher (p=0.011), and for grade 2 and higher (p=0.015); the phenomenon was also detectable in an equivalent analysis of the treatment arms. It led to the decision to exclude studies with healthy volunteers and cancer prevention studies from the final calculation of benchmarks. Some of the other variables also resulted in noteworthy observations, but none of them turned out to be robust when analyzed in different ways, and they therefore had no impact on the calculation of benchmarks. For instance, there were differences between the average diarrhea frequencies depending on the cancer diagnosis; however, they did not reach p-values lower than 0.05 in any test. The phase of the study showed a notable trend which reached a marginal p-value of 0.019 in diarrhea grade 2 and higher: phase 1 average 0.9%, n=4, phase 2: 5.9%, n=24, and phase 3: 7.6%, n=57. However, this was not confirmed for the other grades; the Chi-square test remained not significant, and the equivalent analysis among treatment arms showed the reverse trend. Therefore, the phenomenon was judged not to be a robust finding. ECOG status 0 correlated marginally with diarrhea grade 1 or higher (Pearson R=0.286, p=0.019, n=64); however, this was not confirmed by ANOVA, the trend was not visible in the scatter plots, and the equivalent analysis among treatment arms showed the opposite trend.
The relation between different AEs was evaluated in two ways: considering the listing of an AE as a binary variable with “yes” when any % reported was >0%, and correlating the quantitative frequencies with each other. Patient cohorts in which diarrhea was reported were more likely to also have other gastrointestinal AEs reported. For instance, among all study cohorts, placebo arms and treatment arms, 91% of cohorts in which no diarrhea was reported also had no constipation reported, compared to only 49% among cohorts with diarrhea reported (p=1.5×10^(-10), Chi-square test). The relation was also true when restricting the analysis to placebo arms (91 vs. 52%, p=0.0004); the quantitative comparison also confirmed the finding. Equivalent observations were made for the AEs abdominal pain and vomiting. Among AEs in other organ systems, the correlations were not as high as with gastrointestinal AEs. However, in most analyses, the binary Chi-square tests still resulted in exploratory p-values below 0.05, with the notable exceptions of febrile neutropenia, myalgia, hypersensitivity reactions, and hypokalemia, where no relation with diarrhea was observed. In none of the analyzed pairs of AEs was the opposite trend observed: the absence of diarrhea was not linked to an increased frequency of any AE.
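The binary comparison of two AEs amounts to a Chi-square test on a 2×2 table of cohort counts. The sketch below shows the Pearson Chi-square statistic without continuity correction and its p-value for one degree of freedom, using only the Python standard library. The counts in the example are hypothetical; the cohort counts behind the reported percentages are not given in the text, and the authors ran their tests in SPSS.

```python
from math import erfc, sqrt
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson Chi-square test (no continuity correction) for the table

                        AE2 reported   AE2 not reported
    AE1 reported             a                b
    AE1 not reported         c                d

    Returns (chi-square statistic, p-value for 1 degree of freedom).
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts of cohorts, for illustration only.
    chi2, p = chi_square_2x2(20, 5, 5, 20)
    print(f"chi2={chi2:.2f}, p={p:.2g}")
```

The test says nothing about direction, so the direction of each association has to be read from the percentages, as in the constipation example above.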
For the primary objective to describe baseline benchmarks of diarrhea frequency, the imputation based on regression models was used, and studies with healthy volunteers and cancer prevention studies were excluded. The following values were found among placebo arms: grade 1 or higher: 15% (reported in 92 of 127 cohorts), grade 2 or higher: 7.3% (reported in 79), grade 3 or higher: 0.73%, grade 4 or higher: 0.04%, and grade 5: 0% (120 cohorts). The equivalent values of the treatment arms were substantially higher (Table II).
The rate of treatment emergent diarrhea was reported in 92 of 127 placebo arms of randomized oncology trials; with an average of 15%, and a maximum of 56%. There was no report of grade 5 diarrhea in the placebo arms.
In the context of oncology early drug development, these values contribute to the assessment of causality. A variety of aspects are to be considered when assessing if an AE is caused by an experimental agent. Among these is information that is independent of the individual case, such as preclinical findings. Key elements of the individual case reviews include the time course of the event in relation to the drug application as well as alternative causes such as comorbidities and comedications. Aggregating data of multiple patients becomes valuable for common AEs (such as diarrhea), when more than ten patients have been treated. These aggregates allow analyzing new aspects such as dose effect relations. The total frequency may be compared to appropriate controls and allow inferences for causality. When control arms within the trial are missing, external information is necessary.
Based upon our data review, diarrhea is likely caused by the investigational agent, if the observed frequency exceeds the highest frequency published for a placebo arm:
a) Any case of grade 5 diarrhea (frequency exceeding 0%)
b) Frequency of diarrhea grade 3 or higher exceeding 6%
c) Frequency of diarrhea grade 1 or higher exceeding 56%
The findings of this analysis do not suggest using grade 2 or grade 4 diarrhea as benchmarks; these grades were infrequently reported, the differences between treatment arms and control arms were less prominent, and they did not add to the more commonly reported grade 1 and grade 3. Frequencies lower than the listed benchmarks do not exclude drug causality. Of note, the authors selected the highest reported frequencies as benchmarks (Table II). Values below these frequencies, yet above the averages of 14% (grade 1 or higher) and 0.7% (grade 3 or higher), still indicate a possible causality by the drug and should still trigger appropriate assessments. In contrast, observed frequencies below the average of published placebo controls suggest the absence of a causal relationship.
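How the benchmarks might be applied to aggregated data from a single-arm trial can be sketched as follows. The numbers are taken from the text above (maxima of 56%, 6%, and 0%; averages of 14% and 0.7% as quoted in this paragraph); the function name, dictionary keys, and the three output labels are assumptions made for illustration, and the result is a screening aid that does not replace individual case review.

```python
from typing import Dict

# Highest frequencies (%) published for placebo arms, as listed in the text.
MAX_PLACEBO = {"g1_plus": 56.0, "g3_plus": 6.0, "g5": 0.0}
# Average placebo frequencies (%) as quoted in the discussion.
AVG_PLACEBO = {"g1_plus": 14.0, "g3_plus": 0.7}


def assess_diarrhea_causality(observed: Dict[str, float]) -> str:
    """Compare observed frequencies (%) with the placebo benchmarks.

    `observed` uses the hypothetical keys g1_plus, g3_plus, and g5.
    Returns "likely", "possible", or "not suggested".
    """
    if any(observed[key] > limit
           for key, limit in MAX_PLACEBO.items() if key in observed):
        return "likely"  # exceeds the highest published placebo frequency
    if any(observed[key] > average
           for key, average in AVG_PLACEBO.items() if key in observed):
        return "possible"  # above the placebo average, below the maximum
    return "not suggested"  # at or below the placebo average


if __name__ == "__main__":
    trial = {"g1_plus": 30.0, "g3_plus": 0.5, "g5": 0.0}
    print(assess_diarrhea_causality(trial))
```

Any single death attributed to diarrhea returns "likely", because the placebo maximum for grade 5 is 0%.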
A previous similar project focused on headache in AE reporting. It showed more influencing demographic variables and among those was a counterintuitive lower frequency of headache among the oldest age group and among patients with lower performance status (8). Diarrhea findings differed in that point. A relevant influence of demographic covariates could not be identified. This does not mean that there are no influential covariates. This analysis summarized data aggregates, a process that inherently results in loss of information, which is only visible in patient level data: A covariate (such as age) might have a significant influence on diarrhea within each patient cohort, but if all cohorts include patients with the same age, the median and average ages will be the same, and the age effect will disappear. However, the same phenomenon would also make the influence of the covariate irrelevant for the comparison of aggregated study data to benchmarks.
Diarrhea reporting frequency was related to most of the other reported AEs. Trials that reported diarrhea frequencies higher than 0% were more likely to also report other AEs. The authors consider this observation an artifact of the diversity in reporting diligence rather than a biological phenomenon. The interpretation is supported by the absence of any AE with the reverse relation (less frequently reported among cohorts with diarrhea). Even constipation was more commonly reported in studies with diarrhea. Also, the observation that a similar relation to diarrhea was absent for febrile neutropenia might be explained by a reporting artifact: If a cancer patient is admitted for febrile neutropenia, the medical attention will focus on life support and antimicrobial treatment. Mild diarrhea in this context might not be reported, and the frequency described in published tables might be too low. However, for the purpose of comparing data aggregations, one needs to keep in mind that the same reporting weaknesses will also apply to the study data under evaluation, and therefore, the benchmarks should be created under the same conditions. It appears well advised to focus on the frequency of diarrhea among those studies that did report values >0%.
The method of filling in the blanks using a normal distribution model was successful when analyzing anemia, and failed when analyzing diarrhea. This is likely caused by the fact that diarrhea grades as defined by CTCAE are in fact not normally distributed. For instance, grade 2=0%, with Grade 1 and grade 3 >0 (12) cannot be described by a normal distribution, because in a normal distribution the frequency of grade 2 needs to be between the frequencies of grade 1 and grade 3. The experience shows that imputation methods need to be tested in each AE separately.
The findings reported here support the novel approach of utilizing external controls to interpret AE data of single arm studies by providing benchmarks for diarrhea frequencies. Three cut offs are recommended: grade 1 and higher, grade 3 and higher, and grade 5. A model to include covariates appears unnecessary.
BW is an attending physician at Swedish Covenant Hospital and has no conflict of interest. JW is an employee of AbbVie pharmaceuticals Inc. and owns stock. However, this project was not part of the employment, and the data interpretation reflects the authors' personal opinion, not that of the company.
BW: Concept, writing, figures. JW: Concept, data collection, analysis.
The Authors would like to thank Elisa Cerri for the helpful discussions.