Improved Insights into Effects of Cancer Therapies

Raymond DuBois, MD, M.D. Anderson Cancer Center
Donald Berry, PhD, M.D. Anderson Cancer Center
Jim Doroshow, MD, FACP, National Cancer Institute
Paolo Paoletti, MD, GlaxoSmithKline
Richard Pazdur, MD, Food and Drug Administration
Nancy Roach, C3: Colorectal Cancer Coalition

The need for clarity on efficacy endpoints
In an effort to accelerate safe and effective cancer drug development and to decrease the time to drug approval, the oncology community has long sought endpoints other than overall survival (OS) to evaluate new agents. Measures of disease progression, health-related quality of life, patient-reported symptoms, and biomarkers have been proposed and tested in clinical studies, but consensus has not been reached on the role of these endpoints in determining the overall clinical benefit of a therapy. One auxiliary endpoint that has been the focus of particularly intense discussion is progression-free survival (PFS), which employs the RECIST criteria to determine the progression of cancer based on imaging.1 PFS is the length of time during and after treatment in which a patient is living with a disease that does not worsen, according to the established criteria. RECIST defines progression as an increase of 20 percent in a single dimension on computed tomography or magnetic resonance imaging. However, concern about potential biases in the measurement of disease progression by radiographic imaging has resulted in debate over the current use of PFS in clinical trials.
Two recent cancer drug approvals were based solely on evidence of PFS, underscoring the need to reach agreement on how this endpoint is defined and interpreted. In December 2005, the U.S. Food and Drug Administration (FDA) approved sorafenib for the treatment of advanced kidney cancer based on an increase in progression-free survival, despite the absence of a statistically significant benefit in overall survival. More recently, the FDA granted accelerated approval for bevacizumab in patients with breast cancer. Data showed that a combination of bevacizumab and paclitaxel nearly doubled median PFS compared with paclitaxel alone, but the secondary endpoint of overall survival in this trial did not reach statistical significance. A review of the data by the FDA will be required for the accelerated approval to be converted into a full approval by the end of 2008. If upcoming trials do not show a survival benefit, then the accelerated approval for breast cancer could be revoked or curtailed until more data are collected.
Lack of clarity around the appropriate use of PFS and other auxiliary endpoints can be a barrier to efficient clinical cancer research, as well as to the review and approval of cancer therapies. This paper will brief conference attendees on the complex issues surrounding measurement of treatment efficacy and propose a set of principles to guide the evaluation and use of auxiliary endpoints. Finally, these principles will be illustrated using PFS as a case study.

Issues around the use of auxiliary endpoints
The gold standard for clinical effectiveness of a given agent is an improvement in a defined endpoint in a randomized clinical trial.2,3 In oncology, overall survival is most often seen as the best single agreed-upon endpoint. However, randomization rarely occurs in the Phase II setting, as is common in other areas of drug development. Instead, Phase II trials measure the rate of complete and partial response to given agents prior to progression to randomized Phase III trials. Single-arm, historically controlled Phase II trials are rarely employed outside of oncology. The reasons are multifactorial, but this reliance on single-arm designs may contribute to the high failure rate of drugs proceeding from the Phase II to the Phase III setting in cancer drug development. Thus, there is a need for endpoints that can quickly detect drug efficacy or failure, in order to avoid unnecessary resource allocation to drugs that will ultimately fail to exhibit a patient benefit.
Overall survival is objectively measured and not prone to the potential investigator biases associated with endpoints that require clinical judgment. However, using overall survival as a primary endpoint significantly slows the rate of cancer drug development. As approved therapies have become increasingly effective at prolonging survival, so too have they prolonged the duration of trials designed to detect that endpoint. Delays in recruitment and follow-up ultimately serve to prolong the regulatory review and approval of newer agents that could provide needed options for cancer patients. Furthermore, overall survival is a crude instrument for measuring the effects of many targeted therapies, which may be designed to work in a subset of patients with specific molecular targets. Thus, a great need exists to define and validate alternative markers of effect.4
While the term “surrogate endpoint” has been more commonly used in the literature, “auxiliary endpoint” is a preferable term because the endpoints under investigation are not meant to supplant more conventional endpoints, but rather to be evaluated in conjunction with other endpoints. We define auxiliary endpoints to include the collection of endpoints – other than overall survival – used to infer the effects of cancer therapies from clinical trials. Auxiliary endpoints may be primary or secondary endpoints within a trial, and may include progression-free survival (time to progression), response rate, patient-reported outcomes (e.g., quality of life), and biomarkers (e.g., tumor size, circulating tumor cells, and tumor-specific markers). Clearly defining the strengths, limitations, and appropriate uses of auxiliary endpoints could accelerate the development, review, and approval of new treatments.

Principles for the evaluation and use of auxiliary endpoints
We propose three basic principles to consider when selecting auxiliary endpoints for a given trial. First, a strong biological rationale should support the potential auxiliary endpoint as a marker of treatment effectiveness. For example, biomarkers that predict variability in survival time may be preferred endpoint candidates. Second, the potential auxiliary endpoint should be shown to explain variability in treatment outcomes in terms of survival for treated patients.5 Third, ideal auxiliary endpoints should accurately assess the efficacy of the drug being evaluated with minimal risk of subjectivity or bias. Where the possibility of bias exists, the trial design should compensate by seeking to minimize potential bias.6
The development of new drugs for AIDS patients closely follows this model of auxiliary endpoint development. CD4 count and viral load were validated as auxiliary endpoints in trials in the late 1980s and early 1990s, allowing for an explosion in the available therapies for AIDS patients. Applying this model to the oncology community will be more difficult. Since cancer involves many heterogeneous disease processes, many auxiliary endpoints will need to be developed according to these criteria.

Toward a rational and valid process for evaluating progression-free survival
Progression-free survival (PFS) is a desired endpoint in many settings, but it is not a surrogate for overall survival. Advantages of PFS as a primary endpoint include a more rapid clinical trial and the elimination of confounding effects when evaluating experimental therapies in diseases with existing effective therapies. For example, if a patient enters a clinical trial after four failed conventional therapies and later discontinues that trial due to progression, numerous other approved therapies may be available. The patient could survive long after a given trial ceased accruing patients, and other therapies could contribute to his or her demise, diluting the observed effect of the experimental therapy. Further, non-trivial improvements in PFS are considered a clinical benefit in some settings. Patients may see a benefit in a lack of progression in their tumor burden, irrespective of the benefits in overall survival.
Although PFS has many advantages, it is not without limitations. Unlike OS, the precise timing of PFS is not known. This leads to the potential for evaluation-time bias, which produces biased estimates of treatment effectiveness when the evaluation times for progression status differ by treatment arm.7 Further, elements of subjectivity remain in spite of efforts, such as RECIST, to standardize the evaluation of progressive disease.8 Indeed, a non-trivial number of discrepancies between radiologists evaluating the progression status of the same patients are to be expected. These discrepancies can come from multiple sources. At the start of trials, baseline lesions are usually defined, but occasionally, these lesions are altered or ignored in the course of a trial. Radiographic scans can be misplaced, leading to clinical judgments based on varied amounts of radiographic information. In addition, radiologists may have different interpretations of the available scans. In a recent trial, the discrepancy rate between two expert radiologists blinded to the treatment assignment was 34 percent. When these discrepancies are unrelated to treatment, they are a source of measurement variability, which results in attenuated estimates of effect sizes. Measurement variability reduces the power to detect a true difference but will not lead to invalid conclusions when the experimental therapy is truly ineffective. In other words, measurement variability alone will not result in ineffective therapies entering the oncology community. However, if the variability is large enough, it could preclude effective therapies from being revealed.
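The attenuation described above can be illustrated with a small simulation. This is a hedged sketch under assumed conditions, not data from any trial discussed here: exponential PFS times, a hypothetical true hazard ratio of 0.5, and arm-unrelated reader discrepancies occurring at the 34 percent rate mentioned in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000                    # large arms so sampling noise is negligible
true_hr = 0.5                 # hypothetical true hazard ratio (treatment vs. control)

ctl = rng.exponential(6.0, n)              # control PFS times, mean 6 months (assumed)
trt = rng.exponential(6.0 / true_hr, n)    # treatment PFS times, mean doubled

def hr_from_means(trt_times, ctl_times):
    # For exponential times the hazard is 1/mean, so the hazard ratio
    # (treatment vs. control) is mean(control) / mean(treatment).
    return ctl_times.mean() / trt_times.mean()

pool = np.concatenate([ctl, trt])

def with_discrepancies(times, rate):
    # Model an arm-unrelated discrepancy: with probability `rate`, the
    # recorded progression time is replaced by a draw from the pooled
    # distribution, i.e., the disagreement carries no treatment information.
    noisy = times.copy()
    mask = rng.random(times.size) < rate
    noisy[mask] = rng.choice(pool, mask.sum())
    return noisy

hr_clean = hr_from_means(trt, ctl)
hr_noisy = hr_from_means(with_discrepancies(trt, 0.34),
                         with_discrepancies(ctl, 0.34))
print(f"estimated HR without discrepancies: {hr_clean:.2f}")
print(f"estimated HR with 34% discrepancies: {hr_noisy:.2f}")
```

With treatment-unrelated discrepancies, the estimated hazard ratio moves toward 1 (no effect): the effect is attenuated and power is lost, but a truly null effect stays null, matching the argument above.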
The most significant concern about discrepancies in assessment arises when progressive disease evaluations are influenced by an investigator’s lack of objectivity about the therapies under study. The potential for evaluations to be influenced by knowledge of treatment assignment, combined with pre-existing views about their relative effectiveness, has led to the introduction of Blinded Independent Central Reviews (BICR) as a suggested means of validating efficacy in trials with PFS endpoints. However, the use of BICR is problematic and may lead to invalid analyses, as it does not always provide an unbiased estimate of a treatment’s effectiveness. Specifically, BICR analyses for PFS are likely subject to the presence of informative censoring, which invalidates standard analyses. The methodology relies on the assumption that censoring is independent of factors associated with progression or survival.
Informative censoring arises in the following manner: Patients who progress by investigator assessment may not have the same assessed time of progression under the BICR. Once a patient has progressed according to the investigator, he or she will be taken “off protocol” and further follow-up is not likely. If the BICR does not determine that a patient has progressed by the time the patient is off protocol, the patient is censored for the purpose of analysis. This patient, however, is more likely to have progressed, as assessed by BICR, sooner than those remaining in the at-risk cohort. This violates the standard assumptions for censoring subjects and, as a result, survival-analysis estimates are biased. Further, although methods for modeling informative censoring exist, these methods cannot conclusively eliminate the potential effects of informative censoring. Dodd et al. provide a more detailed discussion of this issue with an example from a clinical trial.9
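The censoring mechanism just described can be made concrete with a simulation. This is a sketch under assumed conditions, not a reconstruction of any cited trial: BICR progression times are modeled as the investigator's times scaled by a random factor (so the two reads are correlated but not identical), patients are censored for the BICR analysis at investigator-determined progression, and a minimal Kaplan-Meier estimator is compared against the true BICR progression distribution, which is known only because the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

inv = rng.exponential(6.0, n)           # investigator-assessed progression times (months, assumed)
bicr = inv * rng.uniform(0.7, 1.4, n)   # BICR times correlated with, but not equal to, local reads

# Once the investigator calls progression, the patient goes off protocol;
# if the BICR has not yet called progression, the patient is censored there.
event = bicr <= inv
obs = np.where(event, bicr, inv)

def km_survival(times, events, t):
    # Minimal Kaplan-Meier estimate of S(t); valid only if censoring
    # is independent of the time to event.
    order = np.argsort(times)
    times, events = times[order], events[order]
    surv, at_risk = 1.0, len(times)
    for time, ev in zip(times, events):
        if time > t:
            break
        if ev:
            surv *= 1.0 - 1.0 / at_risk
        at_risk -= 1
    return surv

t = 6.0
km_estimate = km_survival(obs, event, t)   # standard analysis of the BICR data
true_value = (bicr > t).mean()             # truth, known because this is simulated
print(f"KM estimate of progression-free fraction at {t} months: {km_estimate:.2f}")
print(f"true progression-free fraction at {t} months:          {true_value:.2f}")
```

Because the patients censored at investigator-determined progression were also about to progress by BICR, the Kaplan-Meier curve is biased upward: it overstates how long patients remain progression-free, which is exactly the informative-censoring problem described above.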
In a review of Phase III oncology trials published in the last five years that had BICRs as a component of assessment, no cases showed substantial differences between the BICR and investigator analyses.10 (See Tables 1 and 2.) The lack of differences is striking in light of the seemingly high discrepancy rates between BICR and local review, which range from 36 to 53 percent. However, these discrepancies are likely due in large part to random, rather than systematic, differences between the clinicians who evaluate the radiographic imaging scans. This variation in assessment between two independent reviewers is a well-studied phenomenon in many therapeutic areas.11 Further, there was no trend that would indicate that either BICR or local review resulted in a stronger treatment effect.
In conclusion, BICR does not necessarily provide a less biased estimate of a treatment’s effectiveness than local review, and cases in which the BICR conclusions differ from those based on the investigators’ assessments are inherently ambiguous. The discrepancy may be caused by measurement variability, informative censoring, or true evaluation bias. Methods that effectively reduce evaluation bias where it is most likely to affect trial outcomes are needed. Four approaches are worthy of consideration.
Matter for clarification: No BICR when trials are double-blinded.
Blinding of treatment assignment would eliminate systematic bias in PFS evaluation related to knowledge of treatment assignment. Therefore, there should be no requirement for central review in double-blinded trials, except in the case where an extreme imbalance between treatment arms in the incidence of side effects could lead to a considerable level of unblinding. This level of imbalance would be characterized by the majority of patients in the treatment arm experiencing a particular side effect with a virtual absence of this same side effect in the control arm.
Case 1: An open-label superiority trial with a BICR-based audit of progression.
Detection of meaningful evaluation bias will be gauged via an audit of progression determinations. BICR could be performed in both arms of a trial on a subset of cases. A sample size for the audit would be specified in advance (for example, 10 percent of participants or a minimum number of cases). If bias is suspected, then the audit would expand to a larger proportion of cases. The goal of the audit would be to determine whether there is a meaningful difference in hazard ratios between the local review and BICR. It is recognized that, given the potential biases present with both BICR and local review, a discrepancy in assessments would make a conclusion about a treatment’s efficacy more difficult.
Large effect sizes will likely be robust to small discrepancies between treatment arms, while smaller effect sizes will be quite sensitive to small discrepancies. Therefore, when effect sizes are large, relatively smaller audits may suffice to detect the amount of bias needed to alter the trial conclusion substantively. However, in some cases, no audit, no matter its size, can rule out evaluation bias.
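A back-of-the-envelope calculation illustrates why large effects tolerate small biases while modest effects do not. This sketch uses assumptions not taken from the text: an exponential PFS model, a hypothetical trial with 200 progression events per arm, a hypothetical evaluation bias that delays progression calls in the experimental arm by 15 percent, and the standard large-sample approximation se(log HR) ≈ sqrt(1/d1 + 1/d2).

```python
import math

# Standard large-sample approximation: se(log HR) ~ sqrt(1/d1 + 1/d2),
# where d1 and d2 are the numbers of progression events per arm.
events_per_arm = 200              # hypothetical trial size
se = math.sqrt(2 / events_per_arm)

# Hypothetical evaluation bias: progression calls in the experimental arm
# are systematically delayed by 15%. Under an exponential model, scaling
# times by 1.15 divides the hazard, and thus the observed HR, by 1.15.
bias_factor = 1.15

def significant(hr):
    # Two-sided z-test of H0: HR = 1 at the 5% level
    return abs(math.log(hr)) / se > 1.96

for true_hr in (0.5, 0.85):
    observed_hr = true_hr / bias_factor
    print(f"true HR {true_hr}: significant without bias = {significant(true_hr)}, "
          f"with bias = {significant(observed_hr)}")
```

Under these assumptions, the 15 percent delay cannot change the qualitative conclusion when the true hazard ratio is 0.5, but it turns a non-significant result into a "significant" one when the true effect is a modest 0.85, which is why smaller observed effects call for larger audits.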
Because the goal of the audit is to detect actual bias, measurement variability should be controlled. Technologies that enable synchronization, allowing patients to be followed by the BICR as a trial is ongoing, are strongly encouraged.
It is recognized that data-driven analyses are necessary to develop the scientific justification for the selection of the recommended audit size. The Pharmaceutical Research and Manufacturers of America (PhRMA) and the National Cancer Institute (NCI) have begun research projects to address this specific issue. The NCI will be collecting patient-level data from multiple large clinical trials with data from both central and local reviews to better inform the audit process. Since it is not expected that these data sets will contain meaningful bias, such bias will be introduced into the data so that the auditing strategy can be tested. Simulation studies, based on an understanding of the trial data, will also inform recommendations. Clearly, an understanding of what constitutes an important level of bias for a particular study, given an observed effect size, is needed.
Case 2: An open-label superiority trial with large effect size.
When treatment effect sizes are large enough, an audit is not necessary, since evaluation bias is not expected to be of a magnitude that would meaningfully impact the observed effect size. As part of this proposal, increased monitoring of the protocol-specified imaging procedures at the local site could be undertaken. It is expected that the investigator is the greatest potential source for bias in a PFS assessment. It should also be noted that a local radiologist is frequently unaware that patients are participating in a clinical trial, so a procedure that records the measurements or progression assessments of both the radiologist and the investigator is recommended. Whenever the investigator overrides the radiologist’s determination, the reasons for this will be documented. When this occurs more frequently in one treatment arm and the reasons are not easy to verify objectively, concern about bias will arise.
Case 3: PFS evaluation at two time points with auditing at these evaluation times.
Evaluation of treatment effectiveness could be based on the proportion of patients whose cancer has progressed at two time points, rather than using an analysis based on a survival model. Two time points for imaging assessments would be determined prospectively, corresponding to the approximate median PFS and approximately twice the median PFS of the control arm or conventional therapies. Summary statistics would include the proportion alive and progression-free at each time point. Progressions that have been documented prior to the designated imaging assessment time would be counted as an event for the rate of progression or death, and images would be audited at the two time points. For patients who progress prior to the designated scan times, the audit would be based on the scan that determined progressive disease.
This two-point approach reduces evaluation-time bias and results in a simpler trial design.12 Since the approach limits the focus to the two imaging assessments, the issues of compliance, timing, rigor, and consistency are easier to maintain or verify. Further, central review of two time points should be easier to implement. While one might have concerns about a loss in power of the trial design as compared to a log-rank analysis, the loss in power with two time points is less than that from a single time point. Indeed, Freidlin et al. demonstrate that there is little risk of major power loss from this approach.13 The trade-off for some loss in power, however, is decreased susceptibility to bias.

Conclusion
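The two-landmark analysis can be sketched as follows. All numbers are hypothetical assumptions for illustration, not drawn from the cited trials: an exponential PFS model with a control median PFS of 4 months, a true hazard ratio of 0.6, 800 patients per arm, and a pooled two-proportion z-test comparing the progression-free fractions at the control median and at twice the control median.

```python
import math
import random

random.seed(7)
n = 800                         # hypothetical patients per arm
true_hr = 0.6
median_ctl = 4.0                # assumed control median PFS, months
rate_ctl = math.log(2) / median_ctl
rate_trt = rate_ctl * true_hr   # exponential model: the HR scales the hazard

ctl = [random.expovariate(rate_ctl) for _ in range(n)]
trt = [random.expovariate(rate_trt) for _ in range(n)]

def progression_free(times, t):
    # Proportion alive and progression-free at landmark time t; any
    # progression documented before t counts as an event by then.
    return sum(x > t for x in times) / len(times)

def two_prop_z(p1, n1, p2, n2):
    # Pooled two-proportion z statistic for H0: p1 == p2
    p = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Landmarks at the control median and at twice the control median
results = {}
for t in (median_ctl, 2 * median_ctl):
    p_trt, p_ctl = progression_free(trt, t), progression_free(ctl, t)
    results[t] = (p_trt, p_ctl, two_prop_z(p_trt, n, p_ctl, n))
    print(f"t = {t} months: treatment {p_trt:.2f} vs control {p_ctl:.2f}, "
          f"z = {results[t][2]:.1f}")
```

Because the comparison uses only two prospectively fixed scan times, compliance, timing, and auditing are confined to those visits, at the modest cost in power relative to a full log-rank analysis discussed above.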
This paper has presented a proposal for auditing PFS in three different scenarios. Establishing such auditing procedures can help build confidence in PFS as an indicator of clinical benefit. The suggestions listed above also hint at a way forward for improving the reliability of information produced by other auxiliary endpoints. If the cancer research community can determine how to most effectively utilize auxiliary endpoints – without compromising the quality of safety and efficacy data – cancer patients will benefit greatly.

Table 1. Trials that have used retrospective blinded independent central reviews (a)
[Columns: trial comparison; sample size; hazard ratio and 95% confidence interval per local review; hazard ratio and 95% confidence interval per central review. Only one row label survives in this copy: panitumumab plus best supportive care vs. best supportive care.16]
Notes for Table 1
a We reviewed the literature and searched PubMed for studies in breast cancer, colorectal cancer, lung cancer and renal cell carcinoma. Search terms included “progression free survival” or “time to progression,” with filters of “randomized controlled trial” and “published in last five years.” This revealed 209 manuscripts, of which only six reported having a central review of progression. The bevacizumab plus paclitaxel trial in breast cancer (last row) was included separately because it generated much discussion during an FDA Oncologic Drugs Advisory Committee meeting on Dec. 5, 2007. All of these trials implemented a retrospective BICR. The panitumumab trial allowed cross-over at the time of locally determined progression amongst patients receiving the control treatment. As a result, patients for whom progression was not confirmed centrally continued to be evaluated centrally for progression.
c Hazard ratios not reported for local review.
d Difference statistically significant (p<0.0011). 95% CI for median PFS not reported.

Table 2. Discrepancy rates for three trials with central review
[Columns: discrepancy per central review assignment of progression/censoring date (a); discrepancy per local review assignment of PFS status.]

Notes for Table 2
a Computed as agreement in date of progression or date of censoring.
b Estimated amongst the 649 (of 722) patients for whom images were available for central review. An agreement was counted if dates were within 6 weeks of one another. This is in contrast to the lapatinib plus capecitabine and bevacizumab plus capecitabine trials, in which the exact date was used for agreement.
c A discrepancy was counted if either status assignment differed or if no image was available for central review. As a result, a total of 722 (and not 649) patients were included.

References
Therasse, P. et al. New Guidelines to Evaluate the Response to Treatment in Solid Tumors. J. Natl. Cancer Inst. 92, 205-216 (2000).
Freidlin, B. et al. Proposal for the Use of Progression-Free Survival in Unblinded Randomized Trials. J. Clin. Oncol. 25, 2122-2126 (2007).
Ratain, M.J. et al. Recommended changes to oncology clinical trial design: Revolution or evolution? Eur. J. Cancer. 44, 8-11 (2008).
Schilsky, R.L. End Points in Cancer Clinical Trials and the Drug Approval Process. Clin. Cancer Res. 8, 935-938 (2002).
Ellenberg, S.S. Surrogate end points in clinical trials. BMJ. 302, 63-64 (1991).
Pazdur, R. Endpoints for Assessing Drug Activity in Clinical Trials. Oncologist 13 suppl 2, 19-21 (2008).
Dodd, L.E. et al. Blinded Independent Central Review of Progression-Free Survival in Phase III Clinical Trials: Important Design Element or Unnecessary Expense? J. Clin. Oncol. 26, 3791-3796 (2008).
11 Feinstein, A.R. A bibliography of publications on observer variability. J. Chronic Disease. 38, 619-632 (1985).
14 Escudier, B. et al. Sorafenib in advanced clear-cell renal-cell carcinoma. N. Engl. J. Med. 356, 125-134 (2007).
15 Motzer, R.J. et al. Sunitinib versus interferon alfa in metastatic renal-cell carcinoma. N. Engl. J. Med. 356, 115-124 (2007).
16 Van Cutsem, E. et al. Open-Label Phase III Trial of Panitumumab Plus Best Supportive Care Compared with Best Supportive Care Alone in Patients With Chemotherapy-Refractory Metastatic Colorectal Cancer. J. Clin. Oncol. 25, 1658-1664 (2007).
17 Geyer, C.E. et al. Lapatinib plus capecitabine for HER2-positive advanced breast cancer. N. Engl. J. Med. 355, 2733-2743 (2006).
18 Miller, K.D. et al. Randomized Phase III Trial of Capecitabine Compared with Bevacizumab plus Capecitabine in Patients with Previously Treated Metastatic Breast Cancer. J. Clin. Oncol. 23, 792-799 (2005).
19 Thomas, E.S. et al. Ixabepilone Plus Capecitabine for Metastatic Breast Cancer Progressing after Anthracycline and Taxane Treatment. J. Clin. Oncol. 25, 5210-5217 (2007).
20 U.S. Food and Drug Administration. FDA Briefing Document: Oncology Drug Advisory Committee Meeting BLA STN125085/91.018 Avastin® (bevacizumab). (2007).
21 Genentech, Oncology Drugs Advisory Committee Meeting: 5 December 2007.
23 U.S. Food and Drug Administration, 2007.
24 Genentech, Oncology Drugs Advisory Committee Meeting: 5 December 2007.
25 Personal communication. Suman Bhattacharya, PhD, Bio-oncology, Genentech.
27 Personal communication, Suman Bhattacharya.
28 Personal communication, Suman Bhattacharya.
29 Personal communication, Suman Bhattacharya.