
The cost of health care in the United States has grown from eight percent of GDP in 1975 to 16 percent, and the Congressional Budget Office projects that health care spending will continue to grow faster than the economy as a whole. [1]
Physicians and public health officials are unsure whether more expensive health services are actually more effective, either at all or by enough to justify the added cost. Clinicians and patients are interested in comparative effectiveness research (CER) because it can show the relative benefits and risks of the real-world treatment decisions made every day. The driver of the current political interest in CER is more straightforward: the hope that more effective care will somehow ‘bend the curve’ of rising health care costs.
The American Recovery and Reinvestment Act of 2009 (ARRA) appropriated $1.1 billion for ‘comparative effectiveness’ research. As the spotlight on this type of research grows, it is increasingly apparent that comparative effectiveness (CE) is an evolving concept. ‘Effectiveness’ refers to how well a treatment performs in real-world practice. ‘Efficacy’, by contrast, is concerned primarily with whether a treatment works under ideal conditions. For this reason, real-world research is central to comparative effectiveness.
Real-world, head-to-head effectiveness trials are probably the gold standard in this new paradigm. Even so, it is simply not feasible to answer the myriad questions that clinicians and patients face each day with large, expensive trials. As a result, observational comparative effectiveness research (OCER) is increasingly seen as a way to fill the gaps that randomized clinical trials leave in clinical and policy decision-making. So-called ‘quasi-experimental’ designs are also of interest. One example is the cluster randomized study, in which randomization occurs at the level of the practice, institution or region. This avoids disturbing real-world activity at the level of the patient-clinician interaction.
Observational and quasi-experimental designs are important for filling these evidence gaps for several reasons. First, it is well established that data from randomized clinical trials do not always reflect real-world practice and outcomes. Second is the issue of generalizability. In making coverage decisions, payers such as Medicare focus on the treatment benefit within the populations they insure. Yet patients in randomized clinical trials are typically younger, healthier and less racially diverse than those in the real world. Obtaining complementary data on these other populations is critical to knowing whether randomized trial results apply to them.
Third is the issue of applicability. Randomizing patients to one treatment or another removes the physician’s preferences and behavior from the study. While this simplifies analyses, it removes the most important factor in how medical care is actually delivered. The subtleties of treatment choice, and of which patients receive which treatments, are intertwined with the real-world outcomes of medical products and services.
Fourth, there are far more questions than large effectiveness trials can answer, so we must rely on other methods and sources, such as observational studies. Recently, an Institute of Medicine (IOM) roundtable on evidence-based medicine quoted an Agency for Healthcare Research and Quality (AHRQ) statement that “comparative effectiveness research typically will focus on realistic decisions confronting patients and their clinicians in actual practice… because of this focus on effectiveness as opposed to efficacy, these investigations will likely rely on both prospective trials and observational data to determine relative value in real-world settings.” [2]
Resistance
So, given this need, why is there so much resistance to observational comparative effectiveness research, or confusion about how to use it? Essentially, designing and interpreting observational studies for CE research is challenging because of bias and confounding. Confounding occurs when a factor, often unmeasured or undetected, is related to both the treatment and the outcome. As a result, part of the observed difference in outcomes may be wrongly attributed to the treatment rather than to the confounding factor. But study designs and analytic methods have improved substantially in their ability to control for confounding. Well-conducted observational studies can now produce real-world data that are generalizable to broad patient populations.
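To make the problem concrete, below is a minimal Python sketch using entirely hypothetical counts, not data from any study. Here, disease severity is a measured confounder: sicker patients are both more likely to receive the treatment and more likely to have a bad outcome. In this made-up cohort, the treatment lowers risk by 5 percentage points within every severity stratum. The crude comparison nonetheless makes it look harmful. Standardizing the stratum-specific risk differences to the whole cohort removes that bias. The `Stratum` class and the function names are illustrative only. Note that stratification, like propensity scoring and other adjustment methods, can correct only for confounders that were actually measured.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum:
    """Counts for one level of a measured confounder (hypothetical data)."""
    name: str
    treated_n: int
    treated_events: int
    untreated_n: int
    untreated_events: int

    @property
    def size(self) -> int:
        return self.treated_n + self.untreated_n

    def risk_difference(self) -> float:
        # Risk in treated minus risk in untreated, within this stratum only
        return self.treated_events / self.treated_n - self.untreated_events / self.untreated_n


def crude_risk_difference(strata: list[Stratum]) -> float:
    """Naive comparison that pools all patients and ignores the confounder."""
    t_n = sum(s.treated_n for s in strata)
    t_e = sum(s.treated_events for s in strata)
    u_n = sum(s.untreated_n for s in strata)
    u_e = sum(s.untreated_events for s in strata)
    return t_e / t_n - u_e / u_n


def standardized_risk_difference(strata: list[Stratum]) -> float:
    """Stratum-specific differences weighted by each stratum's share of the cohort."""
    total = sum(s.size for s in strata)
    return sum(s.risk_difference() * s.size / total for s in strata)


# Hypothetical cohort: severe patients are mostly treated and have higher baseline
# risk, so severity is associated with both treatment and outcome (a confounder).
cohort = [
    Stratum("mild", treated_n=200, treated_events=10, untreated_n=800, untreated_events=80),
    Stratum("severe", treated_n=800, treated_events=360, untreated_n=200, untreated_events=100),
]

print(f"crude RD:        {crude_risk_difference(cohort):+.2f}")
print(f"standardized RD: {standardized_risk_difference(cohort):+.2f}")
```

Running it prints a crude risk difference of +0.19, which suggests harm, and a standardized risk difference of -0.05, which matches the benefit seen within each stratum.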
Simply put, the methodological issues in discerning high quality research for observational comparative effectiveness raise different challenges from those in evaluating randomized trials. Helping decision-makers, not only researchers, to understand what constitutes high quality OCER is critical to making such research useful and accepted. Across stakeholders, we see a very high comfort level with randomized trials and far less appreciation of observational data, even as an adjunct in some situations. Regulatory authorities have traditionally not accepted observational studies for significant label changes. Payers do not widely use observational data in formulary decisions, and clinical guidelines relegate observational research to hardwired lower tiers of evidence.
Some stakeholders are starting to recognize its value. As certain groups become better informed about the merits and limitations of observational comparative effectiveness research, it is being used in these same settings. For example, observational comparative effectiveness data are clearly used in guideline development, particularly when the treatment effect is large. Warfarin in patients with mechanical heart valves is one such example. On the regulatory side, the FDA decision to extend the indication for intraocular lenses to older patients came directly from registry data from the American Academy of Ophthalmology.
To summarize, the primary problems we see in using OCER data come from a perception of great heterogeneity in the quality of these studies and an inability to distinguish studies at greater or lesser risk of bias. Furthermore, most of the current generation of decision-makers were trained on the traditional evidence hierarchies. Those hierarchies relegate observational data to a lower tier of evidence, and that has been slow to change.
What does this mean for the future of OCER? Will its value continue to be questioned, dismissed by stakeholders or displaced in evidence hierarchies? Perceptions do seem to be changing, though formal guidance lags behind. For example, the United Kingdom’s National Institute for Health and Clinical Excellence (NICE) published its Guidelines Manual in 2007. Its guidance on reviewing and grading evidence still sets the starting position of observational studies at Level 2, consistent with the traditional evidence hierarchies of the mid-1990s.
Yet Sir Michael Rawlins, who has chaired NICE since its founding, has recently denounced the traditional evidence hierarchy, which suggests that change is ahead at NICE. Ultimately, the value of observational CE research will be measured by its impact on treatment, coverage, payment or other policy decisions. That impact will be the product of the information that CE research studies generate and the perceived value of that information.
Although there are many open questions, from a decision-maker’s perspective there are two key questions that remain opaque to those who are not expert in the field. First, which methodologies result in good OCER? Second, how can or should OCER that is considered ‘good’ be used? In other words, how valuable is that investment, from a value-of-information perspective, relative to other choices?
Quality
A quick review of existing and emerging guidelines puts this in perspective. The GRADE Working Group first published its approach in 2004 as an effort to promote more consistent and transparent grading of evidence and recommendations. Compared with traditional hierarchies, GRADE focuses far more on what constitutes a well-designed study, the strength of the findings, and whether the evidence is fit for purpose, regardless of whether the study is randomized or observational.
What a decision-maker needs to look for in evaluating a single observational comparative effectiveness study remains less clear. What is needed is guidance on which methodologies are appropriate for specific types of questions. Decision-makers also need guidance on how to recognize good practices and valid results in studies that have applied those methodologies.
The GRACE initiative (www.graceprinciples.org) is a relatively new effort aimed at closing some of the gaps in the use of observational CE research. The initiative is developing good practice principles for the design, conduct, analysis and reporting of OCER. Its primary goal is to support the use of OCER by decision-makers, including payers, clinicians and patients.
For example, the first GRACE principle, “Identify evidence gaps and the potential value of an observational study,” asks whether the research question can be answered through observational research. Within this principle, an evidence hierarchy is defined. It enables decision-makers to identify the situations in which observational research can provide the strongest evidence. It also identifies other situations in which observational research can contribute useful information if paired with the right analytic tools. [3]
The GRACE Principles provide insights on how to design OCER studies to meet the needs of stakeholders and how to evaluate findings from a decision-maker’s perspective. They are not, however, a methodology guide. They do not examine in detail, recommend, or discuss the limitations of the various methodologies for analyzing observational data (e.g., when to use propensity scoring and what its limitations are). Instead, GRACE notes that many techniques are available and refers the reader to other sources. Other groups are now filling the gaps by identifying and assessing the most appropriate methodologies and techniques for the specific types of questions facing decision-makers. Through these efforts, a new evidence hierarchy is slowly being woven, one that will have far greater applicability to the range of studies used in comparative effectiveness research.
As the ARRA legislation highlights, comparative effectiveness research has been growing in importance for several years. It is seen as a potential means of ‘bending’ the curve of rising health care costs while maintaining or improving the quality of care delivered. Whether that is a realistic goal remains to be seen. Nevertheless, comparative effectiveness research will become increasingly pervasive as funding grows dramatically. And since observational study methods are key to understanding the real-world outcomes of treatments, the role of observational comparative effectiveness research will also grow significantly.
With an avalanche of data coming, it will be more important than ever for decision-makers to discern good research. This will be particularly true for observational comparative effectiveness research, where the methodologies are more complicated and the risk of failing to account for confounding and bias is greater. It will become increasingly critical for consensus initiatives to identify good practices, and for decision-makers to be trained to apply those practices in evaluating and using real-world data.
Richard Gliklich, MD, is President of Outcome, a provider of patient registries, studies and technologies for evaluating real-world outcomes. Gliklich focuses on clinical research on the effectiveness, safety and quality of care. He was Principal Investigator and Senior Editor of the AHRQ handbook Registries for Evaluating Patient Outcomes: A User’s Guide, and also developed the American Heart Association’s Get With The Guidelines registries.
Christina DeFilippo Mack is Research Coordinator in the Scientific Affairs group for Outcome, a provider of patient registries, studies, and technologies for evaluating real-world outcomes. In her current role, Mack performs research for the FDA and AHRQ. She also manages program design, specification development, site enrollment and ongoing operations for large-scale, industry-sponsored patient registries.
References
[1] Congressional Budget Office. The Long-Term Outlook for Health Care Spending. December 2007. http://www.cbo.gov/ftpdocs/87xx/doc8758/MainText.3.1.shtml#1068746.
[2] Clancy, CM. Making Comparative Effectiveness More than a Dream. 21 February 2008. http://www.instituteofmedicine.org/Object.File/Master/51/635/Clancy.pdf.
[3] Dreyer, NA on behalf of the GRACE Initiative. Do We Need Good Practice Principles for Observational Comparative Effectiveness Research? August 2008. www.graceprinciples.org.
