So last week a pretty interesting looking study appeared in the BMJ. With a title as ‘Comparison of treatment effect sizes associated with surrogate and final patient relevant outcomes in randomised controlled trials: meta-epidemiological study‘ (and breathe…) I wouldn’t be too surprised if many people just skipped over it. Nevertheless, it has some pretty interesting results.
But first we’ll go on a journey back in time to 1997. That year, the BMJ dedicated an entire issue to the topic of meta-epidemiology. Specifically, it looked at meta-analyses, the branch of epidemiology that combines the results from all relevant studies to try to come to some form of agreement on a particular question. Meta-analyses are regarded as the highest form of evidence, being able to pool all available evidence into a final answer.
However, it turned out that this form of analysis wasn’t as infallible as some liked to believe. There was a problem we had been trying to ignore: publication bias. Studies with interesting results and large effect sizes were more likely to be published than studies that didn’t find anything. While these ‘negative trials’ gathered dust in researchers’ drawers, the people meta-analysing studies were lulled into thinking that the treatments they were evaluating were more effective than they actually were.
These results had a big impact on the way meta-analyses were viewed and performed, bringing publication bias and the importance of unpublished studies to the fore. This new study tries to shine a similar light on how we try to assess whether a new treatment works.
As the title of the study suggest, it’s looking at the difference between surrogate and final patient relevant outcomes. While patient relevant outcomes (such as, does this pill I’m taking for heart disease actually make me live longer, or does it lower my chance of a heart attack?) are what we’re really interested in, often trials will look at surrogate outcomes. For instance, while statins are prescribed to lower the chance of heart disease (which could require years of following very large groups of patients), trials often measure whether they lower cholesterol (which requires a couple of months) as we know this is related to future heart disease.
Looking at surrogate or intermediate outcomes makes trials shorter, smaller, and importantly, a lot cheaper. Instead of having to wait ten years to find out whether a drug has an effect, we can find out in a year. With the budget for health research getting ever smaller, it would be great if we could exchange patient relevant outcomes for equally valid surrogate outcomes. Whether that is possible is exactly what this new study is researching.
The researchers compared 84 trials using surrogate outcomes with 101 patient relevant trials published in six of the highest rates medical journals in 2005 and 2006. They found that trials using surrogate outcomes tend to find larger treatment effects: the drugs tested in these trials appeared to be about 47% more effective than trials using patient relevant outcomes. This was true over all the fields of epidemiological research they included, and couldn’t explained by any of the factors explored such as whether the size of the trial or whether it was funded by Big Pharma.
So why does this matter? Although trials using either type of outcome found different effect sizes, they still came to the same overall conclusion: either the drug worked or it didn’t. Other studies have found the same for other drugs that got licensed based on (mainly) data on surrogate outcomes. Unfortunately, the opposite has also happened. A drug for non-small cell lung cancer (a particular type of lung cancer), Gefitinib, was licensed by the FDA based on surrogate outcomes. When the data on patient relevant outcomes became available (whether the drug makes people live longer in this case), it turned out that it didn’t work.
As the paper concludes, policy makers and regulators should be cautious when the only data available on a new drug is on surrogate outcomes, as it could turn out that the drug they’re trying to evaluate is a lot less effective than the research seems to imply. And in rare cases, it might even not work at all.