David Kessler, who was FDA commissioner under both presidents George H. W. Bush and Bill Clinton, confidently stated, “When the industry sells a drug, the drug works, and it does what it says on the label. Take that away and we go back to snake oil.”
Not quite. Does helping 10 percent of patients show that “the drug works”?
This is from David R. Henderson and Charles L. Hooper, “Standing Between You and COVID-19 Relief: The FDA,” Goodman Institute, Brief Analysis No. 137, April 20, 2020.
Another excerpt:
Merck’s Keytruda has recently become one of the hottest drugs on the market. It received favorable press coverage when Jimmy Carter reported that he was cancer free after therapy with Keytruda. But Carter was lucky. In one clinical trial, Keytruda destroyed or reduced the tumors in only 34 percent of patients. Keytruda—which was the fourth biggest-selling drug globally in 2018 and brought in worldwide revenues of $11.1 billion last year—is far from a sure thing.
Read the whole thing.
READER COMMENTS
Dylan
May 6 2020 at 6:33am
David,
Thanks for sharing this piece, which has the length to explore some of these topics in greater depth than I’ve seen from the shorter WSJ columns and the like. Unfortunately, I still don’t get much of a sense of how you think efficacy will be shown in your ideal world. From the section on off-label use, a lay reader might get the idea that doctors prescribing something off-label are the ones who show that it works, as opposed to providing some anecdotal evidence that is then followed up on by a clinical trial to see if it holds up. Unfortunately, giving a person a drug and then having their disease get better is very little evidence that the drug worked, even for that patient. You don’t know if the patient would have recovered on their own, or if something else in their environment contributed; all you have is one factor for one patient, which doesn’t tell you much of anything at all.
The case with hydroxychloroquine is a timely example. The initial reports out of France were highly promising but anecdotal, and people who knew a little bit about drug discovery urged caution until we had a properly controlled trial to compare against. Sure enough, the trials that have come out since then have mostly shown no benefit, or in some cases a signal of harm. For an acquaintance of mine who died of COVID-19, this treatment may have been a contributing factor (he died of heart failure the day after being released from the hospital).
I’d also like to point out that the section on Intermune and off-label promotion is a bit misleading. There is more detail here. The short version, though, is that the sub-group analysis was not specified before the trial began, and when you go looking for statistically significant results after doing the trial, you’re bound to find something that looks highly significant, even though it is just by chance. Any economist who has spent time looking at econometric models will be well accustomed to the practice. The CEO wrote the PR, and the headline was “InterMune Announces Phase III Data Demonstrating Survival Benefit of Actimmune in IPF” with the subheading “Reduces Mortality by 70% in Patients with Mild to Moderate Disease.” There was no mention in the press release that this was a post-hoc analysis, not a secondary endpoint. Plenty of biotechs do almost the same thing after failing a clinical trial without getting into any kind of trouble; I’ve worked with at least one where we advised them on just this approach. The difference is that they clearly state in the press release that it is a post-hoc subgroup analysis. It also would have been helpful to mention that Intermune did conduct a second Phase III trial in mild to moderate patients and found no benefit (the trend was actually negative, as more patients died with the treatment than without). I’d like to know whether you think the company would have run the second trial if the drug had already been on the market and they could have just promoted it off-label using the sub-group analysis.
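To make the post-hoc subgroup problem concrete, here is a toy simulation with invented numbers (no relation to the actual Intermune data): the drug does nothing at all, yet scanning twenty arbitrary subgroups after the trial routinely turns up at least one that looks “significant.”

```python
import numpy as np

rng = np.random.default_rng(0)

def two_prop_z(p1, n1, p2, n2):
    """Two-proportion z statistic with a pooled variance estimate."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = np.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (p1 - p2) / se

def null_trial_finds_subgroup(n=400, n_subgroups=20, z_crit=1.96):
    """Null trial: the drug does nothing, but we scan many post-hoc subgroups."""
    arm = rng.integers(0, 2, n)                    # 0 = control, 1 = treatment
    outcome = rng.random(n) < 0.30                 # 30% recover regardless of arm
    subgroups = rng.integers(0, 2, (n_subgroups, n))  # arbitrary binary covariates
    for g in subgroups:
        t, c = (g == 1) & (arm == 1), (g == 1) & (arm == 0)
        if t.sum() < 20 or c.sum() < 20:
            continue
        z = two_prop_z(outcome[t].mean(), t.sum(), outcome[c].mean(), c.sum())
        if abs(z) > z_crit:
            return True                            # a "significant" subgroup by chance
    return False

hits = sum(null_trial_finds_subgroup() for _ in range(200))
print(f"{hits / 200:.0%} of null trials yield a 'significant' subgroup")
```

With twenty uncorrected looks at the data, far more than 5% of purely null trials produce a headline-ready subgroup, which is exactly why pre-specification matters.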
Mark Z
May 6 2020 at 6:49am
This is a really good point. Heterogeneity of response to a drug is a big issue with clinical trials, and it is of course not a speculative concern: it’s very common for a drug to work very well, but only for a subset of patients. The prevailing model for drug approval seems to be: develop a new drug and see if it beats the (often one-size-fits-all) standard of care. If so, it gets approved; otherwise, it’s probably back to square one. If a drug only outperforms the standard of care 20% of the time, though, it has little chance of getting through the process, and clinical trials often have such small sample sizes that one wouldn’t be able to identify whether the drug works for a subset with any degree of confidence. In general, many more samples are needed than clinical trials usually have to confidently assess whether a drug works. Biology isn’t like physics or even chemistry: the data is so much noisier, with so many more unknown confounding variables, that standard power analysis for estimating necessary sample size will usually underestimate how many samples you need. But there’s also a strong desire, from ethical considerations, to minimize the number of people (or mice) you expose ‘gratuitously’ to an unproven drug, which also puts downward pressure on sample sizes.
I’ll end my rant; the bottom line though is that to prevent drugs that work better than the standard for only a subset of patients from slipping through the cracks, we need a much bigger N, which likely means letting it be marketed to customers, or finding a way to recruit several times as many people on average for clinical trials.
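A rough sketch of the small-N problem described above, with made-up numbers: a drug that strongly helps only 20% of patients (lifting their recovery rate from 30% to 70%) produces a small average effect, which a 50-patient-per-arm trial usually misses but a 1000-per-arm trial usually catches.

```python
import numpy as np

rng = np.random.default_rng(1)

def trial_detects(n_per_arm, responder_frac=0.2, base=0.30, boost=0.40):
    """One simulated trial. The drug helps only a random 20% 'responder' subset.
    Returns True if the arm-level difference clears a z > 1.96 threshold."""
    control = rng.random(n_per_arm) < base
    responder = rng.random(n_per_arm) < responder_frac
    p_treat = np.where(responder, base + boost, base)
    treat = rng.random(n_per_arm) < p_treat
    p1, p2 = treat.mean(), control.mean()
    p = (p1 + p2) / 2
    se = np.sqrt(p * (1 - p) * 2 / n_per_arm)
    return (p1 - p2) / se > 1.96

def power(n_per_arm, sims=500):
    """Fraction of simulated trials that detect the (diluted) effect."""
    return sum(trial_detects(n_per_arm) for _ in range(sims)) / sims

small, large = power(50), power(1000)
print(f"power with 50/arm: {small:.0%}, with 1000/arm: {large:.0%}")
```

The responder subset dilutes a large individual effect into a roughly 8-percentage-point average effect, which is exactly the kind of signal a typically sized trial is underpowered to see.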
Dylan
May 6 2020 at 8:23am
I agree that heterogeneity of response is a big issue, but I don’t know that a larger N really gets you very far. Imagine a drug that completely cures 1% of the patient population and kills 2%, and doesn’t do much of anything for the rest. If we’ve got no idea which patients will respond which way before dosing them, then giving it to everyone seems like a bad idea. If we’ve got an idea that people with X genetic marker are more likely to be helped, then we’ve got a good basis for doing a clinical trial, and even with a smallish N, if the population is well defined we can get useful information.
David Henderson
May 6 2020 at 8:33am
Dylan,
That’s a separate question. Our study is about getting rid of the efficacy requirement, not the safety requirement.
Dylan
May 6 2020 at 9:12am
I understand that, but the two are very much linked, since safety can only really be assessed in relation to efficacy. That’s different from a Ph I “safety” trial, which only shows that the drug doesn’t have immediate and show-stopping adverse effects.
Nevertheless, the point I was making to Mark doesn’t really rely on the safety portion to be relevant. It’s about how do we handle a drug that may help a small fraction of patients, but we don’t have a way to tell who those patients are beforehand? If you give a drug to 1000 patients, and 10 of them recover, and the rest don’t, you still have no idea if it was the drug that helped them, or if those 10 people would have just recovered on their own. That’s not an issue that can be solved with a larger N by itself, you need some kind of control to see what would have happened to similar patients that didn’t get the treatment.
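A minimal illustration of this point, assuming a hypothetical 1% spontaneous recovery rate: an inert drug given to 1,000 patients “cures” about 10 of them, and only a comparison against an untreated arm reveals that the drug contributed nothing.

```python
import numpy as np

rng = np.random.default_rng(2)

spontaneous = 0.01  # assumed rate at which patients recover on their own

# Uncontrolled observation: give an inert drug to 1000 patients.
# Around 10 recover, which looks like "the drug helped them."
treated = rng.random(1000) < spontaneous
print(f"{treated.sum()} of 1000 treated patients recovered")

# Controlled comparison: an untreated arm of the same size recovers
# at essentially the same rate, exposing the drug as inert.
control = rng.random(1000) < spontaneous
print(f"treated: {treated.mean():.1%} recovered, control: {control.mean():.1%} recovered")
```

No increase in N fixes the uncontrolled version; it just makes the count of spontaneous recoveries bigger.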
P.S. I have written a longer direct response to your post that includes further background on Intermune, since I think the information as presented in your article is somewhat misleading. Unfortunately that reply, and an edited version of it, have both ended up in moderation.
Mark Z
May 6 2020 at 6:41pm
But if it kills 2% of the people who get it, it’s still likely to get past current small-sample-size clinical trials, for much the same reason a drug would likely fail if it only cures a small fraction. But I guess if your prior is that a given drug being tested is more likely to severely harm or kill a given patient than cure them, then fewer drugs being approved in general (even if officially for efficacy reasons rather than safety) is a good thing. Even then, though, I think the benefits of a surprise cure endure longer than the costs of a surprise poison. If we test a drug on 10x as many people, we kill 10x as many people during the trial as we otherwise would have, and the drug is not approved. But if a drug turns out to be a cure and gets approved, we don’t merely cure 10x as many people in the trial; we cure far more than 10x as many, since the drug will continue to cure people from then on (or until a new, better drug comes along). So the kill/cure ratio would have to be significantly higher than 1 (assuming killing and curing are equivalent in magnitude) for it not to be worth it.
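The asymmetry argument can be put into rough numbers (all hypothetical): harm from a bad drug is bounded by the trial that exposes it, while benefit from a good drug accrues for years after approval.

```python
# Illustrative arithmetic for the cure/kill asymmetry. Every number here is
# an invented assumption, not data from any actual trial.
trial_n = 1_000           # patients in an enlarged trial
kill_rate = 0.02          # 2% of trial patients harmed by a bad drug
cure_rate = 0.20          # 20% of trial patients cured by a good drug
p_good = 0.10             # assumed prior that a candidate drug works
years_on_market = 10      # a good drug keeps curing after approval
patients_per_year = 10_000

# Expected harm: bad drugs hurt only trial participants, then get dropped.
expected_harm = (1 - p_good) * trial_n * kill_rate

# Expected benefit: good drugs cure in the trial AND for years afterward.
expected_benefit = p_good * (trial_n + years_on_market * patients_per_year) * cure_rate

print(f"expected harmed: {expected_harm:.0f}, expected cured: {expected_benefit:.0f}")
# → expected harmed: 18, expected cured: 2020
```

Under these assumptions the post-approval years dominate, which is why the break-even kill/cure ratio lands well above 1; with different priors or a shorter market life, the conclusion can flip.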
And I’m assuming bringing drugs to market would work similarly: once it’s discovered that a drug is likely to kill those who take it, people will generally stop taking it, approved or not. I think the asymmetry between positive and negative discoveries is important, though: we benefit from a positive discovery from then on, but we stop taking a newly discovered poison once it’s discovered.
Dylan
May 7 2020 at 6:58am
I do believe that most investigational drugs would do more harm than good, which is why we have a 90% failure rate in the clinic. Most drugs that we look at don’t seem to work, but they do seem to cause some adverse events in some patients.
But, really, the harm is just peripheral to the point I was trying to make. Let me try with a real-world example from a biotech that I’m doing some consulting for. It has a preclinical drug in development that, with funding, will enter Ph I safety trials at the end of the year. There is preliminary data and theoretical backing suggesting the drug can be used across a wide range of conditions, everything from Parkinson’s to cancer to sickle cell disease. The reality is that it is unlikely to make it through the process and to the market for any of these indications; the chances of success are just too low.
But, let’s say you’re right, and that the reason this drug won’t make it through clinical trials is that they are underpowered for the effect size, and that there is a small (but unknown) subset of patients this helps. What’s a doctor supposed to do with some preclinical animal data and a Ph I safety trial? Should the doctor start giving this drug to every patient that has any of the diseases that this is theoretically good for? What about the hundreds of other investigational drugs that will be on the market for the same indications?
And, assuming that small subset of patients it helps is real, how will we ever figure that out?
In your first post you say:
Which is something I absolutely agree with, but to me, this seems like the best argument possible for controlled clinical trials. If data is this noisy in the clinic, what is it like in the real world with no control arm? How do you distinguish between the patient who got better because of the drug and the one who got better on his own? How do you take the variation in how every doctor will treat a patient and standardize it so that we can get meaningful information out of that large N?
Take hydroxychloroquine. How many patients have been given this drug for COVID-19 now? How many were dosed in combination with azithromycin? How many with zinc? How many in some other combination? Or with different dosing? I’m not against that kind of trial and error approach, but at best, that gives us an idea of something that we can test in a controlled trial. The anecdotal evidence on hydroxychloroquine looked good, sometimes very good. But the controlled clinical trials that followed have mostly shown no benefit, or a slight tilt towards worse outcomes with the drug. It’s true that these have all been small trials, and larger trials might find something different. But the anecdotal evidence is even less robust.
Mark Z
May 7 2020 at 2:43pm
Well, for diseases for which no effective treatment exists, giving the drug out liberally, saying ‘it probably won’t work, but it might,’ and recording the results may not be a bad idea. For cases where there’s already a working drug, but the proposed one might be better for some people, if the condition isn’t too severe in the short run and the effects of the drug are observable fairly quickly, then one could offer it as a cheaper (or even free) alternative to the standard drug temporarily. It’s also not unheard of for drugs to have an additive effect, so one could give patients the standard drug plus the proposed drug and see if any do better than with just the standard. It may then be sufficient to try just the proposed drug for some patients, or, more cautiously, to gradually reduce the dosage of the standard drug while increasing the dosage of the proposed one.
I guess a key disagreement I have here is that I don’t think clinical trials have much of an advantage in randomization. Suppose doctors reported treatment results to a database, and a drug company randomly selected doctors to invite to offer a proposed drug to patients in a manner like I suggested above, with some patients accepting and some declining. There are of course likely selection biases in which doctors choose to participate and which patients accept the offer, but there are also biases in which patients are eligible for a clinical trial, and in which of those choose to participate. It doesn’t seem clear to me that trials are going to be better at random sampling than my hypothetical alternative of ‘market experimentation.’ The stringency of clinical trials tends to force them to cast a much smaller net, which I think exacerbates biases, given the extent of nonrandom heterogeneity that pervades human populations.
An example that might illustrate my statistical concern: if I were to take the ethnic composition of a bunch of neighborhoods of a few hundred people, the composition of each neighborhood would tend to deviate dramatically from the US population in general, far more so than we would expect from randomly sampling a few hundred people from a ‘homogenized’ mixed population. If we were to instead look at the composition of cities, we’d see markedly less deviation, because there’s less structural between-city variation than between-neighborhood variation, and even less if we look at state-level composition. The bottom line: casting a ‘broader net’ doesn’t just reduce noisiness; it also reduces susceptibility to such nonrandom variation. Geography may not be the most important issue, but the fact that, to my knowledge (I googled it and couldn’t find a percentage), most clinical trials are still conducted at a single medical center renders them very vulnerable to geography-related biases. Renowned medical centers that draw medical tourists from around the country may escape this to some extent, but then patients able and willing to travel far to a renowned medical center may not be a much better random sample. So, yeah, it’s not just a ‘small N’ problem in the traditional sense of having few samples from a distribution; the constraints on sample sizes in clinical trials also render them less likely to be approximately random samples at all.
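The neighborhood/city intuition can be checked with a toy simulation (the between-unit variation figures are invented): small units with strong local structure deviate from the national composition far more than large units, and far more than a genuinely random sample of the same size would.

```python
import numpy as np

rng = np.random.default_rng(3)

national = 0.5   # hypothetical national share of some group
n_units = 200    # number of neighborhoods/cities sampled

def mean_abs_deviation(unit_size, between_sd):
    """Each unit's *local* group share drifts from the national one
    (between-unit structure); we then measure each unit's composition
    and report its average absolute deviation from the national share."""
    local_share = np.clip(rng.normal(national, between_sd, n_units), 0, 1)
    counts = rng.binomial(unit_size, local_share)
    return np.abs(counts / unit_size - national).mean()

# Neighborhoods: small, with lots of between-unit structure.
nbhd = mean_abs_deviation(unit_size=300, between_sd=0.25)
# Cities: much larger, with less between-unit structure.
city = mean_abs_deviation(unit_size=100_000, between_sd=0.05)
# Truly random sample of 300 from a well-mixed population.
iid = mean_abs_deviation(unit_size=300, between_sd=0.0)

print(f"neighborhoods: {nbhd:.3f}, cities: {city:.3f}, iid sample of 300: {iid:.3f}")
```

The neighborhood-sized units miss the national share by far more than binomial sampling noise alone would predict, which is the sense in which a single-center trial is worse than a small random sample.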
And I may not be up on the latest news, but from my understanding, hydroxychloroquine trials have all (correct me if I’m wrong about this) been with patients with severe illness, whereas informed proponents of HCQ have been arguing that it would be useful if taken early, to prevent the onset of severe disease, but that once one is already severely ill, it’s too late. I don’t know if this explains the discrepancy between the trials and the observational studies; maybe it doesn’t work in any case. But I suspect there’s a reluctance to run trials on people who are mildly symptomatic or asymptomatic because of ‘ethical considerations’ about giving the drug to anyone who doesn’t already have nothing to lose. I think that’s a limitation of clinical trials, under current norms at least, in its own right. It’s probably apparent by now that I think people should be more free to voluntarily let themselves be ‘guinea pigs’ for medical science, particularly when the risk is modest.
Dylan
May 7 2020 at 4:48pm
Thanks for the detailed and thoughtful response, Mark. I should state upfront that I’m open to alternatives to the FDA monopoly on efficacy testing. My beef with proposals like this one is the lack of detail on what replaces it, and my feeling that the incentives wouldn’t line up for the market to really test drugs: any testing would be done with an eye towards marketing, as opposed to really understanding whether the drug works.
Your idea of a database that would keep the details of treatments for experimental new drugs is certainly a step in the right direction, although I’ll note that even that isn’t trivial to implement. EMRs across hospitals and medical offices are different, and still mostly incompatible with each other. The information collected on patients will differ, and in general will be nowhere near as complete as what you currently get in a clinical trial. Treatment protocols won’t be anywhere close to randomly distributed (think of one doctor at a hospital reporting good results with one combo, so everyone else in the hospital picks up the same treatment, while the poorer hospital the next town over adopts a different combo or dosing regimen or what have you). Honestly, I’ve no idea if those kinds of difficulties can be overcome with a big-data approach at all, but it seems like for you to have any hope of doing that, you’re going to need lots and lots of patients trying experimental drugs, many more than we have enrolled in clinical trials.
If Drs. Henderson and Hooper wanted to make that kind of argument, I’d be all ears. I’d be all ears to any discussion of how they think the drug testing environment would evolve without the FDA as a gatekeeper. Unfortunately, in the pieces I’ve read at least, they never really dive into this topic. It leaves the impression that they think that we can just have a bunch of people try a drug, and it will be obvious which drugs work and which ones don’t.
I get your point about trial homogeneity; I think that’s an issue in earlier trials. I also don’t have the numbers, but my sense is that the vast majority of Phase III trials are multicenter, and usually, for the big drugs, multi-country. Maybe I’m wrong on that, though; it’s been a number of years since I had to scour clinical trial results to figure out why what the company was reporting as a positive result most likely wasn’t.
Which, by the way, is what informs my priors. I came from the tech world into biotech about 15 years ago. In tech, if you could reasonably describe how you were going to build something, chances are it would just work. There might be details that were wrong, or you couldn’t make it into a viable business, or something…but chances were it was technically possible. In biotech, I started meeting with company executives who had all this data about why their drug would work: theoretical data about how binding to this target would control downstream regulation, animal models showing that it worked, observational data that when a Stage IV pancreatic cancer patient was given the drug, their tumors completely disappeared, stem cell treatments that showed incredibly promising results in small trials across a whole host of diseases. None of this stood up to scrutiny. I wasn’t an analyst long, but of all the companies that I looked closely at, including those that had already had one successful Ph III trial, I think maybe one of them eventually developed a drug that was effective. The rest failed badly. The CEO of one of the stem cell companies ended up taking his “therapy” to an unregulated market, and now sells treatments for tens of thousands of dollars to desperate patients that are about as effective as injecting them with sugar water. I’m pretty sure he’s not carefully collecting data to see which patients are getting the most benefit.
The background may have made me overly cynical about the industry, but that’s the kind of outcome I’d like to avoid.
Mark S Barbieri
May 6 2020 at 2:56pm
I don’t understand why we can’t have a compromise approach. Let the FDA still require that products be tested for efficacy, but let companies sell products that have not finished testing with a warning label saying “Shown to be safe, but not yet shown to be effective by the FDA”. And let them sell products that have failed their efficacy tests with a different warning “Shown to be safe, but shown to have been effective in fewer than x% of patients tested.” Make the first warning yellow and the second one red. Allow other drugs to have a green label. With that system, the people that only want “safe and effective” drugs can still do that. Everyone else can make their own decisions.
It would be nice to allow companies to skip the whole process with a different warning, but I’m willing to take baby steps. I’d really like it if more of our “protections” were voluntary rather than mandatory.
robc
May 6 2020 at 10:01pm
That is basically the Underwriters Laboratories model. Get rid of the FDA; any drug can be released, but insurance companies won’t pay unless it meets a certain standard, doctors probably won’t prescribe it, and pharmacies won’t carry it.
Dylan
May 7 2020 at 10:11am
My previous replies showed up for a bit this morning, and then disappeared again. I think I’ve covered most of the rest in my replies to Mark and David, but wanted to at least put in the bit on Intermune, since I think what is in the article is accurate, but misleading by leaving out some of the context.
The short version is that the sub-group analysis was not specified before the trial began, and when you go looking for statistically significant results after doing the trial, you’re bound to find something that looks highly significant, even though it is just by chance. Any economist who has spent time looking at econometric models will be well accustomed to the practice. The CEO wrote the PR, and the headline was “InterMune Announces Phase III Data Demonstrating Survival Benefit of Actimmune in IPF” with the subheading “Reduces Mortality by 70% in Patients with Mild to Moderate Disease.” There was no mention in the press release that this was a post-hoc analysis, not a secondary endpoint. Plenty of biotechs do almost the same thing after failing a clinical trial without getting into any kind of trouble; I’ve worked with at least one where we advised them on just this approach. The difference is that they clearly state in the press release that it is a post-hoc subgroup analysis. It also would have been helpful to mention that Intermune did conduct a second Phase III trial in mild to moderate patients and found no benefit (the trend was actually negative, as more patients died with the treatment than without). I’d like to know whether you think the company would have run the second trial if the drug had already been on the market and they could have just promoted it off-label using the sub-group analysis.
Mark Z
May 7 2020 at 3:08pm
I think there’s definitely a ‘garden of the forking paths’ problem here, and that’s why the process should go like this: step 1) give the drug to lots of people and notice, say, that it only seems to help vegetarians; step 2) try just giving the drug to vegetarians; step 3) step 2 determines whether we keep giving the drug to vegetarians. If a few months down the road even vegetarians aren’t responding, it can be discontinued for them as well. One should of course specify the number of subgroup hypotheses one is testing ahead of time, though at clinical trial sample sizes that likely means only testing for effects of sex and maybe a few common biomarkers. An average-sized trial will often preclude the possibility of statistically meaningful subgroup analysis for most variables.