Episode 47: The Criteria


Can we ever know what causes a chronic disease? In this episode, I’m joined again by Dr. Shoshana Herzig to finish a three-part miniseries on Bradford Hill and Doll’s attempts to prove that smoking caused lung cancer. We’ll talk about the first prospective cohort trial in history, 1960s “Fake News” from tobacco companies, public spats with the most famous statistician of the 20th century, and the development of the Bradford Hill Criteria, a guideline, however imperfect, that gives doctors a blueprint to finally figure out what causes diseases.


  • Crofton J, The MRC randomized trial of streptomycin and its legacy: a view from the clinical front line. J R Soc Med. 2006 Oct; 99(10): 531–534.
  • Daniels M and Bradford Hill A, Chemotherapy of Pulmonary Tuberculosis in Young Adults, Br Med J. 1952 May 31; 1(4769): 1162–1168.
  • Dangers of Cigarette-smoking. Brit Med J 1, 1518 (1957).
  • Doll, R. & Hill, B. A. Lung Cancer and Other Causes of Death in Relation to Smoking. Brit Med J 2, 1071 (1956).
  • Doll, R. & Hill, B. A. Smoking and Carcinoma of the Lung. Brit Med J 2, 739 (1950).
  • Hill, A. The Environment and Disease: Association or Causation? J Roy Soc Med 58, 295–300 (1965).
  • HOFFMAN, F. L. CANCER AND SMOKING HABITS. Ann Surg 93, 50–67 (1931).
  • Keating C, Smoking Kills: The Revolutionary Life of Richard Doll. 2009.
  • Morabia, A. Quality, originality, and significance of the 1939 “Tobacco consumption and lung carcinoma” article by Mueller, including translation of a section of the paper. Prev Med 55, 171–177 (2012).
  • Ochsner, A. & DeBakey, M. Primary pulmonary malignancy: treatment by total pneumonectomy; analysis of 79 collected cases and presentation of 7 personal cases. Ochsner J 1, 109–25 (1999).
  • Ochsner, A. My first recognition of the relationship of smoking and lung cancer. Prev Med 2, 611–614 (1973).
  • Parascandola, M. Two approaches to etiology: the debate over smoking and lung cancer in the 1950s. Endeavour 28, 81–86 (2004).
  • Phillips, C. V. & Goodman, K. J. The missed lessons of Sir Austin Bradford Hill. Epidemiologic Perspectives Innovations 1, 1–5 (2004).
  • Proctor, R. Angel H Roffo: the forgotten father of experimental tobacco carcinogenesis. B World Health Organ 84, 494–495 (2006).
  • Wynder, E. RE: “WHEN GENIUS ERRS: R. A. FISHER AND THE LUNG CANCER CONTROVERSY”. Am J Epidemiol 134, 1467–9 (1991).


This is Adam Rodman, and you’re listening to Bedside Rounds, a monthly podcast on the weird, wonderful, and intensely human stories that have shaped modern medicine, brought to you in partnership with the American College of Physicians. This episode is called The Criteria, the final of a three-parter about the linkage between smoking and lung cancer. In this episode, we’re going deep into the dark heart of causality; along the way, we’re going to discuss how Austin Bradford Hill and Richard Doll literally had to invent a new epidemiological method to study diseases — and fight off attacks from the most famous statistician of the 20th century, all while dealing with 1960s “fake news” from tobacco companies. And at the end of this journey, they produced one of the most important medical documents you’ve probably never heard of — the Bradford Hill Criteria, a guideline, however imperfect, to finally allow physicians and scientists to make a determination of whether something truly causes a disease. 


So first, a brief recap. Last time, I was joined by Dr. Shoshana Herzig, physician-epidemiologist extraordinaire, and literally the smartest person I know, to talk about how Richard Doll and Tony Bradford Hill designed a massive case-control study in the 1950s to study whether or not smoking was associated with lung cancer. And what they found, in lockstep with a similar study from the American Cancer Society in the US, was that there was a clear dose-response effect; as people smoked more cigarettes, their risk for cancer steadily increased. But the wider world was not convinced, and Doll and Hill began to conceive of a new type of study — one that would follow a group of patients through time. But how to start? A randomized controlled trial was certainly out — it was impractical (and arguably unethical) to randomize two groups of youths to smoking and not-smoking and follow them over a lifetime. What then?


Fortunately, Doll had an idea. He was a socialist, and shortly after World War II had ended, he had been part of a group of physicians that had lobbied to form the National Health Service, the NHS. His job had been outreach, and he had circulated surveys to physicians across the country, and he had gotten timely and accurate returns. Physicians, he figured, were relatively homogeneous in Britain in the 50s, were trained in observation and would give accurate information, and had to keep up-to-date registers of their address with the government, which would make follow-up easier. Weren’t they the ideal group to study?


Doll, Hill, and the Medical Research Council decided to give it a go. In 1951, a letter and a short questionnaire asking about smoking habits and cancer diagnoses was sent to all 59,600 doctors in Great Britain. It was almost beautiful in how short and elegant the survey was, something modern survey designers would be wise to emulate. It had only three questions: do you smoke, did you smoke in the past but quit, or have you never smoked? For smokers and former smokers it also asked the age at which smoking was started, the type of smoking, and the total amount smoked. In case anyone was missed, Hill also wrote a letter in the BMJ in November 1951 entitled “Do you smoke?” with a copy of the questionnaire and how to get in contact. I’ve included it in the shownotes; it makes an interesting document to review. The response was overwhelming: 41,024 replies. The post office had to open a sorting office just for the study, and Doll enlisted his seven-year-old daughter to help open the incredible volume of mail and sort the surveys. All of this was done on a budget of 2000 GBP, about 81,000 USD in 2018 currency.


After eliminating low-risk groups (men under 35 and women), Doll and Hill had 24,389 doctors to follow up. It turns out that this was a pretty controversial idea, following patients to see if and when they would die, and it was savaged in the popular and medical press.


Hill would later recall a cocktail party where another doctor approached him and complained, “You’re the chap who wants us to stop smoking.”


Hill replied, “Not at all. I’m interested if you go on smoking to see how you die. I’m interested if you stop because I want to see how you die. So you choose for yourself, stop or go on. It’s a matter of indifference to me. I shall score up your death anyway, and it will be very useful to me.”


In 1952, Hammond and Horn, epidemiologists with the American Cancer Society, launched a much larger study using Hill’s methodology with 190,000 American men. Remember, the American Cancer Society was skeptical of the link being causative, and thought this novel methodology would disprove once and for all that smoking caused cancer.


After three years, Doll and Hill had enough results to start seeing a trend; they were published in 1954 and showed a statistically significant increase in lung cancer deaths in smokers. But that wasn’t all. They also showed an association with coronary thrombosis, a conclusion that no one had expected. But total numbers were small (only 36 lung cancer deaths out of 789 total deaths), and their conclusions were easy to overlook. But with each passing year of the study, the trickle of data became a deluge, and soon it was impossible to ignore. First the Americans published their own preliminary results, which similarly showed increases in lung cancer and coronary thrombosis deaths in heavy smokers. And then in 1956, Doll and Hill published their famous follow-up study in the BMJ. After four years and five months of follow-up, there had now been 1,714 deaths, 88 from lung cancer. The last two years had made the findings even more striking. They could now calculate the death rate from lung cancer in non-smokers as 0.07 per 1000 people, compared to 1.66 per 1000 in the heaviest smoking group, a more than 20-fold increase. Other respiratory diseases, as well as coronary thrombosis, were also significantly higher. And with more data, all-cause mortality was now trending higher in smokers, though the p-value was only 0.06, something we’ll return to later.
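For readers who want to check the arithmetic behind that fold-increase, here is a minimal Python sketch. The two rates are the ones quoted above; the function and variable names are hypothetical, chosen only for illustration, and the calculation is a plain ratio and difference of rates rather than anything taken from Doll and Hill’s own analysis.

```python
# Lung cancer death rates per 1000, as quoted in the episode from the 1956 report.
RATE_NONSMOKERS = 0.07
RATE_HEAVIEST_SMOKERS = 1.66


def rate_ratio(exposed_rate: float, unexposed_rate: float) -> float:
    """Relative risk expressed as the ratio of two rates."""
    if unexposed_rate <= 0:
        raise ValueError("unexposed rate must be positive")
    return exposed_rate / unexposed_rate


def rate_difference(exposed_rate: float, unexposed_rate: float) -> float:
    """Excess rate in the exposed group, in the same units as the inputs."""
    return exposed_rate - unexposed_rate


ratio = rate_ratio(RATE_HEAVIEST_SMOKERS, RATE_NONSMOKERS)
difference = rate_difference(RATE_HEAVIEST_SMOKERS, RATE_NONSMOKERS)

print(f"rate ratio: {ratio:.1f}")
print(f"excess lung cancer deaths per 1000: {difference:.2f}")
```

The ratio works out to a little under 24, which the episode rounds to roughly 20-fold; the difference expresses the same gap in absolute terms, as extra deaths per 1000.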


In this study, you can start to see Doll and Hill anticipate some of the claims of bias that would soon be lobbed at them. A common concern raised was that the doctors of heavy smokers might be more likely to diagnose their dead patients with lung cancer. Therefore, they wrote to every doctor who had signed a death certificate stating lung cancer, and straight up asked them: did smoking history affect your diagnosis? Unsurprisingly, the vast majority said no. Knowing this was in and of itself subject to a considerable amount of bias, Hill analysed the data from patients with histological evidence of lung cancer, presumably less prone to bias, since the cancerous tissue was literally sitting right under a microscope. In this group, the association was even stronger than in patients with clinical diagnoses of cancer. If anything, the bias went in the other direction!


The other concern, raised by the famous statistician Berkson, especially about the American data, where the participants had been recruited via social networks, was that there might be some underlying factor that would cause differential response rates. Perhaps, for example, sick nonsmokers with cancer were less likely to answer a questionnaire than sick smokers? Doll and Hill responded that the passage of time should make this moot. After all, who could predict a fatal illness five years down the line and therefore be less likely to respond?


I asked Shani why Doll and Hill invented a brand new method of studying medical questions:

Shani: That’s a great question. I am not a hundred percent sure why he did. So I don’t know if it’s that he wanted to. Um, well actually, you know what, I do have one hypothesis. So with diseases with long latency, the argument of reverse causation can be made. So, um, and actually, um, Bradford Hill’s main counterpoint, which was Fisher, um, makes that argument that it’s possible that smoking doesn’t actually cause lung cancer.

Shani: 00:17:22 Lung cancer causes smoking. And uh, that’s because lung cancer is a disease that potentially has a relatively long latency period and it’s certainly possible that people who have lung cancer don’t necessarily know it yet. And for some reason that little bit of irritation they’re feeling in the lungs then stimulates them to smoke. That’s what Fisher proposed. Um, I think that a cohort study gets at that a little bit better than does a case-control study because you are then starting with the exposure and following them forward to see if they develop the disease. Now, it’s certainly still possible that those patients at the outset had cancer and we didn’t know about it, but it seems a little bit less likely the longer that you follow patients up.


I want to ask this question to you again, dear listeners. Imagine you are a doctor in 1956. The preliminary results of both Doll and Hill’s British Doctor Study, as it was now being called, and the American Cancer Society study are in, showing increased deaths from lung cancer, other lung diseases, heart attacks, and all-cause death. Is this enough to convince you that smoking causes lung cancer? Would you stop smoking? Hill had stopped almost five years earlier, when the case-control study was released. So had Richard Doll, but this study made him bribe his wife to quit smoking.


These results were also enough for the Medical Research Council. In 1957, they published a position paper summarizing basically everything I’ve talked about in this episode — the retrospective case-control studies across the world, Hill and Doll and Wynder and Graham’s large retrospective studies, and the American and British prospective studies. Accompanying it was an editorial from the BMJ. I’ll just quote their conclusion; “Last year 18,000 died of cancer of the lung. The disease is on the increase. The evidence against cigarette-smoking is now so overwhelming that, in the absence of contrary valid evidence or of some means of excluding the carcinogenic factor from what used to be called “the yellow peril,” it is incumbent on doctors to do all they can to dissuade the young from acquiring a habit so deleterious to health.” 


This is strong language with huge implications, even more so given that it dates to 60 years ago. It was not new for physicians to be strongly encouraged to dissuade certain types of behavior. Half a century before, doctors and public health reformers had essentially eliminated the habit of public spitting to reduce TB transmission rates, and the prohibition movement had seen doctors encourage their patients to become teetotalers. But tobacco companies were wealthy, had concentrated power, and more importantly were operating in a much more sophisticated advertising environment. And they rather disagreed with Hill and Doll’s conclusions, at least publicly.


Big tobacco fired their first salvo shortly after the preliminary study was published, in 1954, with the infamous “A Frank Statement to Cigarette Smokers” in the US, which stated: 


  1. That medical research of recent years indicates many possible causes of lung cancer.
  2. That there is no agreement among the authorities regarding what the cause is.
  3. That there is no proof that cigarette smoking is one of the causes.
  4. That statistics purporting to link cigarette smoking with the disease could apply with equal force to any one of many other aspects of modern life. Indeed, the validity of the statistics themselves is questioned by numerous scientists.


Of course, their internal data didn’t match up with this bullish public face. Famously, the Imperial Tobacco Company’s statistician Geoffrey Todd, after years of arguing against retrospective studies showing a link between cancer and smoking, finally became convinced by the British Doctor Study and the American Cancer Society study and told his superiors he would quit unless they accepted the conclusions. He didn’t get the chance to, though, since he was summarily fired (though later reinstated).


But I’m not going to talk about the great lengths the tobacco companies went to to discredit scientists and misinform the public, nor the obscene amount of human suffering they’ve trafficked in the name of profit, and continue to do so largely in the developing world. That would take another episode altogether, and likely make me way too angry to feign any sort of objectivity. I want to talk about debates about causality. I want to talk about the Frank Statement’s claims that there was no agreement between authorities, that the statistics were in doubt, and that anything could be linked to lung cancer, because if posterity has shown us anything it’s that Doll and Hill were right. But it just so happened that their greatest foe was one of the most influential statisticians of the 20th century: RA Fisher.


So let’s talk about Fisher, who came up in the previous episode. As a young man, he worked at the Rothamsted Experimental Farm, where the government was running experiments applying different types of fertilizers to increase crop yields. Fisher realized the fundamental problem with this approach: there might be factors in different fields that would produce different yields no matter what, which he called “confounders,” a word that has come up all the time over the past several episodes. He realized that instead of coating entire fields with different fertilizers, he could use a random number table and randomize different fertilizers to many small plots of land, effectively neutralizing confounders. To analyze these differences, he developed a statistical method called “analysis of variance” to determine whether the differences were real. You’ve probably heard the method called by its shorthand, ANOVA.


This statistical method changed biological science, and Fisher literally wrote the textbook on the method. When Bradford Hill had proposed the double-blind randomized controlled trial on streptomycin in tuberculosis I talked about in the previous episode,  he was essentially using the method Fisher had pioneered. Every RCT that has come since owes a debt to RA Fisher.
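Fisher’s insight about randomization can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reconstruction of anything Fisher actually ran: the two fields, their baseline yields, the size of the fertilizer effect, and every name in the code are hypothetical. It compares fertilizing one whole field against assigning fertilizer to individual plots at random.

```python
import random
from statistics import mean

TRUE_EFFECT = 2.0  # hypothetical yield gain from the fertilizer
FIELD_BASELINES = {"north": 10.0, "south": 20.0}  # hypothetical soil quality
PLOTS_PER_FIELD = 200


def plot_yield(field: str, fertilized: bool, rng: random.Random) -> float:
    """Yield = field baseline + fertilizer effect (if applied) + random noise."""
    noise = rng.gauss(0.0, 1.0)
    return FIELD_BASELINES[field] + (TRUE_EFFECT if fertilized else 0.0) + noise


def whole_field_design(rng: random.Random) -> float:
    """Fertilize every plot in the south field and none in the north field."""
    treated = [plot_yield("south", True, rng) for _ in range(PLOTS_PER_FIELD)]
    control = [plot_yield("north", False, rng) for _ in range(PLOTS_PER_FIELD)]
    return mean(treated) - mean(control)


def randomized_design(rng: random.Random) -> float:
    """Assign fertilizer plot by plot with a coin flip, regardless of field."""
    treated, control = [], []
    for field in FIELD_BASELINES:
        for _ in range(PLOTS_PER_FIELD):
            fertilized = rng.random() < 0.5
            group = treated if fertilized else control
            group.append(plot_yield(field, fertilized, rng))
    return mean(treated) - mean(control)


rng = random.Random(47)  # fixed seed so the run is repeatable
confounded_estimate = whole_field_design(rng)
randomized_estimate = randomized_design(rng)

print(f"true effect:           {TRUE_EFFECT:.2f}")
print(f"whole-field estimate:  {confounded_estimate:.2f}")
print(f"randomized estimate:   {randomized_estimate:.2f}")
```

In the whole-field design the estimate absorbs the difference in soil quality between the two fields; with random assignment the soil difference is spread across both groups and the estimate lands near the true effect. Fisher’s actual designs also used blocking, which this sketch leaves out.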


So when the recently retired Fisher decided to take on the emerging consensus that cigarette smoking caused lung cancer, people paid attention. He was, after all, one of the most important minds of the first half of the 20th century, and acerbic and prickly at that. I think there are two ways to view his disagreement, one of which is considerably more charitable than the other. 


So let’s talk about his intellectual arguments first. At a fundamental level, Fisher disagreed with the method that had been used to determine causation, a prospective cohort trial. He felt that randomization of some sort would be necessary to rule out confounders. He famously wrote, “It is not the fault of Hill or Doll or Hammond that they cannot produce evidence in which a thousand children of teen age have been laid under a ban that they shall never smoke, and a thousand more chosen at random from the same age group have been under compulsion to smoke at least thirty cigarettes a day. If that type of experiment could be done, there would be no difficulty.”


I asked Shani what she thought of this argument.


Shani: 00:24:02 Not everything can be studied in a randomized fashion. It’s just simply not possible. I mean, a great example of that is the joke article in the BMJ that talks about parachutes. You know, whether or not they’re actually helpful in terms of preventing death after jumping out of a plane. You’re just not going to find people who are going to be willing to be randomized in that respect. And you know, you could potentially find people who are willing to be, at least back then, you probably could have found people who would have been willing to be randomized to cigarette smoking or not. Nowadays you probably couldn’t. People feel very strongly one way or another as to whether or not they’re going to smoke. So I dunno, I think that one of the major roles for observational data is the situation where either there’s not enough equipoise to randomize patients or where you just aren’t going to be able to do that pragmatically.


And because this was impossible, Fisher felt that the onus was on researchers to eliminate every possible other cause first, and in the meantime to avoid making any strong recommendations about smoking. In particular, he advanced two theories. The first was that, rather than smoking being the CAUSE of cancer, a nebulous precancerous “inflammation” caused discomfort and made people more likely to smoke to relieve it. Smoking, then, was the RESULT of cancer rather than the cause. The second was that smoking and lung cancer both had a common cause: some genetic disposition. He even did some small twin studies in an attempt to show there was a genetic disposition. He confidently wrote, “For it will be as clear in retrospect, as it is now in logic, that the data so far do not warrant the conclusions based upon them.”


Shani: 00:20:58 Um, so he, he basically had four arguments against the existing literature and why he did not think that smoking causes lung cancer. One is he makes the argument that we just spoke about, which is reverse causation. You know, just because A and B are correlated does not mean that A causes B; it’s also possible that B causes A. Now this obviously to us seems crazy nowadays, but certainly before the association was repeatedly demonstrated, it was something worth considering. Um, his second argument is that there may be confounding going on and the main confounder that he proposes is actually genetics. So what if there’s a gene that makes people more likely to smoke as well as more likely to get lung cancer? So he’s just harnessing confounding there. Um, the third is that he argues that the trends, um, in terms of the increase in smoking and the increase in lung cancer aren’t really, don’t really go hand in hand and aren’t parallel.

Shani: 00:21:57 That was actually an incorrect assertion on his part and I think is reflective of his role as a statistician and not really an epidemiologist. He seemed to misconstrue some of the statistics. Um, and then his fourth argument against why this is not a causal association is actually based on something that Bradford Hill and Doll showed, which is that there was a lack of an association between people who inhaled tobacco smoke versus people who didn’t and the likelihood of developing lung cancer. And the argument is that if smoking really causes lung cancer, then shouldn’t the people who are inhaling have a higher incidence of lung cancer than the people who aren’t? And the failure to find that is one of the things Fisher really harped on. Um, and in reality, I think that is one of the things that was the weakest in Bradford Hill and Doll’s paper.

Shani: 00:22:52 Um, but you know, there are reasons potentially to explain that finding away, and subsequently in future studies there was a relationship found between inhalation and incidence of lung cancer. But, um, so taken together he’s challenging this relationship on four different fronts, only really probably one of which has any sort of validity at the time. And globally speaking, he consistently failed to incorporate the full body of knowledge. He would focus on these little tiny inconsistencies which taken in and of themselves you can make a compelling argument for. But when you look at kind of the full scope of the evidence, it would be crazy to conclude what he concluded.


And now the uncharitable (and I personally think more convincing) reasons. It was a more innocent time, and the phrase “conflict of interest” didn’t exist. But Fisher had them in spades. First, he gleefully took money from the tobacco companies while he traveled around the world promoting his book “The Cancer Controversy.” He made no attempt to keep this hidden; he felt that he was too smart to be swayed by their money. Second, he had considerable ideological conflicts of interest. He was an elitist, a eugenicist, and a libertarian, and had recommended that the government pay upper class families to procreate. To that end, he and his wife had eight children. That the government would take an active stand in preventing citizens from doing something enjoyable in the name of health especially incensed him.


Of course, despite Fisher’s confidence that everything would be clear in retrospect, he was on the wrong side of history. Evidence continued to pour in. By the early 60s, both the American and British studies, which would continue through the lifetime of their participants, were showing statistically significant mortality differences; smokers died at almost twice the rate of nonsmokers. This led the Royal College of Physicians in 1962, and famously the US Surgeon General in 1964, to declare that smoking was a cause of lung cancer and of chronic bronchitis, which led to the modern tobacco-control efforts which have decreased smoking rates in the US from 45% of adults in the 50s to 16% in 2016, according to Gallup. Holford and colleagues estimated in 2014 that these efforts had saved 8 million lives in the United States, with an average of 20 extra years of life for each of these individuals, and had increased life expectancy 2.3 years for men, and 1.6 years for women. Almost a third of the increase in life expectancy in the US in the second half of the 20th century, they found, was due to smoking cessation programs.


But the challenge that Fisher had lobbed to Doll and Hill probably changed medical science for the better, since it caused Bradford Hill to advance an alternative theory of causality, which he laid out in a 1964 speech. This is the now revered Bradford Hill Criteria. It’s a remarkable document, and I encourage anyone who’s interested in any extra reading to read the original, which is in the shownotes. Viewed in the context of the tobacco debate, it has far more nuance than how it is usually represented in epidemiology textbooks. 


Hill argues for a fundamentally pragmatic approach to causality; there is not some magical philosophical definition of a cause. What he cares about is that a factor can be identified, and that this factor can then be modified to decrease the incidence of the disease. The exact biochemical etiology can be sussed out later from scientific research. To do this, he lays out nine “aspects,” as he calls them, that would later be rebranded as criteria: the strength of an association, the consistency, the specificity, the temporal sequence of events, a dose-response effect, experimental evidence, biological plausibility, coherence, and analogy.


I asked Shani her thoughts about the Bradford Hill Criteria.

Shani: I’m in love with the Bradford Hill criteria. I teach about it all the time.

Adam: Why are you in love with it?

Shani: 00:30:02 I think he offers such a nice systematic approach to assessing the likelihood of causality in any association that’s observed. And I think that actually, um, the conflation of association and causation actually happens really frequently. So as a journal editor, I review papers all the time and it is amazing how commonly people will use causal language in an observational study. And it’s one of my pet peeves as an editor, I’m constantly writing back to authors saying you need to remove all instances of causal language in this observational study that can only demonstrate association. Yeah.

Adam: 00:30:38 Can you give me an example of causal language that you dislike?

Shani: 00:30:42 Um, so, “A led to an increase in B.” Like, you can’t say that. You can say that A was associated with an increase in B. You can’t say it led to an increase in B. Um, or saying something like “our data indicate that.” It’s really just “our data suggest that.” So it’s being a little less certain in the way that you describe something.

Adam: 00:31:05 How do you feel about the phrase... is it “correlation is not causation”?

Shani: “Correlation does not imply causation.”

Adam: Yeah. Say that phrase again, because I missed you there, and tell me how you think about it, how you feel about it.

Shani: Correlation does not imply causation. I wish everyone would say that 10 times when they wake up in the morning. Like, the number of observations I see people make in everyday life that are really just a function of correlation and not causation, but that they interpret as causal, is just mind-blowing and really frustrating from the standpoint of an epidemiologist or a statistician.

Adam: So you walk through life repeating that as your mantra and just getting increasingly frustrated.

Shani: Yeah, a little bit, I guess. You know what, that would be my mantra. If I had a mantra, that would be it.


I think Bradford Hill would agree with Shani’s chosen mantra, but he is also making a rather subtle and important point: there are no magic criteria that prove causation. Reality is messy and complicated.


Shani: 00:34:44 In fact, I would say that probably 90 out of a hundred observational studies that you see in journals don’t even meet half of these. So, no. And it’s just simply not possible to meet some of them within the context of one single study, usually. Um, yeah.


He saved some of his most poignant moments for the end of his paper, where he cautions against an over-reliance on tests of significance for causality, writing, “Such tests can, and should, remind us of the effects that the play of chance can create, and they will instruct us in the likely magnitude of those effects. Beyond that they contribute nothing to the ‘proof’ of our hypothesis.”


Shani: So after he lays out each of these criteria, numbered one through eight, he then goes on in a few paragraphs to talk about the pitfalls of the p value. And actually he, um, he has some really nice quotes that I, um, I hope you don’t mind. Yeah, I was going to say, I hope you don’t mind, because you’re not gonna be able to stop me from reading it, basically. So he first starts by saying, “Fortunately we haven’t yet gone so far as our friends in the USA where, I am told, some editors of journals will return an article because tests of significance have not been applied.” So this kind of dates him a little bit. He thinks it’s crazy that we would require tests of significance in every article.

Shani: 00:37:27 But he goes on to really highlight some important limitations of the p value. So he says, “Yet there are innumerable situations in which they are totally unnecessary, because the difference is grotesquely obvious, because it is negligible, or because, whether it be formally significant or not, it is too small to be of any practical importance.” He’s essentially getting at the difference between clinical and statistical significance. And then he says, “What is worse, the glitter of the t table diverts attention from the inadequacies of the fare.” And what he means by that is that this focus on just p values as being the final arbiter of whether or not we should believe something distracts people and takes their focus away from other things that they should really be focusing on, which is how the study was designed. What are the biases and potential confounders inherent in the way someone designed a study? And so he hits on those things really, really nicely, um, and goes on to say, “Like fire, the chi-square test is an excellent servant and a bad master.” And basically he means that, you know, we use it for our purposes, but we should not let it dictate our final conclusions. There’s more to it than just a p value or just a chi-square value.


I suspect that was probably a response to Fisher in particular.
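The gap between statistical and practical significance that Hill describes is easy to see with made-up numbers. The sketch below assumes a pooled two-proportion z-test, a standard textbook test that is not discussed in the episode; the event counts, group sizes, and function name are all hypothetical. The same tiny difference in event rates (10.1% versus 10.0%) is tested once in very large groups and once in modest ones.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Two-sided p-value from a pooled two-proportion z-test."""
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical data: event rates of 10.1% vs 10.0% in both scenarios.
p_huge_groups = two_proportion_p_value(505_000, 5_000_000, 500_000, 5_000_000)
p_small_groups = two_proportion_p_value(505, 5_000, 500, 5_000)

absolute_difference = 505_000 / 5_000_000 - 500_000 / 5_000_000

print(f"absolute difference in event rate: {absolute_difference:.3f}")
print(f"p-value with 5,000,000 per group:  {p_huge_groups:.2g}")
print(f"p-value with 5,000 per group:      {p_small_groups:.2g}")
```

With millions of people per group the p-value is far below any conventional threshold, yet the difference is still one tenth of a percentage point. Whether that matters is a clinical judgment that the test cannot make.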


I think a lot of physicians actually find the Bradford Hill Criteria to be discomforting. We want reality to be like, well, Koch’s postulates. There was a law to be followed, like Newton’s Laws of Motion. You could go down a series of checkboxes, put a tick in each of the four, and BAM, causality. But causality was now messy, and likely always contingent on any number of factors.


For one thing, the Criteria are an impossible standard — they can likely never be met by a single study. And then we had to consider — was a single study ever enough to prove causality? How about two? Three? Can we ever truly prove causality?

Shani: I don’t think anyone has the answer to that, to be honest. Um, you mentioned that the fragility index is, um, relevant for randomized controlled trials. I think within the context of observational studies, there are some things that we can do as well. And in fact, um, Sir Bradford Hill, uh, it sounds like he did some of these things to kind of counteract some of Fisher’s, um, rebuttals. And basically what I’m referring to is you can do sensitivity analyses wherein you evaluate the strength that an unmeasured confounder would have had to have had with both your exposure and your outcome of interest, as well as the prevalence of that confounder, in order to invalidate your conclusions. And basically the stronger an effect you observe, the harder it’s going to be to explain that result from confounding alone. And so that’s kind of an analogous, I almost, you could almost call it an analogous fragility test applied to the observational study setting, um, relative to the fragility index applied within the context of an RCT. Both are getting at how fragile is this result, how much stock should we be putting in this result? I don’t care that it’s significant. The question is, do we believe, how strongly do we believe this or how strongly do we believe that this is causal and not due to something else, namely bias, confounding, or chance?
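One modern formalization of the sensitivity analysis Shani describes is the E-value proposed by VanderWeele and Ding. It is not mentioned in the episode and is used here only as an illustrative assumption. For an observed risk ratio RR above 1, it is RR + sqrt(RR × (RR − 1)): the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need with both the exposure and the outcome to fully explain away the result. Unlike the approach Shani outlines, it does not take the prevalence of the confounder as an input. The function name is hypothetical, and the sketch treats the ratio of the two death rates quoted earlier in the episode as a risk ratio.

```python
from math import sqrt


def e_value(risk_ratio: float) -> float:
    """E-value for an observed risk ratio (point estimate only)."""
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    # For protective associations, work with the reciprocal so that rr >= 1.
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + sqrt(rr * (rr - 1))


# Rates quoted in the episode: 1.66 vs 0.07 lung cancer deaths per 1000.
smoking_rr = 1.66 / 0.07
modest_rr = 1.3  # hypothetical weak association, for contrast

print(f"E-value for RR {smoking_rr:.1f}: {e_value(smoking_rr):.1f}")
print(f"E-value for RR {modest_rr:.1f}: {e_value(modest_rr):.1f}")
```

The E-value for the smoking association is very large, while the one for the modest association is under 2. That is the quantitative version of Shani’s point: the stronger the observed effect, the harder it is to attribute to confounding alone.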


Hill realized that his method was messy, and didn’t necessarily offer easy answers. But he felt it was the most intellectually honest approach to the morass of medical and population data that constitutes epidemiology. Despite the frustration of having his work ignored, and then attacked (and in 1964, while the tide was turning, the attack was still very much under way), he ends his famous paper on an optimistic note about the nature of science, and the tricky work of determining causality. “All scientific work is incomplete, whether it be observational or experimental. All scientific work is liable to be upset or modified by advancing knowledge. That does not confer upon us a freedom to ignore the knowledge we already have, or to postpone the action that it appears to demand at a given time. Who knows, asked Robert Browning, but the world may end tonight? True, but on available evidence most of us make ready to commute on the 8.30 next day.”


That’s it for the show! Make sure to listen after the episode is over, when we talk about communicating this uncertainty into our conversations with patients, and the implications for clinical practice. But first — it’s time for #AdamAnswers! 


#AdamAnswers is the segment on the show where I answer any questions that you have about medicine. But for this episode, I actually asked on Twitter for questions for Shani for a segment I’m calling #AskShani. I’ll even add different theme music!

So this is the #AskShani from our colleague Rahul Ganatra. He says, “I’d love to hear lessons learned from modern examples of being fooled by observational data. I was shocked when a resident was unfamiliar with the Women’s Health Initiative/HRT story. And conversely, when observational data are better than randomized controlled trials, practical limitations aside.” And then he says thanks, ’cause he’s a very polite man.

Shani: 00:57:07 Well, thank you, Rahul, for the excellent question. Um, I think that the issue that you bring up around hormone replacement therapy is actually one of the things I teach on most often as a great example of not only when and how observational data get things wrong, but also it’s a great example of one of the criteria that we’ve been talking about today with respect to Bradford Hill, which is this criterion of specificity of the association. So let me tell you what I mean by that. So, um, basically, um, the story begins with one of the most famous studies of the last several decades, which is the Nurses’ Health Study. It was an observational study, um, and when it was first published, uh, in 1985, it examined the association between hormone replacement therapy and heart disease, and showed that women who were taking estrogen had about a third of the risk of having a heart attack compared to women who had never taken it.

Shani: 00:58:00 Um, and this seemed compelling enough that thousands of women had their physicians start prescribing them hormone replacement therapy. And then in 1998, so a little over 10 years later, the HERS study comes along, which now is a randomized controlled trial, but it’s for secondary prevention in coronary heart disease. So using hormone replacement therapy in patients who have already had a cardiac event, does it prevent another one? And they actually found no benefit. So this started to raise eyebrows, but people still wanted to believe, and they said, well, maybe it doesn’t help with secondary prevention. What about primary prevention in the context of an RCT? And so enter the Women’s Health Initiative in 2003, and here they were studying primary prevention. So women who didn’t already have coronary heart disease, and they randomized them to hormone replacement therapy or not. And in this case they again did not find any cardiac protection from hormone replacement therapy.

Shani: 00:58:59 And in fact, they found that it may actually slightly increase the risk of coronary heart disease, with a hazard ratio of 1.24 and confidence limits from 1.0 to 1.54. And so the question then became: why did the observational study get it wrong? So now we have RCT data saying not only does this not help, but it may hurt. What went wrong? And, um, there were some interesting clues about what had gone wrong that actually came out just a few years after the initial observational study. And the data to which I’m referring is the Walnut Creek Contraceptive Drug Study. And this was another large observational study, just like the Nurses’ Health Study, where they again showed a benefit of hormone replacement therapy on these various outcomes related to coronary heart disease. However, they also showed a benefit of hormone replacement therapy on death from homicide, suicide, and motor vehicle accidents.

Shani: 00:59:59 And there is no possible way that hormone replacement therapy somehow keeps you from having motor vehicle accidents or committing suicide. And so, um, they basically said because there is no plausible association between hormone replacement therapy and these other outcomes, it seems that something else is explaining the difference in the observed outcomes other than hormone replacement therapy. And we now know that something else to be something called a healthy user bias, which is that women who tended to be healthier and more health conscious took hormone replacement therapy, and other women who didn’t, didn’t have that, and they ended up having differences in outcomes just by virtue of this healthy user bias. And so I think it’s a really nice illustration of when the observational study got it wrong and the randomized controlled trial got it right. And it served to elucidate a type of bias that actually plagues a lot of observational studies.


Wait — there’s one more thing! I’m finally getting my game together and launching a Bedside Rounds twitter account! So let me introduce you to Brendan Daly, the new Bedside Rounds social media editor:




You can follow the account @BedsideRounds to see Brendan’s threads, and more great content from Bedside Rounds. 


Now that’s REALLY it for the show.  If you are a member of the American College of Physicians, you can get CME or MOC credit just for listening to the podcast; go to www.acponline.org/BedsideRounds and take a brief quiz.


You can find all the episodes on the website at www.bedside-rounds.org, or on Apple Podcasts, Spotify, or wherever you get your podcasts. The Facebook page is at /BedsideRounds. I’m personally on Twitter @AdamRodmanMD, where I not only Tweet about internal medicine and medical history, but make Tweetorials. Send me a line!


All of the sources are on the website.


And finally, while I am actually a doctor and I don’t just play one on the internet, this podcast is intended to be purely for entertainment and informational purposes, and should not be construed as medical advice. If you have any medical concerns, please see your primary care provider. And now, the remainder of my conversation with Shani:


Adam: 00:47:28 So one reason that I love talking to you, Shani, is that not only are you a brilliant researcher, you’re also a brilliant physician who takes care of patients. And this is, this... No, seriously, this is what’s so... I mean, this is part of the point, that R.A. Fisher didn’t really understand the clinical context in which things were happening. And as a practicing physician you are very well aware. How do we as doctors, and how do we communicate this to our patients? Like, how fragile some of the knowledge is that we’re basing our decisions on, how do we communicate that?

Shani: 00:47:57 It’s so hard. And I think you hit the nail on the head: the thing that often causes tension between physicians and... between patients and the healthcare system is the realization on the part of the patient that we don’t always know what we’re doing. And this comes up all the time. I mean we, we make our best guess. And patients think, well, how did you, how did you not know that this wouldn’t work? Or how did you know, you know, how did you not know that this was gonna happen? And they start realizing that so much of this is a gray zone. And that’s, very understandably so, very disconcerting for patients. And so I don’t know necessarily what the answer is other than being up front with patients at the outset about what we do know and what we don’t know. And just helping them to understand that it takes a lot of testing and scientific data to even get to the point of finding an association, let alone a causal one. But how you, what words you use to describe that to a patient is really up to you. It’s hard.

Adam: 00:49:00 Yeah. And I think it’s challenging for physicians to know. It’s... you have so much training and you are capable of understanding all these things, but many of these studies pass right over my head. Like, how are individual physicians supposed to navigate this?

Shani: 00:49:14 Yeah. So actually one of the things that, um, my colleagues and I talk about, or debate a little bit about, is whether or not individual physicians should actually even be attempting to interpret, um, studies in the medical literature. And so the counter argument is that they should not be doing that. They should just be using sources that actually have already gotten experts to appraise the data and make recommendations based on that data. And those experts should be people who have expertise in interpreting the results of studies. Um, so the average physician doesn’t have that expertise, and the conclusions that they may draw, you know, they’re the ones who are actually at risk for over interpreting a p value to the exclusion of all the other information that can be gained from a study. And so I am actually supportive of that viewpoint that, um, for the average physician, you should be going to sites and platforms that aggregate that information, that experts have culled through and formulated recommendations on, as opposed to trying to make interpretations from individual studies on their own in the absence of background training in that, because it is a unique skillset.

Adam: 00:50:26 Do you, do you believe then in practice changing studies, is there such a thing as a practice changing study?

Shani: 00:50:32 Um, it’s funny that you ask that. As you know, I do a presentation every year called an update in hospital medicine where we’re expected to select the most potentially practice changing articles. And I’m always a little bit torn because honestly, most things probably shouldn’t be practice changing until multiple studies have demonstrated something and until it’s been worked into guidelines, because up until that point you’re just kind of going out on a limb and making an interpretation of a single study that you may or may not actually be understanding. And so I would always suggest waiting until experts in a field have agreed that this should be practice changing. Now the really practice changing studies do get incorporated into guidelines very quickly. So after the DAWN trial that I mentioned before came out, almost immediately there were new guidelines published that recommended going ahead with, um, you know, interventional thrombectomy in patients with stroke up to 24 hours after the onset of their symptoms. And so there are practice changing studies. I guess what I’m advocating for is not for individual physicians to make that decision on their own: wait until the field collectively has decided that this is a practice changing study to go ahead and change their practice.

Adam: 00:51:47 That’s interesting. I mean, part of the challenge is, what about when a very high quality study comes into something where the data has been very poor? So an example, say the, you know, the OVIVA trial in the New England Journal of Medicine, that’s the prosthetic joint infection, oral antibiotics in prosthetic joint infections. Um, there’s no good data on that. And then all of a sudden there was a randomized controlled trial, hey, this is not going to be... and I’m just curious what you would take from that.

Shani: 00:52:10 Yeah, it’s a really good question. Um, there certainly are areas where it’s gray enough as to what the right thing to do is, where if you did the thing that that study showed you couldn’t be accused of not, um, adhering to standard of care, because there is no good standard of care. So there are some studies where it’s just, the area is so gray that no matter what you do, you’re going to be fine and you should just pick what you think is better and just do it. Um, then there are studies like the one you just mentioned, or the other study looking at, um, oral antibiotics in the setting of endocarditis, POET. POET, yeah. And what the individual physician should do in those circumstances isn’t a hundred percent clear. But I would say that if you have a patient where, for various reasons, they’re not going to go out on IV antibiotics, then you have now some substantiation that at least giving them oral antibiotics is going to be better than not treating them at all. So I think there are circumstances and contexts in which you should incorporate results like that from individual studies. So I guess I’ve come full circle and I take back what I said. As only you are capable of doing, Adam Rodman is getting me to, to change my mind on things.

Adam: 00:53:30 Well, I, I think this is going to be a very unsurprising conclusion that I’ve drawn from all of this, but there’s not a right answer here, right? It’s, it’s murky. Modern medicine, we like to pretend that what we do is a clear cut science.

Shani: 00:53:42 Do you know what it is, Adam? Here, here’s what: I’m not going to take back what I said before. I’m going to modify it and modulate it, which is that because science is not precise, I think we need to recognize the imprecision in the science. And what that means is sometimes doing something because you don’t really know what’s right, and so you can do whatever you want in that circumstance. The converse of that is, don’t tout one study as the be all and end all of something. So I think that’s what I was kind of getting at with “one study shouldn’t really be practice changing”: it’s, it’s a limited slice of evidence, and most things are still pretty gray. And so making sure that you have kind of thought about it big picture wise, and put whatever study it is into the context of what has already been known, is I think important. But in reality, so many of these things are gray that you’re probably justified in, you know, doing whatever you think is, um, of the most benefit to the patient sitting in front of you at any given time.

Adam: 00:54:46 Being a doctor is hard, Huh?

Shani: 00:54:48 Oh my God, the nuances are never ending.