Does smoking cause lung cancer? How could you ever know? In the second episode of a three-part series on causality, I’m joined by Dr. Shoshana Herzig to discuss how Austin Bradford Hill and Richard Doll set out to answer this question, and in the process revolutionized the way we think about what causes disease. Along the way, we’ll talk about the first double-blinded randomized controlled trial, the long shadow of infectious disease (and TB in particular), and why epidemiology done well is beautiful. Plus, a brand new #AdamAnswers about chest compressions!
Please support the show by taking the listener demographic survey.
- Bost TC. Cardiac arrest during anaesthesia and surgical operations. Am J Surg 1952;83: 135-4
- Council, T. Tobacco Smoking and Lung Cancer. Brit Med J 1, 1523 (1957).
- Crofton J, The MRC randomized trial of streptomycin and its legacy: a view from the clinical front line. J R Soc Med. 2006 Oct; 99(10): 531–534.
- Daniels M and Bradford Hill A, Chemotherapy of Pulmonary Tuberculosis in Young Adults, Br Med J. 1952 May 31; 1(4769): 1162–1168.
- Dangers of Cigarette-smoking. Brit Med J 1, 1518 (1957).
- Doll, R. & Hill, A. B. Lung Cancer and Other Causes of Death in Relation to Smoking. Brit Med J 2, 1071 (1956).
- Doll, R. & Hill, A. B. Smoking and Carcinoma of the Lung. Brit Med J 2, 739 (1950).
- Hill, A. B. The Environment and Disease: Association or Causation? Proc R Soc Med 58, 295–300 (1965).
- Hoffman, F. L. Cancer and Smoking Habits. Ann Surg 93, 50–67 (1931).
- Hurt R, Modern cardiopulmonary resuscitation—not so new after all. J R Soc Med. 2005 Jul; 98(7): 327–331.
- Keating C, Smoking Kills: The Revolutionary Life of Richard Doll. 2009.
- Keith A, Three Hunterian Lectures on the Mechanism Underlying the Various Methods of Artificial Respiration Practised Since the Foundation of the Royal Humane Society in 1774. The Lancet 173(4464), 825–828 (1909).
- Kouwenhoven WB et al, Closed-chest cardiac massage. JAMA. 1960;173(10):1064-1067.
- Morabia, A. Quality, originality, and significance of the 1939 “Tobacco consumption and lung carcinoma” article by Mueller, including translation of a section of the paper. Prev Med 55, 171–177 (2012).
- Ochsner, A. & DeBakey, M. Primary pulmonary malignancy: treatment by total pneumonectomy; analysis of 79 collected cases and presentation of 7 personal cases. Ochsner J 1, 109–25 (1999).
- Ochsner, A. My first recognition of the relationship of smoking and lung cancer. Prev Med 2, 611–614 (1973).
- Parascandola, M. Two approaches to etiology: the debate over smoking and lung cancer in the 1950s. Endeavour 28, 81–86 (2004).
- Phillips, C. V. & Goodman, K. J. The missed lessons of Sir Austin Bradford Hill. Epidemiologic Perspectives & Innovations 1, 1–5 (2004).
- Proctor, R. Angel H Roffo: the forgotten father of experimental tobacco carcinogenesis. Bull World Health Organ 84, 494–495 (2006).
- Wynder, E. Re: “When Genius Errs: R. A. Fisher and the Lung Cancer Controversy”. Am J Epidemiol 134, 1467–9 (1991).
This is Adam Rodman, and you’re listening to Bedside Rounds, a monthly podcast on the weird, wonderful, and intensely human stories that have shaped modern medicine, brought to you in partnership with the American College of Physicians. This episode is called Cause and Effect; it’s the second in a three-part series about the linkage between cigarette smoking and lung cancer. We’re going to be introduced to the protagonists of our story, Austin “Tony” Bradford Hill and Richard Doll, an odd-couple economist-doctor pair whose quest to find the cause of the mysterious worldwide increase in lung cancer would change the way we understand causality, and disease, for all time.
To tell this story, I’m going to be joined by my friend, and one of the most brilliant people that I know, Dr. Shani Herzig. First, some introductions:
Shani: 00:00:27 I am Shani Herzig. I am a practicing hospitalist as well as the director of hospital medicine research at Beth Israel Deaconess Medical Center and an assistant professor of medicine at Harvard Medical School.
Adam: 00:00:36 And you have been on the show before. What type of research do you do?
Shani: 00:00:41 I do mostly pharmacoepidemiology, but really anything that involves hospital based outcomes is kind of right up my alley.
Adam: 00:00:49 And what type of studies are you doing?
Shani: 00:00:52 I look at mostly medication exposures in large groups of patients to try to tease out the risk-to-benefit ratio. And then I try to use that information to inform physician practice, either through clinical decision support or through educational interventions and initiatives.
Before jumping into the story headfirst, let’s do a quick recap of episode 44. As the 20th century progressed, it became increasingly clear that doctors were facing a dramatic increase in lung cancer. Several small case-control studies in Germany and the United States had suggested a link to cigarette smoking, and experiments in Argentina had shown that tars from tobacco smoke could cause skin cancer in rabbits. But the generally accepted explanation was the dramatic rise in environmental pollution, especially air pollution from road tarring and industrialization.
The years after WW2 saw the rich countries of the world, which were now being called the “first world,” turn their attention to fighting chronic diseases, as it became increasingly clear that the war against epidemic infectious disease had largely been won. This period saw the formation of national health systems, especially the NHS in the UK, and a shift in funding priorities toward discovering how to treat, and prevent, chronic diseases. This sounds passé to us now, but intellectually it was a huge shift. Ever since Koch had isolated the anthrax bacillus and popularized germ theory, the mission of medicine had been identifying the infectious causes of diseases. After all, infectious diseases have traditionally been the biggest cause of death, and still are today in low-income countries (https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death), though ischemic heart disease and stroke are the two biggest killers in every other income group. The first decades of the 20th century hadn’t seen many effective therapies: salvarsan chemotherapy and infecting patients with malaria were literally the standard of care for patients with syphilis, and I dedicated an entire episode to bacteriophage therapy, the use of bacteria-killing viruses to fight infection. Even so, knowledge of the causative factors of many infectious diseases had allowed doctors and public health officials to make major inroads against disease, to the point that cholera, dysentery, malaria, and yellow fever largely disappeared from rich countries, and tuberculosis rates plummeted. And then during WW2 penicillin, the first antibiotic, came into regular use, and the prognosis for many previously fatal diseases changed overnight.
The best example of this changing war was tuberculosis, which I covered extensively in episode 39. I want to briefly talk about the fight against tuberculosis again, because the disease and its treatment are intimately connected with the lives and fates of the two odd-couple researchers who would make the tobacco-lung cancer link and change the nature of medical epistemology forever: Austin “Tony” Bradford Hill and Richard Doll.
Austin Bradford Hill was born in 1897, and ever since he was a boy he had set his sights on studying medicine. This was a time when medical students would grimly joke that two or three of their classmates in the room would be dead of tuberculosis within a few years. The “prosector’s wart,” cutaneous TB acquired from performing autopsies, was considered just another occupational risk, to say nothing of the risk of catching the disease from one’s patients. Everyone knew the story of the revered René Laennec and his friend and colleague Bayle, who had done so much to accurately describe tuberculosis at the beginning of the 19th century, only for both of them to die tragically early of the disease (https://jcp.bmj.com/content/56/4/254). Bradford Hill himself would become intimately familiar with tuberculosis. He was a pilot in WW1 in the Dardanelles, where he developed pulmonary tuberculosis. Ironically, this likely saved his life, since the mortality rate of pilots was almost 86%. But he still almost died: he required an artificial pneumothorax and two years of recovery in the Maudsley Sanatorium. Even after discharge he was incredibly ill; the Air Force expected him to die soon and so awarded him a pension. This likely changed the fate of medicine. Because he was too sickly to study medicine, with its requisite autopsies and time on the wards, he went into economics instead, and became obsessed with using statistics not only to describe disease, but to prevent it. He was inspired by many of the epidemiologists I talked about in Episode 42, including Nightingale, Galton, and Chadwick.
In the 1940s, this led him to apply these new statistical methods to medical trials. Now, medical trials have existed in some form for hundreds of years; it could be argued that James Lind devised the first controlled trial, assigning groups of sailors with scurvy to get cider, vitriol, vinegar, seawater, citrus, or a spicy paste. And by the early 20th century, medical trials had become commonplace for new treatments; in modern parlance, most of these would be called uncontrolled prospective cohort studies. A group was treated and then followed for a period of time, with some sort of outcome measured. For example, in many of the Salvarsan studies of the 1910s, investigators would count the percentage of primary syphilis lesions that resolved, or the Wassermann tests that converted to negative. And when penicillin was developed, it was originally tested in the same fashion: prospective cohort studies, with no thought given to a comparison group.
The discovery of penicillin had spurred large-scale research into other microbial products that could serve as antibiotics. Unfortunately, while penicillin killed many bacteria, it did not affect the tubercle bacillus. In the US, a soil microbiology lab discovered two strains of the soil bacterium Streptomyces griseus that made streptomycin, which killed TB in vitro, and also in guinea pigs. There was a lot of excitement, but the drug was expensive, and impoverished Great Britain was only willing to purchase a limited amount of streptomycin to study. So the government turned to the Medical Research Council to decide how best to use this limited resource. There was nothing terribly unusual about this situation; the same thing had happened with strictly rationed penicillin. And the initial recommendations were rather unsurprising: the MRC suggested that streptomycin be used for the most severe forms of tuberculosis, TB meningitis and miliary TB, and that its effectiveness be judged this way. They advised against giving the medication to patients with pulmonary tuberculosis, since the standard of care for TB, rest in a sanatorium or an artificial pneumothorax, carried a far better prognosis.
Enter Bradford Hill, who was now working with the MRC. Undoubtedly his own experience with pulmonary TB, artificial pneumothorax, and two years of recovery in a sanatorium had a big influence on him. He argued that the MRC should also study streptomycin in patients with pulmonary tuberculosis, given the uncertain course of the disease even with the best care. And to assess its effectiveness, it should be compared with bed rest in a sanatorium. But how best to do this? By the middle of the 20th century, the limitations of uncontrolled prospective cohort studies had become increasingly clear. While comparison groups dated back to Lind, the fundamental problem was how to make sure the patients in each group were as similar as possible. Here Hill was influenced by the famous statistician R. A. Fisher, who had randomized plots of land to study the effect of fertilizer on crop yields. The idea was that randomization would allow natural variation in soil quality to be “canceled out.” Hill advocated doing the same thing with patients: randomizing them to either the standard of care, bed rest in a sanatorium, or the standard of care plus streptomycin. After a patient with pulmonary tuberculosis was admitted and met the trial’s admission criteria, that is, “acute progressive bilateral pulmonary tuberculosis of presumably recent origin, bacteriologically proved and unsuitable for collapse therapy,” an envelope containing a card labeled either S for streptomycin or C for control would be mailed to the center; the order of these envelopes was determined from random number tables. The patients did not know that they were in a medical trial, though the doctors treating them did.
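To make the envelope scheme concrete, here is a minimal Python sketch of how a sealed-envelope allocation sequence could be generated. It assumes simple, unstratified randomization in which each digit read from a random number table decides one card (even digits give S, odd digits give C). That digit rule, the function name `allocation_sequence`, and the seed are illustrative assumptions, not the MRC’s documented procedure. The patient count of 107 is the trial size mentioned later in this episode.

```python
import random


def allocation_sequence(n_patients: int, seed: int) -> list[str]:
    """Build a hypothetical sealed-envelope sequence of 'S' and 'C' cards.

    Each card is decided by one pseudo-random digit, standing in for a digit
    read off a printed random number table (an assumed rule):
    even -> S (streptomycin plus bed rest), odd -> C (bed rest alone).
    """
    rng = random.Random(seed)  # seeded so the same sequence can be re-audited later
    cards = []
    for _ in range(n_patients):
        digit = rng.randint(0, 9)  # one "table" digit, 0 to 9 inclusive
        cards.append("S" if digit % 2 == 0 else "C")
    return cards


envelopes = allocation_sequence(107, seed=1947)
print(envelopes[:10])
print("S:", envelopes.count("S"), "C:", envelopes.count("C"))
```

Because each card is drawn independently, the two arms usually come out close to, but not exactly, equal in size, much as the real trial ended up with 55 and 52 patients. The property that matters is that no one enrolling a patient can predict or choose the next card.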
This brought on Bradford Hill’s second innovation: how to determine whether the therapy succeeded. The outcome measures were monthly chest x-rays, sputum smears and cultures for bacteria, fever, weight, and a blood marker of inflammation (the ESR). Some of these were objective, but others, especially x-rays, were open to human interpretation. Bradford Hill realized that radiologists might be more likely to read the x-ray of a patient known to be on streptomycin as improved. Therefore the radiologists who read the films and the bacteriologists who examined the sputum were also blinded to which treatment group the patients were in. This was, therefore, the first double-blind randomized controlled trial: neither the patients nor the doctors evaluating them knew which group they were in. For all that you have heard about the trial, the results were promising but not groundbreaking. In the first six months, four of 55 patients in the streptomycin group died, compared to 14 of 52 in the control group. But the two groups were far more similar in the second six months. The reason soon became abundantly clear: tuberculosis rapidly developed resistance to streptomycin. But soon afterward a new anti-tuberculosis drug was discovered, para-aminosalicylic acid, or PAS for short, and further randomized controlled trials showed that combination therapy was far more effective, with little resistance. Our anti-tubercular therapy still rests on the fundamental premise of combination therapy, though we generally start with four drugs now, and streptomycin has been mostly abandoned. But Bradford Hill and the MRC had ushered in a new age of medical therapeutics.
Next, Richard Doll. Doll was a generation younger than Bradford Hill. He too caught tuberculosis, during the Second World War, and required a nephrectomy, the surgical removal of a kidney. Unlike Hill, he had completed medical school and become a physician. A committed socialist, after the war he helped lobby for the formation of the National Health Service. But he found himself bored with the realities of clinical practice. He wanted to make a big impact, and like Hill, he was attracted to the population-level promise of epidemiology. He also had a favorite epidemiologist, but where Hill had emulated Farr and Chadwick, Doll idolized their opponent, John Snow, who had not only made the first connection between cholera and its mode of transmission but had gone out of his way to have the Broad Street pump handle removed, thus ending the outbreak. He soon found himself working at the Medical Research Council, and when the Ministry of Health asked the MRC to try to find out what was behind the increase in lung cancer, Hill and Doll found themselves working together.
I think it’s important to note that prior to investigating lung cancer, both Doll and Hill were agnostic about the cause. Hill had an “open mind,” and Doll felt that it was likely environmental pollution, road tarring in particular. And they were both smokers, like much of the male population of the UK at that point. Hill smoked a pipe, and Doll smoked unfiltered cigarettes.
The first thing they did was go over the previous data on smoking and lung cancer, which I reviewed in episode 44, and they were unconvinced that there was a link. Was this reasonable? I asked Shani.
Adam: So one of the very first pieces was a case series by Alton Ochsner that went over just eight patients who presented with adenocarcinoma of the lung and had been soldiers in World War One and were heavy smokers. Would that have been enough to convince you?
Shani: No.
Adam: Yeah, me neither.
Shani: 00:06:13 It’s just, um, you know, so first of all, being an epidemiologist, I really do believe that there needs to be rigorous study and evaluation of any drug-disease or exposure-disease association. A single case series is never going to be enough to convince me of something. Um, it certainly will be provocative and hypothesis generating. And, you know, certainly when there’s a huge effect size, you need less evidence to convince people. And so if every single person exposed to something has a certain outcome and no one who’s not exposed to it has that outcome, that’s really, really compelling. And it would probably get me to maybe stop doing whatever it was until additional data came out. But probably at some point I’d go back to doing whatever it was until I had the really rigorous data, depending on how much I liked doing whatever that thing was.
Adam: 00:07:06 So then the next step, we go to the 1930s and you have Angel Roffo, an Argentinian physician who did lots of experiments. He would extract tar from tobacco smoke and apply it to the shaved backs of rabbits, and they would develop cancer. Would that convince you that smoking causes lung cancer?
Shani: 00:07:24 Uh, no, it wouldn’t. Again, it would be hypothesis generating and really provocative. But, you know, I’m sure we can all think of a lot of reasons why the backs of rabbits are different from human lungs, and, you know, to jump from one to the next is not necessarily a direct line.
Adam: 00:07:40 So now we get a little more interesting. We get into some case-control studies from the late 1930s done in Germany. Do you know Alfredo Morabia? Do you know who he is? Oh, you’ve spoken about him on your other podcasts. I have such a crush on him. I’m going to edit that out in case he ever hears this. He’s an epidemiologist and he likes to reanalyze old data sets of classic epidemiology. So he actually translated the German studies. But that was Pierre Louis’ work, correct? Yes, yes. I’m sorry. Let’s pause so we can honor how much we love him. Yeah. What a cool job. Someone who’s built an academic career on reanalyzing old studies. Yeah. Cool. So the German studies were pretty small. I believe one had 50 patients who were hospitalized with lung cancer, and then it matched them with, I think, 38 or 40 healthy controls. But there’s no evidence of how these controls were selected; they may have just been people on the street. And it found (they did use p values; I believe they did something like a chi-square test) that there was a significantly higher rate of heavy smoking among the cases. Would that have been enough to convince you?
Shani: 00:08:56 No, I don’t think it would. And it’s for a reason that Sir Austin Bradford Hill actually touches on even with respect to his own case-control study, which is that there are many opportunities for bias in case-control studies. Um, a lot of which stem from a potential lack of comparability between cases and controls, and the degree to which you are certain that your controls and your cases are similar enough that you can compare the two. Um, another problem with case-control studies that he touches on relates to recall bias. So anytime you’re asking someone who has a disease to recall prior exposures, they’re probably going to work a little harder to remember every exposure they had than someone who didn’t have the disease. And then finally there can be ascertainment bias. Um, if the people who are asking the questions know who the cases are and who the controls are, you can see how that could potentially bias the way that they ask the questions, or how extensively they strive to obtain all the information possible from one versus the other.
Shani: 00:09:54 So no, a very small case-control study is not going to convince me quite the same way as a large case-control study, or a case-control study where they really rigorously made sure that their cases and controls were comparable, or finally a cohort study, which is really going to do the best job of convincing me.
Doll and Hill were on the same page as Shani. They realized the previous studies had many weaknesses: notably, they were very small, were prone to recall and ascertainment bias, and didn’t have carefully chosen controls. The answer, they decided, was to do another case-control study, but to do it as thoroughly as possible. They wanted to test two competing hypotheses: that smoking, or alternatively environmental pollution, was the cause of the increase in lung cancer. But they didn’t want recall or ascertainment bias to distort their results, so Hill developed a “fearsome” questionnaire of over 50 questions that covered a variety of otherwise banal subjects: where patients lived, their occupations, how much fried and fatty food they ate, the type of stove they had, whether they lived near a gas works, and of course smoking.
They recruited 20 London-area hospitals; every time a patient was admitted with lung, stomach, colon, or rectal cancer, Doll and Hill would be alerted, and a social worker would go to interview the patient. For every cancer patient they interviewed, they would also identify a second patient in the same hospital, of the same sex, and in the same five-year age group, and interview them as well. In the end, 2,475 patients, cases and controls together, were interviewed. This was almost an order of magnitude larger than previous studies, and all the data were analyzed by hand.
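The matching rule above can be written as a small filter. This Python sketch assumes that a valid control must not have cancer, that “within a 5-year age range” means the same five-year age group (for example 55 to 59), and that the first eligible, not-yet-used candidate is chosen. The `Patient` record, its field names, `age_group`, and `find_matched_control` are all hypothetical; in the real study, social workers, not a program, picked the controls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    """Hypothetical record; the fields are illustrative, not from the study forms."""
    patient_id: str
    hospital: str
    sex: str
    age: int
    has_cancer: bool


def age_group(age: int) -> int:
    """Lower bound of the five-year age group (assumed banding, e.g. 57 -> 55)."""
    return age - age % 5


def find_matched_control(case: Patient, candidates: list[Patient],
                         used: set[str]) -> Optional[Patient]:
    """Return the first unused non-cancer patient in the same hospital, of the
    same sex, and in the same five-year age group; None if nobody qualifies."""
    for p in candidates:
        if (
            not p.has_cancer
            and p.patient_id not in used
            and p.hospital == case.hospital
            and p.sex == case.sex
            and age_group(p.age) == age_group(case.age)
        ):
            used.add(p.patient_id)  # each control is matched to only one case
            return p
    return None


case = Patient("L001", "Hospital A", "M", 57, True)
pool = [
    Patient("X1", "Hospital B", "M", 56, False),  # wrong hospital
    Patient("X2", "Hospital A", "F", 58, False),  # wrong sex
    Patient("X3", "Hospital A", "M", 61, False),  # wrong age group (60 to 64)
    Patient("X4", "Hospital A", "M", 55, False),  # matches
]
used_ids: set[str] = set()
match = find_matched_control(case, pool, used_ids)
print(match.patient_id)  # X4
```

Matching on hospital, sex, and age removes those factors as explanations for any difference in smoking between cases and controls, which is the comparability problem Shani describes for the earlier, smaller studies.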
And what they found was surprising for how definitive the answer was. Like other studies, they found an association between lung cancer and smoking, and did not find an association between smoking and other cancers, or between smoking and non-cancer respiratory diseases like pneumonia. But what really surprised them was the existence of a dose-response effect. Their detailed questionnaire had allowed Doll and Hill to quantify the amount of smoking, measured as lifetime consumption of cigarettes; the highest group had smoked over half a million, giving you an idea of how much people smoked during this period. With each step up the smoking ladder, the amount of lung cancer increased.
I asked Shani what she thought about their conclusions:
Shani: That is just, I think, beautiful. So, um, first of all, he talks about the consistency of the association. Um, so no matter how he categorizes smoking, no matter what approach he takes to look at the effect of smoking on lung cancer, the effect is always the same. He always finds an association between smoking quantity and incidence of lung cancer. He also, um, repeatedly demonstrates a biological gradient. So the more someone smokes, the higher the incidence of lung cancer. Um, he uses something that I’m going to call a test of specificity, but that others, including Bapu Jena, have termed a prespecified falsification end point. And basically all it is is asking: does your exposure cause your disease of interest and not cause other diseases that are not hypothesized to have an association with that exposure? Um, now if you find an association with other diseases, it doesn’t necessarily invalidate the exposure-disease relationship of interest, but it certainly bolsters your case if you don’t observe the same relationship with other, uh, diseases.
Shani: 00:12:03 Um, he also goes to great lengths, or, um, I should say they, because this is both Doll and Hill, uh, both go to great lengths to ensure that their results are not confounded by age and also not confounded by place of residence, which I think is really important. Um, and then they talk about the fact that their results are significant even in the face of almost certain, um, misclassification and resulting bias toward the null. And so basically, because they’re certainly not able to perfectly ascertain who is a current smoker and who’s not, or how long they have smoked, and their degree of exposure, there’s certain to be misclassification there that’s ultimately going to lead to a bias toward the null. In other words, a failure to find a difference where one may actually exist. The fact that they found not just a difference, but a striking difference, even in the face of this bias toward the null is a pretty convincing thing as well. And then finally, just the strength and magnitude of the effect. It’s just, it’s huge. I mean, he talks about how in smokers who smoke a lot, it’s like a 10- to 20-fold increase in risk. And so all of those things make this really, really compelling work.
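The “bias toward the null” argument can be checked with a little arithmetic. This Python sketch uses made-up counts (not Doll and Hill’s data) for a two-category exposure, smoker versus nonsmoker, and assumes nondifferential misclassification: smoking status is recorded with the same sensitivity and specificity in cases and controls. Under those assumptions, and as long as sensitivity plus specificity exceeds 1, the observed odds ratio is pulled toward 1. The function names and the sensitivity and specificity values are illustrative choices.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected (recorded exposed, recorded unexposed) counts under imperfect recording.

    sensitivity: fraction of truly exposed people recorded as exposed
    specificity: fraction of truly unexposed people recorded as unexposed
    """
    recorded_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    recorded_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return recorded_exposed, recorded_unexposed


# Hypothetical true counts: 90 of 100 cases and 50 of 100 controls are smokers.
true_or = odds_ratio(90, 10, 50, 50)

# The same imperfect recording applies to cases and controls (nondifferential).
SENS, SPEC = 0.8, 0.9
cases_exp, cases_unexp = misclassify(90, 10, SENS, SPEC)
ctrl_exp, ctrl_unexp = misclassify(50, 50, SENS, SPEC)
observed_or = odds_ratio(cases_exp, cases_unexp, ctrl_exp, ctrl_unexp)

print(f"true OR = {true_or:.2f}")          # true OR = 9.00
print(f"observed OR = {observed_or:.2f}")  # observed OR = 3.30
```

In this toy example, a fairly accurate questionnaire still shrinks a true ninefold association to roughly threefold, which is why a large association that survives misclassification is more convincing, not less.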
They also found one piece of “anomalous” data — one of their questions had been whether or not cigarette smoke was inhaled; they hypothesized that there should be a correlation between inhalation and lung cancer. But they failed to find any — the relationship was the same regardless of whether the smoker had inhaled.
Though they admit they have no sense of the exact carcinogen, and even allow that perhaps it is a pesticide in the tobacco itself, they write, “It must be concluded that there is a real association between carcinoma of the lung and smoking.”
The study was published in 1950. Doll and Hill had sat on their data for almost a year and were actually beaten to the punch by Wynder and Graham of the American Cancer Society in the United States, who had independently conducted a similar study of 684 patients, which likewise showed high rates of bronchogenic carcinoma in heavy smokers compared with controls, with the same dose-response relationship.
So here’s the question I have for you: it’s 1950. Two large retrospective case-control studies have been published showing a strong association between smoking and lung cancer. Richard Doll was clearly convinced of the link; in 1949 he stopped smoking cold turkey. But Bradford Hill continued to smoke, and in the United States the American Cancer Society felt that there must be a confounder. What would you have done? Would this have been enough to convince you of a link between smoking and lung cancer? Would you have stopped like Doll? Obviously, I had to ask Shani, the most logical person I know, this question:
Adam: 00:13:23 Would you have stopped smoking? Were you a smoker at this time?
Shani: 00:13:26 Yes, I would absolutely have stopped smoking.
Adam: 00:13:29 So, uh, Richard Doll stopped smoking. But, you know, Bradford Hill did not stop smoking at this point. Wow.
Shani: 00:13:36 So interesting. So did anyone ask him why he hadn’t stopped smoking? Because, I mean, maybe some of this ties back to what I was saying before, which is that if you really like smoking, you’re gonna wait for certainty before you stop. Or maybe you might continue smoking even when there is certainty, because you’ve decided that you’d rather take the risk of lung cancer than live without cigarettes. And that’s totally understandable. And we see that all the time.
Adam: 00:13:59 So you know, you and Bradford Hill, you guys think very similarly because that was it. He enjoyed smoking. He was a pipe smoker, he enjoyed smoking so much that he wanted a stronger causal relationship. And after the British doctors’ study, after the second data set, he actually stopped smoking.
Shani: 00:14:14 Well, you also just raised another important point, which is that in his work he often focused on contrasting pipe smoking with cigarette smoking. And so I think his results actually happened to reassure him, in that pipe smoking may be slightly less carcinogenic than cigarette smoking.
However, the rest of the world did not react like Shani. In fact, it barely reacted at all. The response from the scientific community and the public was basically a shrug: from doctors, from policymakers, from ordinary smokers, and, surprisingly, from the tobacco companies themselves. To understand why, we need to talk about causality in medicine.
I want you to ask yourself — what CAUSES a disease? And how would you know whether or not this was the case? This question has vexed physicians for millennia. The ancients felt that imbalances in body humors caused disease, a belief that lasted well into the modern era. By the late 18th and early 19th centuries, it was clear this was not the case — but a convincing alternative had not yet been found. This was a huge debate in the early 19th century, between those who thought it really didn’t matter — just classify and treat diseases — and the etiologists. The etiologists, of course, largely won this debate. The 19th century was obsessed with the general idea of “irritation” — of the nerves for Cullen, of the gastrointestinal tract for Broussais, of the vasculature for Rush. “Miasma” — the ancient idea that noxious odors cause epidemic disease — was repurposed as a universal etiology; perhaps smells caused all disease? In some quarters vitalism had come roaring back. Perhaps disease was caused by blockages of a theoretical vital energy source. And in the extreme minority were the views of Henle, who theorized that microorganisms might be the cause of many types of disease.
All of this changed, of course, in 1876, when Robert Koch isolated the anthrax bacillus, and with it advanced a potent new way to determine whether or not an infectious microorganism caused a disease. They are today called “Koch’s postulates,” and it’s essentially the method that Koch personally used in his studies.
So, number one — an organism must be isolated in every case of a disease. Number two, that organism must be grown in pure culture. Number three, when inoculated back into laboratory animals, it must again cause the disease. And number four, when the organism is again isolated from those diseased animals, it must be the same as what was found in postulate number one.
Using these postulates, scientists had rapidly identified the causes of many previously mysterious diseases, and with that knowledge had developed the means to fight them. Koch’s postulates were clearly not perfect; even in Koch’s time their limitations were clear, as some infectious agents, like the bacterium that causes syphilis, cannot easily be grown in culture, and it became clear that some patients are completely asymptomatic but still infectious. Typhoid Mary is the classic example here. And by 1950, the limits of germ theory had already been demonstrated; I went over this extensively in Episode 36, Filth Parties, about Joseph Goldberger’s attempts to demonstrate that pellagra was not an infectious disease.
But when Hill, Doll, Wynder, and Graham’s studies were published in 1950, Koch’s postulates were still firmly ingrained as the primary methodology to show causation. It’s not that anyone thought lung cancer was infectious, but rather that there should be a relatively clear exposure for causality. A big problem for many physicians was that many patients had smoked heavily and had not developed lung cancer. Didn’t this point to another cause, or a genetic predisposition? The idea that chronic diseases might have a variety of causes was still relatively novel.
I asked Shani why Koch’s postulates have had such a hold on physicians.
Shani: 00:25:23 Well, the one that I bring up often is that when you give something and it causes what you think it causes, an important thing is that when you take it away, that thing potentially goes away. But with re-exposure you should see it again. And that’s often touted as one of the most compelling things. This came up with a patient I was caring for recently, where we were concerned that a drug might be causing an adverse reaction. The patient stopped the drug and that thing went away. And I said, what will really cement this is if we restart the drug, almost as a test of our theory, and see if this comes back. Now, you obviously don’t want to do that when that thing is something really bad, but, uh, it’s a compelling way to actually help show a causal relationship.
Adam: 00:26:07 Yeah, that’s good. That’s the “too long; didn’t read” version of Koch’s postulates. So what is it about infectious disease in particular that made causation so easy for people to understand, when something like cigarette smoking and lung cancer is so difficult for people to accept?
Shani: 00:26:25 I mean, I can think of at least a couple of things. One is that with most infectious diseases there is much less latency between inoculation and the onset of symptoms. With smoking, obviously, it takes years and years. So that’s one thing. The second is that it is easier, I think, to look under a microscope and see the bacteria present in patients who have the illness and absent in patients who don’t. Whereas lung cancer requires invasive means to diagnose, and back then it was hard to even localize a cancer until autopsy. So those are two potential reasons; I’m sure you can think of even more.
So in 1951, Hill and Doll went back to the drawing board. Population medicine up until this point had been purely retrospective, looking back in time. Clearly what was needed was some sort of experiment. Bradford Hill had been instrumental in the development of the first randomized controlled trial in medicine, but the scale was vastly different. The MRC streptomycin trial involved 107 patients followed over 15 months, and they were kept under observation in a sanatorium the entire time. A randomized controlled trial on smoking would require recruiting thousands of children to either start smoking or not, and then following them for decades. Clearly a brand new type of study would need to be developed. Bradford Hill started to think about a new type of “forward-looking” study: looking at frequencies of disease in the same population over time, what we now call a prospective cohort study. Essentially the idea would be to follow a homogeneous group of smokers and nonsmokers over time and see whether one group developed lung cancer at higher rates. But there was literally no precedent for this. How do you even start? How many people are necessary? How would you follow them all? A daunting question for sure, and that’s it for now. We will hit the rest of that story next time!
I hope you enjoyed this episode — but first, it’s time for a #AdamAnswers!
#AdamAnswers is the segment on the show where I answer whatever questions you have about medicine. And for this one, we have one from Dr. Green, who I believe is now tied for the record with Dr. Serota at three questions submitted.
Well, your instincts are very good, Dr. Green! CPR would not be invented for another decade and a half, so that scene is an anachronism. But the reason why is actually pretty interesting, as is the reason they didn’t show the actual 1946 standard of care for cardiac arrest.
So the concept of external chest compressions is quite old, but it’s not QUITE what we think of as CPR today. The first unequivocally documented chest compressions enter the medical debate around the beginning of the 18th century, when it starts to become increasingly clear that “death” is a more nebulous concept than originally thought. It’s a very exciting time with implications for medical practice today; a few years ago, I did an episode called “Buried Alive” about this panic. The major focus of chest compressions was drowning victims in particular. So-called preservation societies cropped up in major cities throughout England and the US (Boston had several on the Charles, for example), with emergency kits designed to revive drowning victims. The mainstay of treatment was ventilation: both what we would today call “rescue breathing,” or mouth-to-mouth resuscitation, and the use of a bellows to literally blow air into the lungs. And, of course, tobacco-smoke enemas, which were popular as well. But chest compressions still played a major part in resuscitating drowning victims. I’ll post some pictures on Twitter from a 1909 Hunterian lecture going over these methods, but these chest compressions were designed to relieve the LUNGS and not the heart, and looked nothing like modern CPR (in fact, one literally had the doctor rolling a patient over a barrel).
Modern chest compressions were invented by Friedrich Maass in 1892; his paper has been translated into English, and you can read it in the shownotes. A 9-year-old boy was receiving chloroform for cleft palate surgery. Unfortunately, he awoke screaming and received a second dose. Then he stopped breathing, became cyanotic, and lost his pulse. Maass pushed on the child’s xiphoid process for four minutes and noted that he began to breathe again. He continued to push for 40 minutes, and the boy made a full recovery. He published this case, and another on so-called “chloroform syncope,” in French and German, but aside from some basic trials on animals, his method never caught on and was abandoned for almost 70 years.
The reason why is that during this period, internal cardiac massage was developed. In 1901, Dr. Igelsrud was performing a laparotomy on a patient who went into cardiac arrest. He emergently resected the fourth and fifth ribs, opened the pericardium, and massaged the heart “between the thumb on one side and the fore and middle fingers on the other” for about a minute. The patient made a full recovery. This technique caught on like wildfire; internal cardiac massage would be performed either through an abdominal incision, like Dr. Igelsrud’s, or through a midline incision. A review article in 1952 suggested that recovery was obtained in almost one third of cases.
Modern chest compressions were rediscovered almost by accident by Dr. Kouwenhoven while researching defibrillation. Defibrillation paddles were placed on the chest of a dog that had an arterial line in to monitor blood pressure. When they arrested the poor doggo, Kouwenhoven noticed that just the weight of the paddles could increase the dog’s blood pressure. He began to experiment on dogs and found that the heart could be effectively massaged simply by pushing on the chest. A small trial was done on patients in a new CCU, and the results, published in 1960, showed a survival rate of 70%, far higher than with internal cardiac massage. More importantly, this was a method that could be started outside of the hospital, with no need to cut directly into someone’s thorax. Modern CPR was born.
So here’s the problem with your TV show, Dr. Green: yes, technically external chest compressions had been invented, but it seems exceedingly unlikely that the protagonist had scoured German medical journals from the 1890s. The standard of care would have been to literally open her thorax and massage the heart. Understandably, this would be rather distressing to show on television.
So thank you very much for your question! And dear listeners, if you have any questions that you want answered, please tweet them to me @AdamRodmanMD, or even better, record and send them to me!
That’s really it for the show! Next month Shani and I will finish this story by talking about the first prospective epidemiological study in history, the creation and controversy surrounding the Bradford Hill criteria, and really interrogating how we can ultimately ever know the cause of anything in medicine. If you’re a member of the American College of Physicians, you can get CME or MOC credit just by listening to this episode. Go to www.acponline.org/BedsideRounds and take a brief quiz.
And a brief personal update: the last couple months have been some of the most exciting, but also exhausting, of my life. The most proximate cause, of course, is my son, Sam, who is wonderful but is also undergoing a four-month sleep regression at the moment. But I’m also traveling around the country to lecture not only about medical history, but also about using podcasting and Twitter to teach new generations of medical learners. I couldn’t have imagined this happening even just a year ago, and it’s largely due to all of you, the crazy people who want to hear me go over centuries-old medical literature each month. So thank you guys. I obviously intend to keep making Bedside Rounds; it’s been an amazing adventure and has made me a much better doctor. Which is why I want to ask again for you to take a listener demographic survey at https://survey.libsyn.com/bedsiderounds.
You can find all the episodes on the website at www.bedside-rounds.org, or on Apple Podcasts, Spotify, or wherever you get your podcasts. The Facebook page is /BedsideRounds. I’m on Twitter @AdamRodmanMD, where I not only tweet about internal medicine and medical history, but also craft historical Tweetorials. I recently did one going over the literature on human body temperature, and how most humans probably “run low” and a fever is lower than you think. So come by and say hi!
All of the sources are on the website.
And finally, while I am actually a doctor and I don’t just play one on the internet, this podcast is intended to be purely for entertainment and informational purposes, and should not be construed as medical advice. If you have any medical concerns, please see your primary care provider.
Fortunately, Doll had an idea. As part of his work to develop the NHS, he had circulated surveys to physicians, and had received excellent and timely data. Physicians, he figured, were relatively homogeneous in Britain in the 50s, were trained in observation and would give accurate information, and were required to keep their addresses up to date on an official register, which would make follow-up easier. Weren’t they the ideal group to study?
In 1951, a letter and a short questionnaire asking about smoking habits and cancer diagnoses was sent to all 59,600 doctors in Great Britain. It asked one main question with three options: did you currently smoke, did you used to smoke but quit, or had you never smoked? For smokers and former smokers, it also asked the age at which they started smoking, the type of smoking, and the total amount smoked. In case anyone was missed, Hill also wrote a letter in the BMJ in November 1951 entitled “Do you smoke?” with a copy of the questionnaire and instructions for getting in contact. I’ve included it in the shownotes. The response was overwhelming: 41,024 replies. The post office had to open a sorting office just for the study, and Doll enlisted his seven-year-old daughter to help open the incredible volume of mail and sort the questionnaires. All of this was done on a budget of 2000 GBP, about 81,000 USD in 2018 currency.
After eliminating low-risk groups (men under 35, and women), Doll and Hill had 24,389 doctors to follow up. This idea, following patients to see if and when they would die, was savaged in the press. Hill would later recall a cocktail party where another doctor approached him and complained, “You’re the chap who wants us to stop smoking.”
Hill replied, “Not at all. I’m interested if you go on smoking to see how you die. I’m interested if you stop because I want to see how you die. So you choose for yourself, stop or go on. It’s a matter of indifference to me. I shall score up your death anyway, and it will be very useful to me.”
In 1952, Hammond and Horn, epidemiologists with the American Cancer Society, launched a much larger study using Hill’s methodology with 190,000 American men. Remember, the American Cancer Society was skeptical that the link was causal, and thought this methodology would disprove once and for all that smoking caused cancer. By 1954 Hill and Doll’s preliminary results were published, and they already suggested a statistically significant increase in lung cancer deaths in smokers. In fact, they also showed an association between heavy smoking and coronary thrombosis, a conclusion that no one had expected. But the total numbers were small: only 36 lung cancer deaths out of 789 deaths in total. Skepticism continued to reign. But the data trickling in quickly became impossible to ignore. Hammond and Horn published their own preliminary results, which also showed increases in lung cancer deaths in heavy smokers, and increases in coronary thrombosis deaths as well. And then in 1956, Doll and Hill published their famous follow-up study in the BMJ. After four years and five months of follow-up, there had now been 1,714 deaths, 88 from lung cancer. The findings were even more striking than two years before. With more data, they calculated the death rate from lung cancer in non-smokers as 0.07 per 1,000 men, compared to 1.66 per 1,000 in the heaviest-smoking group, a more than 20-fold increase. Other respiratory diseases, as well as coronary thrombosis, were also significantly higher. And with more data, all-cause mortality was now higher in smokers as well, though the p-value was only 0.06, just short of conventional significance; that is something we’ll return to later.
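The size of that jump is simple arithmetic on the two 1956 death rates quoted above (0.07 and 1.66 per 1,000). Here is a minimal Python sketch of the relative and absolute comparisons; the helper names are hypothetical, chosen for illustration, and treating a ratio of rates as a relative risk is a simplifying assumption.

```python
# Lung cancer death rates per 1,000 men from the 1956 follow-up, as quoted above.
NONSMOKER_RATE = 0.07
HEAVIEST_SMOKER_RATE = 1.66


def rate_ratio(exposed: float, unexposed: float) -> float:
    """Relative comparison: how many times higher the exposed group's rate is."""
    return exposed / unexposed


def rate_difference(exposed: float, unexposed: float) -> float:
    """Absolute comparison: excess deaths per 1,000 in the exposed group."""
    return exposed - unexposed


ratio = rate_ratio(HEAVIEST_SMOKER_RATE, NONSMOKER_RATE)
excess = rate_difference(HEAVIEST_SMOKER_RATE, NONSMOKER_RATE)
print(f"rate ratio: {ratio:.1f}")
print(f"excess deaths per 1,000: {excess:.2f}")
```

This prints a ratio of about 23.7 and an excess of about 1.59 deaths per 1,000. The two numbers answer different questions: the ratio conveys the strength of the association, and the difference conveys how many extra deaths were observed.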
Doll and Hill also went out of their way to address complaints of bias. A common concern was that the doctors of heavy smokers might be more likely to diagnose their dead patients with lung cancer. So they wrote to every doctor who had signed a death certificate stating lung cancer and straight up asked them: did smoking history affect your diagnosis? Unsurprisingly, the vast majority said no. Knowing that these answers were themselves subject to considerable bias, Hill also analysed the data from patients with histological evidence of lung cancer, presumably less prone to bias. In this group, the association was even stronger than in patients with clinical diagnoses of cancer; if anything, the bias went in the other direction! The other concern, raised by the famous statistician Berkson about the American data in particular, where participants had been recruited via social networks, was that some underlying factor might cause differential response rates. Perhaps, for example, sick nonsmokers with cancer were less likely to answer a questionnaire than sick smokers? Doll and Hill reasoned that the passage of time should make this moot. After all, who could predict a fatal illness five years down the line and be less likely to respond because of it?
I asked Shani why Doll and Hill invented a brand new method of studying medical questions:
Shani: That’s a great question. I am not a hundred percent sure why he did. So I don’t know if it’s that he wanted to. Well actually, you know what, I do have one hypothesis. With diseases with long latency, the argument of reverse causation can be made. And actually, Bradford Hill’s main counterpoint, Fisher, makes that argument: that it’s possible that smoking doesn’t actually cause lung cancer.
Shani: 00:17:22 Lung cancer causes smoking. And that’s because lung cancer is a disease that potentially has a relatively long latency period, and it’s certainly possible that people who have lung cancer don’t know it yet. And for some reason, that little bit of irritation they’re feeling in the lungs then stimulates them to smoke. That’s what Fisher proposed. I think that a cohort study gets at that a little bit better than a case-control study does, because you are starting with the exposure and following people forward to see if they develop the disease. Now, it’s certainly still possible that those patients had cancer at the outset and we didn’t know about it, but it seems a little less likely the longer you follow patients up. And even in Bradford Hill’s cohort study, they didn’t really follow people for that long a period of time.
I want to ask this question to you again, dear listeners. Imagine you are a doctor in 1956. The preliminary results of both Doll and Hill’s British Doctor Study, as it was now being called, and the American Cancer Society study are in, showing increased deaths from lung cancer, other lung diseases, heart attacks, and all causes. Is this enough to convince you that smoking causes lung cancer? Would you stop smoking? Shani has stopped almost five years ago. By this point Doll, who quit when Shani did, made his wife quit by bribery. And the Medical Research Council certainly felt strongly about it. In 1957, they published a position paper summarizing basically everything I’ve talked about in this episode: the retrospective case-control studies across the world, Hill and Doll’s and Wynder and Graham’s large retrospective studies, and the American and British prospective studies. Accompanying it was an editorial from the BMJ. I’ll just quote its conclusion: “Last year 18,000 died of cancer of the lung. The disease is on the increase. The evidence against cigarette-smoking is now so overwhelming that, in the absence of contrary valid evidence or of some means of excluding the carcinogenic factor from what used to be called “the yellow peril,” it is incumbent on doctors to do all they can to dissuade the young from acquiring a habit so deleterious to health.”
The cigarette companies, naturally, disagreed, at least publicly, starting in 1954 with the infamous “A Frank Statement to Cigarette Smokers” in the US, which stated:
1. That medical research of recent years indicates many possible causes of lung cancer.
2. That there is no agreement among the authorities regarding what the cause is.
3. That there is no proof that cigarette smoking is one of the causes.
4. That statistics purporting to link cigarette smoking with the disease could apply with equal force to any one of many other aspects of modern life. Indeed, the validity of the statistics themselves is questioned by numerous scientists.
Of course, their internal data didn’t match up with this bullish public face. Famously, the Imperial Tobacco Company’s statistician Geoffrey Todd, after years of arguing against the retrospective studies showing a link between cancer and smoking, finally became convinced by the British Doctor Study and the American Cancer Society study, and told his superiors he would quit unless they accepted the conclusions. He didn’t get the chance, though; he was summarily fired (and later reinstated).
But I’m not going to talk about the great lengths the tobacco companies went to in order to discredit scientists and misinform the public, nor the obscene amount of human suffering they’ve trafficked in for profit, and continue to traffic in, largely in the developing world. That would take another episode altogether, and would likely make me way too angry to feign any sort of objectivity. I want to talk about debates about causality. I want to talk about “The Frank Statement” claims that there was no agreement among authorities, that the statistics were in doubt, and that anything could be linked to lung cancer. So I want to talk about Doll and Hill’s greatest foe, and one of the most influential statisticians of the 20th century: RA Fisher.
So a brief aside on Fisher. I think it’s fair to say he’s one of the most important statisticians of the 20th century. As a young man, he developed the method of randomly assigning fertilizers to small plots of land to remove confounders, and then developed a statistical method called “analysis of variance” to determine whether the differences were real. You’ve probably heard the method called by its shorthand: ANOVA. This statistical method changed biological science; when Hill did his RCT on streptomycin, he was basically copying Fisher’s methodology. Fisher had literally written the textbook on how to do these studies.
So when the recently retired Fisher set the entire idea that cigarette smoke causes lung cancer in his sights — and Doll and Hill in particular — it caused quite a stir. After all, one of the most important minds of the first half of the 20th century — and a very acerbic and prickly one at that — was the major dissenter. I think there are two ways to view his disagreement, one of which is considerably more charitable than the other.
So let’s talk about his intellectual arguments first. At a fundamental level, Fisher disagreed with the method that had been used to determine causation: a prospective cohort study. He felt that randomization of some sort would be necessary to rule out bias. He famously wrote, “It is not the fault of Hill or Doll or Hammond that they cannot produce evidence in which a thousand children of teen age have been laid under a ban that they shall never smoke, and a thousand more chosen at random from the same age group have been under compulsion to smoke at least thirty cigarettes a day. If that type of experiment could be done, there would be no difficulty.”
I asked Shani what she thought of this argument.
Shani: 00:24:02 Not everything can be studied in a randomized fashion. It’s just simply not possible. I mean, a great example of that is the joke article in the BMJ about parachutes: whether or not they’re actually helpful in preventing death after jumping out of a plane. You’re just not going to find people who are willing to be randomized in that respect. Now, at least back then, you probably could have found people who would have been willing to be randomized to cigarette smoking or not. Nowadays you probably couldn’t; people feel very strongly one way or another about whether or not they’re going to smoke. So I dunno, I think that one of the major roles for observational data is the situation where either there’s not enough equipoise to randomize patients, or you just aren’t going to be able to do it pragmatically.
And because this was impossible, Fisher felt that the onus was on researchers to eliminate every other possible cause first, and in the meantime to avoid making any strong recommendations about smoking. In particular, he advanced two theories. The first was that rather than smoking being the CAUSE of cancer, a nebulous precancerous “inflammation” caused discomfort and made people more likely to smoke to relieve it. Smoking, then, was the RESULT of cancer rather than the cause. The second was that smoking and lung cancer both had a common cause: some genetic predisposition. He even did some small twin studies in an attempt to show such a predisposition. He confidently wrote, “For it will be as clear in retrospect, as it is now in logic, that the data so far do not warrant the conclusions based upon them.”
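Fisher’s second theory is a textbook case of confounding by a common cause. The minimal Python sketch below uses made-up, purely hypothetical probabilities (not data from any study). A “gene” that raises both the chance of smoking and the chance of lung cancer produces a crude association between smoking and cancer even though, by construction, smoking does nothing. Comparing smokers with nonsmokers within each gene group makes the association disappear.

```python
# Toy model of Fisher's "common cause" hypothesis. Every number here is
# hypothetical and chosen only to illustrate confounding; none is real data.
P_GENE = 0.2                            # share of people carrying the gene
P_SMOKE = {True: 0.9, False: 0.4}       # P(smoker | carrier?)
P_CANCER = {True: 0.02, False: 0.002}   # P(lung cancer | carrier?)


def p_cancer(carrier: bool, smoker: bool) -> float:
    """Risk of lung cancer. Smoking is deliberately ignored: in this toy
    model it has no causal effect at all."""
    return P_CANCER[carrier]


def risk_in_group(smoker: bool) -> float:
    """P(cancer | smoking status), averaged over carriers and non-carriers
    in proportion to how common each is within that smoking group."""
    weights = {}
    for carrier in (True, False):
        p_carrier = P_GENE if carrier else 1 - P_GENE
        p_status = P_SMOKE[carrier] if smoker else 1 - P_SMOKE[carrier]
        weights[carrier] = p_carrier * p_status
    total = sum(weights.values())
    return sum(w / total * p_cancer(c, smoker) for c, w in weights.items())


# Crude comparison: smokers vs nonsmokers, ignoring the gene.
crude_rr = risk_in_group(True) / risk_in_group(False)

# Stratified comparison: smokers vs nonsmokers within each gene group.
stratum_rr = {c: p_cancer(c, True) / p_cancer(c, False) for c in (True, False)}

print(f"crude risk ratio: {crude_rr:.2f}")
print(f"within-stratum risk ratios: {stratum_rr}")
```

With these numbers, smokers appear to have roughly three times the risk of nonsmokers, while every within-stratum ratio is exactly 1. Stratifying only works for a confounder you can actually measure, and that is why a hypothetical, unmeasured gene was such an awkward argument to answer in the 1950s.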
Shani: 00:20:58 So he basically had four arguments against the existing literature and for why he did not think that smoking causes lung cancer. One is the argument that we just spoke about, which is reverse causation. You know, just because A and B are correlated does not mean that A causes B; it’s also possible that B causes A. Now, this obviously seems crazy to us nowadays, but certainly before the association was repeatedly demonstrated, it was something worth considering. His second argument is that there may be confounding going on, and the main confounder that he proposes is actually genetics. So what if there’s a gene that makes people more likely to smoke as well as more likely to get lung cancer? So he’s invoking confounding there. The third is that he argues that the trends in the increase in smoking and the increase in lung cancer don’t really go hand in hand; they aren’t parallel.
Shani: 00:21:57 That was actually an incorrect assertion on his part, and I think it is reflective of his role as a statistician and not really an epidemiologist. He seemed to misconstrue some of the statistics. And then his fourth argument against this being a causal association is actually based on something that Bradford Hill and Doll showed, which is that there was no association between inhaling tobacco smoke, versus not inhaling, and the likelihood of developing lung cancer. And the argument is that if smoking really causes lung cancer, then shouldn’t the people who inhale have a higher incidence of lung cancer than the people who don’t? The failure to find that is one of the things Fisher really harped on. And in reality, I think that is one of the weakest points in Bradford Hill and Doll’s paper.
Shani: 00:22:52 But, you know, there are potential reasons to explain that finding away, and in subsequent studies a relationship was found between inhalation and incidence of lung cancer. So taken together, he’s challenging this relationship on four different fronts, only one of which probably had any sort of validity at the time. And globally speaking, he consistently failed to incorporate the full body of knowledge. He would focus on tiny inconsistencies that, taken in and of themselves, you could make a compelling argument for. But when you look at the full scope of the evidence, it would be crazy to conclude what he concluded.
And now the uncharitable (and, I personally think, more convincing) reasons. Fisher had a number of conflicts of interest. First, he gleefully took money from the tobacco companies while he traveled around the world promoting the book he would write, called “The Cancer Controversy.” He loved smoking himself; he was an inveterate pipe smoker. He made no attempt to keep this hidden; he felt that he was too smart to be swayed by their money. And he had considerable ideological conflicts of interest. He was an elitist, a eugenicist, and a libertarian, and had recommended that the government pay upper-class families to procreate. To that end, he and his wife had eight children. The idea that the government would take an active stand in preventing citizens from doing something enjoyable in the name of health especially incensed him.
Of course, despite Fisher’s confidence that everything would be clear in retrospect, he was on the wrong side of history. Evidence continued to pour in. By the early 60s, both the American and British studies, which would continue through the lifetimes of their participants, were showing statistically significant mortality differences; smokers died at almost twice the rate of nonsmokers. This led the Royal College of Physicians in 1962, and famously the US Surgeon General in 1964, to declare that smoking was a cause of lung cancer and of chronic bronchitis. That in turn led to the modern tobacco-control efforts, which have decreased smoking rates in the US from 45% of adults in the 50s to 16% in 2016, according to Gallup. Holford and colleagues estimated in 2014 that these efforts had saved 8 million lives in the United States, with an average of 20 extra years of life for each of these individuals, and had increased life expectancy by 2.3 years for men and 1.6 years for women. Almost a third of the increase in life expectancy in the US in the second half of the 20th century, they found, was due to smoking cessation programs.
In response to Fisher and this debate, Bradford Hill advanced his own ideas about causality, which he laid out in a 1965 speech, and which have come to be known as the Bradford Hill criteria. It’s a remarkable document, and I encourage anyone who’s interested in extra reading to read the original, which is in the shownotes. Viewed in the context of the tobacco debate, it has far more nuance than how it is usually represented in epidemiology textbooks. Hill argues for a fundamentally pragmatic approach; he does not care about the philosophical definition of a cause. What he cares about is whether a factor can be identified that can then be modified to decrease the incidence of the disease. The exact biochemical etiology can be sussed out later by scientific research. To do this, he lays out nine “aspects,” as he calls them, that would later be rebranded as criteria: the strength of an association, consistency, specificity, the temporal sequence of events, a dose-response effect, experimental evidence, biological plausibility, coherence, and analogy.
Shani loves the Bradford Hill Criteria.
Shani: I’m in love with the Bradford Hill criteria. I teach about them all the time.
Adam: Why are you in love with them?
Shani: 00:30:02 I think he offers such a nice systematic approach to assessing the likelihood of causality in any association that’s observed. And I think that the conflation of association and causation actually happens really frequently. As a journal editor, I review papers all the time, and it is amazing how commonly people use causal language in an observational study. It’s one of my pet peeves as an editor; I’m constantly writing back to authors saying you need to remove all instances of causal language in this observational study, which can only demonstrate association. Yeah.
Adam: 00:30:38 Can you give me an example of causal language that you dislike?
Shani: 00:30:42 So, “A led to an increase in B”: you can’t say that. You can say that A was associated with an increase in B; you can’t say it led to an increase in B. Or saying something like “our data indicate that,” when really it’s just “our data suggest that.” So it’s being a little less certain in the way that you describe something.
Adam: 00:31:05 How do you feel about the phrase “correlation is not causation”?
Shani: Correlation does not imply causation.
Adam: Yeah. Will you say that phrase again? ’Cause I missed you there. And tell me how you feel about it.
Shani: Correlation does not imply causation. I wish everyone would say that 10 times together when they wake up in the morning. The number of observations I see people make in everyday life that are really just a function of correlation and not causation, but that they interpret as causal, is just mind-blowing and really frustrating from the standpoint of an epidemiologist or a statistician.
Adam: So you walk through life repeating that as your mantra and just getting increasingly frustrated?
Shani: Yeah, a little bit, I guess. You know what, if I had a mantra, that would be it.
His major point, though, is that none of these aspects alone establishes causation; it takes a combination of often messy data.
Shani: 00:34:44 In fact, I would say that probably 90 out of a hundred observational studies that you see in journals don’t even meet half of these. So, no. And it’s usually just not possible to meet some of them within the context of one single study.
He saved some of his most poignant moments for the end of his paper, where he cautions against an over-reliance on tests of significance for causality, writing, “Such tests can, and should, remind us of the effects that the play of chance can create, and they will instruct us in the likely magnitude of those effects. Beyond that they contribute nothing to the ‘proof’ of our hypothesis.”
Shani: So after he lays out each of these criteria, numbered one through nine, he then goes on for a few paragraphs to talk about the pitfalls of the p-value. And actually he has some really nice quotes that I... I hope you don’t mind. Yeah, I was going to say, I hope you don’t mind, ’cause you’re not gonna be able to stop me from reading them, basically. So he first starts by saying, “Fortunately we haven’t yet gone so far as our friends in the USA, where I am told some editors of journals will return an article because tests of significance have not been applied.” This dates him a little bit. He thinks it’s crazy that we would require tests of significance in every article.
Shani: 00:37:27 But he goes on to really highlight some important limitations of the p-value. So he says, “Yet there are innumerable situations in which they are totally unnecessary, because the difference is grotesquely obvious, because it is negligible, or because, whether it be formally significant or not, it is too small to be of any practical importance.” He’s essentially getting at the difference between clinical and statistical significance. And then he says, “What is worse, the glitter of the t table diverts attention from the inadequacies of the fare.” What he means by that is that this focus on p-values as the final arbiter of whether or not we should believe something distracts people from the things they should really be focusing on, which is how the study was designed: what are the biases and potential confounders inherent in the way someone designed a study? He hits on those things really, really nicely, and goes on to say, “Like fire, the chi-squared test is an excellent servant and a bad master.” Basically he means that we use it for our purposes, but we should not let it dictate our final conclusions. There’s more to it than just a p-value or just a chi-squared value.
I think that was probably a response to Fisher: “Too often I suspect we waste a deal of time, we grasp the shadow and lose the substance, we weaken our capacity to interpret data and to take reasonable decisions whatever the value of P. And far too often we deduce ‘no difference’ from ‘no significant difference.’ Like fire, the chi-squared test is an excellent servant and a bad master.”
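Hill's point about clinical versus statistical significance, and about deducing “no difference” from “no significant difference,” can be made concrete with a small calculation. The sketch below uses a standard pooled two-proportion z-test; the trial sizes and event counts are invented purely for illustration and don't come from any study discussed in this episode.

```python
from math import erf, sqrt


def two_proportion_test(events_a, n_a, events_b, n_b):
    """Pooled two-sided z-test for a difference between two proportions.

    Returns (absolute risk difference, z statistic, two-sided p value).
    """
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Standard normal CDF via the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_a - p_b, z, p_value


# Hypothetical trial 1: enormous, with a trivially small difference (10.1% vs 10.0%).
big = two_proportion_test(101_000, 1_000_000, 100_000, 1_000_000)
# Hypothetical trial 2: small, with a large difference (20% vs 10%).
small = two_proportion_test(10, 50, 5, 50)

for label, (diff, z, p) in [("huge trial", big), ("small trial", small)]:
    print(f"{label}: risk difference = {diff:.3f}, z = {z:.2f}, p = {p:.3f}")
```

The huge trial clears p < 0.05 with a difference of a tenth of a percentage point, which is probably of no practical importance, while the small trial misses it despite a ten-point difference. Reading the second result as “no difference” is exactly the trap Hill describes.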
When you look at it from this angle, the Bradford Hill Criteria are rather discomforting. Koch's postulates had given a law that was to be followed. It's like Newton's Laws of Motion, if you think about it: four postulates have to be ticked off, and if you can meet them all, BAM, causality. But causality was now messy, and likely always contingent on any number of factors. For one thing, the Criteria aren't a checklist to be ticked off, and they can likely never all be met by a single study. And was a single study even enough? Can we ever truly prove causality?
I don't think anyone has the answer to that, to be honest. You mentioned that the fragility index is relevant for randomized controlled trials. I think within the context of observational studies there are some things we can do as well, and it sounds like Sir Bradford Hill did some of these things to counteract some of Fisher's rebuttals. What I'm referring to is that you can do sensitivity analyses, in which you evaluate how strongly an unmeasured confounder would have had to be associated with both your exposure and your outcome of interest, and how prevalent that confounder would have had to be, in order to invalidate your conclusions. The stronger the effect you observe, the harder it's going to be to explain that result by confounding alone. So you could almost call it an analogous fragility test for the observational setting, relative to the fragility index in an RCT. Both are getting at how fragile the result is and how much stock we should put in it. I don't care that it's significant. The question is how strongly we believe that this is causal and not due to something else, namely bias, confounding, or chance.
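Shani doesn't name a specific method, so the sketch below is an assumption about what such a sensitivity analysis could look like. It pairs the E-value of VanderWeele and Ding (2017), the minimum confounder strength needed to explain away an observed risk ratio, with the older external-adjustment formula for a single binary confounder, which uses exactly the ingredients she lists: the confounder's association with the outcome and its prevalence among exposed and unexposed people. All risk ratios and prevalences below are hypothetical.

```python
from math import sqrt


def e_value(rr):
    """E-value for an observed risk ratio (VanderWeele and Ding, 2017).

    The minimum risk ratio that an unmeasured confounder would need with BOTH
    the exposure and the outcome to fully explain away the observed association.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:  # protective associations are inverted before applying the formula
        rr = 1 / rr
    return rr + sqrt(rr * (rr - 1))


def adjusted_rr(observed_rr, rr_confounder_outcome, prev_exposed, prev_unexposed):
    """External adjustment for one binary unmeasured confounder.

    Assumes the confounder multiplies outcome risk by rr_confounder_outcome
    equally in exposed and unexposed people (no effect modification).
    """
    bias = (1 + prev_exposed * (rr_confounder_outcome - 1)) / (
        1 + prev_unexposed * (rr_confounder_outcome - 1)
    )
    return observed_rr / bias


# Hypothetical observed risk ratios: weak, moderate, and very strong associations.
for rr in (1.3, 3.0, 10.0):
    print(f"observed RR {rr:>4}: E-value = {e_value(rr):.2f}")

# Hypothetical confounder: doubles outcome risk, present in 60% of exposed vs 30% of unexposed.
print(f"adjusted RR = {adjusted_rr(3.0, 2.0, 0.6, 0.3):.2f}")
```

The E-value grows with the observed effect, which is the formal version of “the stronger the effect you observe, the harder it's going to be to explain that result by confounding alone.” A weak association of 1.3 could be explained by a confounder of modest strength; an association of 10 would need a confounder far stronger than most plausible ones.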
Hill realized that his method is messy, and doesn't necessarily offer easy answers. But he felt it was the most intellectually honest approach to the morass of medical and population data that constitutes epidemiology. Despite the frustration of having his work ignored and then attacked (and in 1964, while the tide was turning, the attack was still very much under way), he ends his famous paper on an optimistic note about the nature of science and the tricky work of determining causality: “All scientific work is incomplete, whether it be observational or experimental. All scientific work is liable to be upset or modified by advancing knowledge. That does not confer upon us a freedom to ignore the knowledge we already have, or to postpone the action that it appears to demand at a given time. Who knows but the world may end tonight? True, but on available evidence most of us make ready to commute on the 8.30 next day.”
That’s it for the show! Make sure to listen after the episode is over, when we talk about communicating this uncertainty into our conversations with patients, and the implications for clinical practice. But first — it’s time for #AdamAnswers!
Okay. So this is the #AskShani from our colleague Rahul Ganatra. He says, “I'd love to hear lessons learned from modern examples of being fooled by observational data. I was shocked when a resident was unfamiliar with the Women's Health Initiative/HRT story. And conversely, when observational data are better than randomized controlled trials, practical limitations aside.” And then he says thanks, because he's a very polite man.
Shani: 00:57:07 Well, thank you, Rahul, for the excellent question. I think the issue you bring up around hormone replacement therapy is one of the things I teach on most often, as a great example not only of when and how observational data get things wrong, but also of one of the criteria we've been talking about today with respect to Bradford Hill: specificity of the association. So let me tell you what I mean by that. The story begins with one of the most famous studies of the last several decades, the Nurses' Health Study. It was an observational study, and when it was first published, in 1985, it examined the association between hormone replacement therapy and heart disease and showed that women who were taking estrogen had about a third of the risk of having a heart attack compared to women who had never taken it.
Shani: 00:58:00 This seemed compelling enough that thousands of women had their physicians start prescribing them hormone replacement therapy. Then in 1998, a little over 10 years later, the HERS study comes along, which is a randomized controlled trial, but for secondary prevention of coronary heart disease: in patients who have already had a cardiac event, does hormone replacement therapy prevent another one? They found no benefit. This started to raise eyebrows, but people still wanted to believe, and they said, well, maybe it doesn't help with secondary prevention; what about primary prevention in the context of an RCT? And so enter the Women's Health Initiative in 2003, where they were studying primary prevention: women who didn't already have coronary heart disease were randomized to hormone replacement therapy or not. And again they did not find any cardiac protection from hormone replacement therapy.
Shani: 00:58:59 In fact, they found that it may slightly increase the risk of coronary heart disease, with a hazard ratio of 1.24 and confidence limits from 1.00 to 1.54. So the question became: why did the observational study get it wrong? Now we have RCT data saying not only does this not help, but it may hurt. What went wrong? There were some interesting clues that actually came out just a few years after the initial observational study. The data I'm referring to is the Walnut Creek Contraceptive Drug Study. This was another large observational study, just like the Nurses' Health Study, and it again showed a benefit of hormone replacement therapy on various outcomes related to coronary heart disease. However, it also showed a benefit of hormone replacement therapy on death from homicide, suicide, and motor vehicle accidents.
Shani: 00:59:59 And there is no possible way that hormone replacement therapy somehow keeps you from having motor vehicle accidents or committing suicide. So they basically said that because there is no plausible association between hormone replacement therapy and these other outcomes, something other than hormone replacement therapy seems to be explaining the difference in the observed outcomes. We now know that something to be healthy user bias: women who tended to be healthier and more health conscious took hormone replacement therapy, women who weren't didn't, and the two groups ended up with different outcomes just by virtue of that difference. So I think it's a really nice illustration of when the observational study got it wrong and the randomized controlled trial got it right. And it served to elucidate a type of bias that plagues a lot of observational studies.
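The specificity argument from the Walnut Creek study can be illustrated with a small simulation. This is a hypothetical sketch, not a model of any real cohort: every probability is invented, and by construction HRT has no effect on anything. A latent “health-conscious” trait drives both HRT use and the risk of each outcome.

```python
import random


def simulate_cohort(n=200_000, seed=1):
    """Hypothetical cohort in which HRT has NO effect on any outcome.

    A latent 'health-conscious' trait makes women both more likely to take
    HRT and less likely to have heart disease or a fatal accident.
    All probabilities are invented for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        healthy = rng.random() < 0.5
        on_hrt = rng.random() < (0.6 if healthy else 0.2)
        # Outcome risks depend only on the trait, never on HRT.
        chd = rng.random() < (0.03 if healthy else 0.06)
        accident = rng.random() < (0.01 if healthy else 0.02)
        rows.append((healthy, on_hrt, chd, accident))
    return rows


def risk_ratio(rows, outcome_index):
    """Risk of the outcome among HRT users divided by risk among non-users."""
    users = [r for r in rows if r[1]]
    non_users = [r for r in rows if not r[1]]
    risk_users = sum(r[outcome_index] for r in users) / len(users)
    risk_non = sum(r[outcome_index] for r in non_users) / len(non_users)
    return risk_users / risk_non


CHD, ACCIDENT = 2, 3  # column positions in each simulated row
cohort = simulate_cohort()
print(f"crude RR, heart disease: {risk_ratio(cohort, CHD):.2f}")
print(f"crude RR, accidents:     {risk_ratio(cohort, ACCIDENT):.2f}")
for trait in (True, False):
    stratum = [r for r in cohort if r[0] == trait]
    print(f"health-conscious={trait}: RR, heart disease = {risk_ratio(stratum, CHD):.2f}")
```

Both crude risk ratios come out well below 1, so HRT appears to protect against heart disease and against accidents alike; the implausible “benefit” on accidents is the specificity red flag. Within each stratum of the trait, the heart-disease risk ratio sits near 1, because in this toy model the trait is the only thing going on. In a real observational study that trait is unmeasured, which is why the randomized trials were needed to settle the question.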
Adam: 00:47:28 So one reason that I love talking to you, Shani, is that not only are you a brilliant researcher, you're also a brilliant physician who takes care of patients. No, seriously, this is part of the point: R.A. Fisher didn't really understand the clinical context in which things were happening. And as a practicing physician, you are very well aware of it. How do we as doctors communicate this to our patients? How fragile some of the knowledge we're basing our decisions on is, how do we communicate that?
Shani: 00:47:57 It's so hard. And I think you hit the nail on the head: the thing that often causes tension between patients and physicians, and between patients and the healthcare system, is the realization on the part of the patient that we don't always know what we're doing. And this comes up all the time. We make our best guess, and patients think, well, how did you not know that this wouldn't work? How did you not know that this was going to happen? And they start realizing that so much of this is a gray zone, which is, very understandably, disconcerting for patients. So I don't know necessarily what the answer is, other than being up front with patients at the outset about what we do know and what we don't know, and helping them understand that it takes a lot of testing and scientific data to even get to the point of finding an association, let alone a causal one. But what words you use to describe that to a patient is really up to you. It's hard.
Adam: 00:49:00 Yeah. And I think it's challenging for physicians to know. You have so much training and you are capable of understanding all these things; many of these studies pass right over my head. How are individual physicians supposed to navigate this?
Shani: 00:49:14 Yeah. Actually, one of the things my colleagues and I debate a little bit is whether individual physicians should even be attempting to interpret studies in the medical literature. The counterargument is that they should not. They should be using sources that have already had experts appraise the data and make recommendations based on it, and those experts should be people with expertise in interpreting the results of studies. The average physician doesn't have that expertise, and they're the ones who are actually at risk of over-interpreting a p value to the exclusion of all the other information that can be gained from a study. So I'm actually supportive of the viewpoint that the average physician should go to sites and platforms that aggregate information experts have culled through and formulated recommendations on, as opposed to interpreting individual studies on their own without background training, because it is a unique skill set.
Adam: 00:50:26 Do you believe, then, in practice-changing studies? Is there such a thing as a practice-changing study?
Shani: 00:50:32 It's funny that you ask that. As you know, I do a presentation every year called an Update in Hospital Medicine, where we're expected to select the most potentially practice-changing articles. And I'm always a little bit torn, because honestly they probably shouldn't be practice changing. Most things probably shouldn't be practice changing until multiple studies have demonstrated something and until it's been worked into guidelines, because up until that point you're just going out on a limb and making an interpretation of a single study that you may or may not actually understand. So I would always suggest waiting until experts in a field have agreed that something should be practice changing. Now, the really practice-changing studies do get incorporated into guidelines very quickly. After the DAWN trial that I mentioned before came out, new guidelines were published almost immediately recommending interventional thrombectomy in patients with stroke up to 24 hours after the onset of their symptoms. So there are practice-changing studies. I guess what I'm advocating is not for individual physicians to make that decision on their own, but to wait until the field has collectively decided that a study is practice changing before changing their practice.
Adam: 00:51:47 That's interesting. Part of the challenge is, what about when a very high-quality study comes into an area where there has been very poor data? An example would be the OVIVA trial in the New England Journal of Medicine, on oral antibiotics for prosthetic joint infections. There's no good data on that, and then all of a sudden there's a randomized controlled trial. I'm just curious what you would take from that.
Shani: 00:52:10 Yeah, it's a really good question. There certainly are areas that are gray enough that nobody knows what the right thing to do is, where if you did what the study showed, you couldn't be accused of not adhering to the standard of care, because there is no good standard of care. In some of those areas it's so gray that no matter what you do, you're going to be fine, and you should just pick what you think is better and do it. Then there are studies like the one you just mentioned, or the other study looking at oral antibiotics in the setting of endocarditis, POET. Yeah. What the individual physician should do in those circumstances isn't a hundred percent clear. But I would say that if you have a patient who, for various reasons, is not going to go out on IV antibiotics, you now have some substantiation that at least giving them oral antibiotics is going to be better than not treating them at all. So I think there are circumstances and contexts in which you should incorporate results like that from individual studies. So I guess I've come full circle and I take back what I said, as only you, Adam Rodman, are capable of getting me to change my mind on things.
Adam: 00:53:30 Well, I think this is going to be a very unsurprising conclusion, but there's not a right answer here, right? It's murky. In modern medicine we like to pretend that what we do is a clear-cut science.
Shani: 00:53:42 Do you know what it is, Adam? Here's what: I'm not going to take back what I said before. I'm going to modify and modulate it, which is that, because science is not precise, we need to recognize the imprecision in the science, and what that means is that sometimes you don't really know what's right, and so you can do whatever you want in that circumstance. The converse of that is: don't tout one study as the be-all and end-all of something. That's what I was getting at with one study shouldn't really be practice changing: it's a limited slice of evidence, and most things are still pretty gray. So making sure that you've thought about it big picture and put whatever study it is into the context of what is already known is, I think, important. But in reality, so many of these things are gray that you're probably justified in doing whatever you think is of the most benefit to the patient sitting in front of you at any given time.
Adam: 00:54:46 Being a doctor is hard, huh?
Shani: 00:54:48 Oh my God, the nuances are never ending.