Episode 63: Signals


What does it mean when different physicians disagree about a diagnosis? I am joined by Dr. Shani Herzig as we explore this issue in the second part of my series on diagnosis. We’re going to discuss the advent of signal detection theory in the middle of the 20th century as new diagnostics such as laboratory testing and x-rays started to challenge the classical view of diagnosis. Along the way, we’re going to talk about focal infection theory and why it seems that everyone in older generations had their tonsils removed as children, early and very inefficient chest x-rays, British radar operators trying to figure out if they were looking at a flock of geese or a German bomber, and finally probably one of the most important people in medical diagnostics that you’ve never heard of — Jacob Yerushalmy.

 

If you want to purchase Bedside Rounds swag, the store is at https://www.teepublic.com/stores/bedsiderounds.

Transcript:

This is Adam Rodman, and you’re listening to Bedside Rounds, a monthly podcast on the weird, wonderful, and intensely human stories that have shaped modern medicine, brought to you in partnership with the American College of Physicians. This episode is called Signals, and it’s the second part of my series on the history of diagnosis. The first part was called Cry of the Suffering Organs, and it covered almost two hundred years, outlining the development of pathological anatomy and the “classical” view of diagnosis. In this episode I’m going to cover a far shorter time period in the 20th century — only a couple of decades — when some of the underpinnings of diagnosis started to be questioned, and then redefined in terms of uncertainty, driven partly by advances in diagnostic technology and partly by the maturing fields of biostatistics and epidemiology. I realize that description sounds very dry, but I promise it’s going to be a fun journey — we’re going to talk about focal infection theory and why it seems that everyone in older generations had their tonsils removed as children, the absurd amount of time it used to take to make a chest x-ray, British radar operators trying to figure out if they were looking at a flock of geese or a German bomber, and finally probably one of the most important people in medical diagnostics that you’ve never heard of — Jacob Yerushalmy.

 

And because, like I always say on the show, I’m a humanities person, not a science person — and DEFINITELY not a math person, I am joined by friend of the show and literal smartest person I know, Dr. Shani Herzig, who is going to help me tell this story.

 

Shani introduction 

 

Just like in the first episode of this on-again/off-again series, I’m going to focus on two specific types of disease: first, chest disease, in particular tuberculosis, and second, tonsillar disease. I know I promised the 20th century, but in typical me fashion, I want to take a quick sojourn 2,500 years before this. Last time I gave a model for what I called the “classical model of diagnosis” — first, gather the facts of the disease; second, recognize which of the facts are relevant (that is, recognize which findings are “signs”); and finally, fit them to a specific diagnosis. Now, in pathological anatomy, this is all done with the knowledge that disease lives somewhere in the body — that the specific diagnosis lies under the skin just waiting to be found. One of the reasons that I’m fascinated by diagnosis — well, other than the fact that I’ve dedicated a good portion of my life to the art of diagnosis — is that it’s one of the most ancient forms of knowledge that we still use. The “classical” model has the pathological anatomy twist, for sure, but the basic epistemologic approach goes back to the ancients. So I want to give a concrete example, from my favorite ancient medical text — Epidemics II. I bring it up reasonably frequently, but here’s your occasional reminder: Epidemics II is essentially the medical notebook of a traveling physician in the Aegean Sea, and it was used until the 18th century to teach medical students. I want to read for you a short case:

 

Case V. The woman affected with quinsy, who lodged in the house of Aristion: her complaint began in the tongue; speech inarticulate; tongue red and parched. On the first day, felt chilly, and afterwards became heated. On the third day, a rigor, acute fever; a reddish and hard swelling on both sides of the neck and chest, extremities cold and livid; respiration elevated; the drink returned by the nose; she could not swallow; alvine and urinary discharges suppressed. On the fourth, all of the symptoms were exacerbated. On the fifth she died of the quinsy.

 

I think it’s important we don’t get thrown off by the word the translator chose, “quinsy.” If you look in a medical dictionary, you’ll find that quinsy is “an old-fashioned term for a peritonsillar abscess,” which causes the classic “hot potato” voice and, in severe cases, airway obstruction requiring drainage. This is, of course, a disease definition dependent on pathological anatomy. To earlier generations of physicians, quinsy had a different definition, though one with considerable overlap with our current definition. You can get it from the etymology, which comes from the Greek for “dog throttle,” or more accurately, to be choked like a dog being pulled by its collar. Quinsy referred to that same phenomenon — acute constriction of the airway. Imagine that you were Cabot, who started the Clinical Pathological Conference movement, seeing this patient in the 1920s. You’ve taken a history — initially swelling in the tongue and difficulty speaking (the hot potato voice), followed by rigors and swelling in the neck and chest, sepsis, what sounds like airway compromise, and finally signs of shock followed by death. You can imagine that on autopsy, you would find abscesses around the tonsils constricting the airway, and probably signs of systemic infection — hemorrhage and petechiae from DIC, signs of nephritis in the kidneys, sloughing in the gut. But here’s the thing — this ancient physician went through the same process of collecting signs, determining what was important, and fitting a diagnosis. And they determined: “It is probable that the cause of death on the sixth day was the suppression of the discharges.” This sounds absurd to us — the “suppression of discharges” is a symptom of how systemically ill she was, but to the ancients, working in a humoral nosology, the fact that she was no longer making stool or urine was a sign of how horrifically imbalanced her humors were.

 

So now to the 20th century, but we’re going to stay on tonsils for a bit. Removal of the tonsils dates back to the ancients; Celsus describes removing “indurated tonsils”, either “enucleating” them with the fingers — and yes, that is as disturbing as it sounds — or using a mechanical device later called a “guillotine”. But it was an uncommon procedure until the 20th century, used as a last resort for cases like the unfortunate woman above. Even after the advent of modern surgery at the end of the 19th century, tonsillectomy was uncommon, not least because no one really knew what the tonsils did. “I think the very fact that we do not know the physiology of the tonsil ought to make us a little wary about doing needless operations,” reported a Kentucky surgeon. Even though physicians were increasingly recognizing “enlarged” tonsils, major medical and surgical texts urged caution. Tonsillectomy was only indicated when it “threatened the health of the child.”

 

Everything changed in the United States around 1910. I suspect that everyone listening has the sense that tonsillectomy was at one time a very common procedure; in fact, even today, patients of a certain age are incredibly likely to have had their tonsils removed. So what happened? Gerald Grob, the historian whose work I’m drawing on here, has two major theories. The first was the continued professionalization of surgery, including its move into the hospital, as opposed to taking place on your kitchen table. And of course, now that surgery was located in operating rooms, with more sophisticated tools, surgical nurses, and sterile technique, more complex surgical procedures were possible, probably most famously Halsted’s radical mastectomy, which suddenly made breast cancer a surgical disease that could, in some cases, be cured.

 

The second was the success of bacteriology, and the advent of a new explanatory model for many diseases — focal infection theory. I’ve talked about how the enthusiasm for germ theory led to hopes that ALL diseases would be shown to have infectious causes, most dramatically with Joseph Goldberger’s quest to show that pellagra was not caused by a microorganism, all the way back in episode 36. Focal infection theory was part and parcel of this. The general idea was that “foci” of infection lived in the human body and could cause systemic illness. If you could eliminate the foci, you could prevent serious disease. Remember that at this point, the only effective antimicrobial was salvarsan, and that was only for syphilis. Preventing infection was really the only chance doctors had. But focal infection theory went far beyond that — the increasing recognition of cancer of the throat was explicitly linked to infectious foci in the tonsils; Frank Billings, an important medical education reformer, wrote, “After a thorough curettement of the tonsillar fossae in cancerous patients, the malignant becomes a non-malignant mass, and the painful systemic symptoms gradually cease, and a healing process is soon noticeable.” The tonsils and adenoids became the most popular surgical target. Why? This article by Grob has a lovely quote from a pathologist skeptical of “infectious foci” (from a series looking for these in rheumatoid arthritis) that a so-called focus was just “anything that is readily accessible for surgery.” Tonsillectomy quickly became a commonly performed procedure, done for any number of reasons — including colds and runny noses — and would stay that way until the 1960s.

 

I’m not going to get too deep into the tonsillectomy debate, since that’s not what this episode is about, though Grob’s article is fascinating in how it discusses the many reasons that ineffective or even harmful therapeutics persist. The general justification for the use of tonsillectomy was clinical experience — otolaryngologists largely reported that their patients improved after the tonsillectomy. But in the early 20th century, there was an incredibly active debate about what constituted evidence of efficacy, and many of the confounders we recognize today in retrospective studies were not yet appreciated.

 

But this body of literature would produce one very curious study — the American Child Health Association survey in 1934. The report is entitled “Physical Defects: The Pathway to Correction,” and the goal was to determine not only the prevalence of uncorrected physical defects in New York City’s public school system, but also why children had dropped off the “pathway to correction” — the city funded medical procedures to correct these defects at the time. It was a tremendous survey — over 25,000 patients were examined by teams of doctors to ensure accurate physical exams. And given the intense focus on tonsils at the time, it was no surprise that this topic would come up.

 

For the study on tonsils, 1,000 eleven-year-old children were selected. 611 of them had already had their tonsils removed. The researchers, who presumed the benefit of tonsillectomy, wanted to know — how many of the remaining 389 who had not had tonsillectomies would have benefited from the procedure? Those children were independently evaluated by one member of the investigative team. This “first opinion” recommended tonsillectomy for 174 of the 389, and for those of you not that great at doing long division on the fly, that’s 45%. But the panel checked the reproducibility of its findings with repeated exams. So the remaining 215, who had not been referred, got a second opinion, and 99 more — that’s 46% — were again recommended for tonsillectomy. And a third examiner repeated this process with the last 116, and recommended that 51 get a tonsillectomy — 44%!
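For anyone who does want to check the long division, here is a minimal Python sketch of the survey arithmetic, using only the counts quoted above. The function name `referral_rounds` is a hypothetical helper for illustration; it simply assumes, as the survey describes, that only children who were not referred went on to the next examiner.

```python
def referral_rounds(unoperated, recommended_per_round):
    """Return (examined, recommended, rate) for each examiner, plus the count never referred.

    Assumes only children who were NOT recommended for surgery go on to the next examiner.
    """
    remaining = unoperated
    rounds = []
    for recommended in recommended_per_round:
        rounds.append((remaining, recommended, recommended / remaining))
        remaining -= recommended  # referred children drop out of the next round
    return rounds, remaining


# Counts from the 1934 American Child Health Association survey, as quoted in the episode
rounds, never_referred = referral_rounds(389, [174, 99, 51])
for i, (examined, recommended, rate) in enumerate(rounds, start=1):
    print(f"Examiner {i}: {recommended}/{examined} = {rate:.0%}")
print(f"Never recommended for surgery after three rounds: {never_referred} of 389")
```

The sketch also makes explicit something the percentages hide: after three rounds, only 65 of the 389 children had never been recommended for surgery.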

 

After WW2, in elite pediatric circles, the mood towards tonsillectomy had begun to shift considerably — first because of penicillin and other antibiotics, but also because of data like the American Child Health Association survey. Take, for example, an article called “Pseudodoxia Pediatrica” by Harry Bakwin, published in NEJM in 1945. Today we like to talk about “Things We Do For No Reason,” TWDFNR, but in the 40s that catchy phrase didn’t exist yet. This piece took its name from Thomas Browne’s Pseudodoxia Epidemica, which debunked common myths in the 17th century. Bakwin points out that although pediatricians wanted to point a finger at financial conflicts of interest, that explanation didn’t work for this particular study — all of the students’ tonsillectomies were paid for by the city. His conclusion was remarkably similar to the modern-day TWDFNR: that procedures had not been checked by experimentation, and that academic pediatricians, many of whom had already given up on the procedure, showed a glaring indifference to the struggles of everyday practice.

 

The focus was on the needlessness of the procedure. But in retrospect, two curious facts stood out. The first is that with each cycle, almost exactly 45% of children were selected for tonsillectomy, even though the children judged most in need had presumably already been screened out. Almost 60 years later, Ayanian and Berwick revisited this study. With the insight of modern cognitive psychology, especially work on cognitive biases, they felt the most likely reason for the 45% was that each round of decisions reflected the physicians’ prior estimate of the probability of disease — anchoring bias. And in fact, they performed a series of experiments on pediatricians that showed the same bias still affecting their ability to estimate base rates. Anchoring bias is a common cognitive bias in the practice of medicine — probably one that happens every day. Effectively, you give more cognitive weight to the first piece of information that you get. In the case of the American Child Health Association survey, the examiners anchored on the “base rate.” But diagnostically it happens all the time. A patient, for example, might present with fever, cough, and a large consolidation on the chest x-ray. You might reasonably assume they have a community-acquired pneumonia and start them on antibiotics. But as new information comes in — for example, no response to antibiotics, a high percentage of eosinophils, and a history of cigarette smoking — you have to be able to “debias” and weigh the new information, in this case factors pointing to an eosinophilic pneumonia.

 

I felt pretty good about this explanation, so of course I asked Shani, and she did not agree. At least not totally.

 

Shani (44:30):

I don’t, I’m not totally convinced by that explanation, to be honest,

Shani (44:41):

The, um, one of the other conclusions is inadequate expertise. I think that some of our, um, some of our diagnoses are inherently subjective and that’s kind of where the art comes in that I think that you could kind of go either way, which, which I guess does then tie back to the anchoring bias, um, that, you know, you, you, you have this sense that about half of the people that you see are gonna need this. And so you just kind of fall back on that. But I also think that there might be some, there there’s a spectrum of disease. And in any population that you evaluate, there are going to be those who have higher burden of disease or more apparent disease, and those that have less apparent disease. And I think you evaluate cases in terms of other cases that you’re comparing them to. And so I think you kind of like recalibrate, um, your, your threshold essentially to recommend surgery depending on the population that you’re, because the second comparing

Speaker 4 (45:47):

Tonsils are clearly less severe. So you recalibrate what it means.

Shani (45:51):

So I think, I think it’s actually some type of recalibration that’s going on relative to whatever population that you’re seeing. And we’re, we’re, it’s almost like as physicians we’re attempting to, or we’re, we’re primed for teasing out variation, right? Like, like how this patient differs from this patient. And if you have smaller and smaller degrees of variation, you’re almost going to then kind of like magnify the, even though the difference between patient A and patient B in the selected population is not as big in magnitude as the difference between patients A and B in the first evaluation, it looks the same because relative to the other variation among the other patients in that selected group, it’s still relatively big. So almost like, that’s what I was just going to say. It’s almost a distinction between relative and absolute.
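Shani’s relative-versus-absolute idea can be sketched as a toy simulation. Everything here is an assumption for illustration, not data from the survey: the severity scores are hypothetical and spread evenly between 0 and 1, and each imaginary examiner refers any child whose severity is more than `k` standard deviations above the average of the group in front of them, with `k = 0.173` picked so that roughly 45% of an evenly spread group gets referred. The function name `relative_referrals` is likewise hypothetical.

```python
from statistics import mean, pstdev


def relative_referrals(severities, k=0.173, rounds=3):
    """Toy model: each examiner judges children relative to the group they are shown.

    A child is referred if their (hypothetical) severity exceeds the group's mean
    by more than k population standard deviations; everyone else goes on to the
    next examiner. Returns (children_seen, children_referred, absolute_cutoff) per round.
    """
    group = list(severities)
    history = []
    for _ in range(rounds):
        cutoff = mean(group) + k * pstdev(group)  # threshold recalibrated to this group
        referred = [s for s in group if s > cutoff]
        history.append((len(group), len(referred), cutoff))
        group = [s for s in group if s <= cutoff]  # only unreferred children continue
    return history


# Hypothetical severities for 389 children, evenly spaced from 0 (mildest) toward 1
severities = [i / 389 for i in range(389)]
history = relative_referrals(severities)
for seen, referred, cutoff in history:
    print(f"saw {seen}, referred {referred} ({referred / seen:.0%}), absolute cutoff {cutoff:.2f}")
```

Under these assumptions, each examiner refers roughly 45% of the children they see, even though the absolute severity needed for a referral falls from about 0.55 to about 0.30 to about 0.17. A rule calibrated to the group in front of you produces a stable referral rate without anyone consciously anchoring on it, which is one way to read Shani’s point.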

But the second curious fact is a fascinating digression, largely skimmed over in the contemporaneous literature. Each of these doctors had, presumably, gone through the classical diagnostic process on each child, and yet in almost half the instances, another clinician had come to the opposite conclusion. The general conclusion among tonsillectomy skeptics, like Bakwin, was that this was just because the procedure was bunk. But that’s not a very satisfying answer. To me, it raises a deeper diagnostic question — what does it mean for the idea of diagnosis if two separate doctors can look at the same set of signs and come to opposite conclusions?

 

Diagnostic testing had only muddied these waters further. Now, the first caveat — physicians had obviously been aware for a long time that diagnostic procedures could vary in their results. In fact, all of this really dates back to the first “blockbuster” diagnostic test — auscultation with the stethoscope. And yes — the physical exam is a diagnostic test, which is how it was first developed, and I strongly advocate that that is how it should continue to be used, though many vociferously disagree with me. But laboratory diagnostics were a new beast altogether. They were initially fairly rudimentary, such as the hematokrit, which was basically a centrifuge for blood, and whose results varied greatly depending on technique. But by the early 20th century, diagnostics had rapidly improved and offered a level of precision and seeming replicability that traditional physical diagnosis lacked. And it was one of the first blockbuster laboratory tests, combined with its use alongside one of the first blockbuster pharmaceuticals, that led to some of the first statistical thinking about diagnosis. I have been working on some version of this piece for almost a year, which is why I was very pleasantly surprised when in March the Annals published a wonderful article by Binney, Hyde, and Bossuyt that traced the development of sensitivity and specificity back to immunology. I have the piece in the show notes if you want to get more into the weeds.

 

So that blockbuster test was the Wassermann reaction. The Wassermann reaction is what we now know as a complement fixation test. Jules Bordet, at the Pasteur Institute, had found that there were actually two mysterious components in the serum involved in what would later playfully be called “humoral” immunity — after my favorite, the four humors. One was stable when serum was heated up — antibodies. The other was sensitive to heat; Bordet called it “alexin,” from the Greek for “to protect,” but it would later be called complement. I won’t get too much into the details — it’s rather complicated, and I remember being forced to memorize this when I was a medical student. But in the first step, you prepare a mixture in which antibodies to syphilis will bind with an antigen (initially syphilitic liver from newborn infants, later cardiolipin), and in the second, you add blood. If the blood hemolyzes — that is, the individual red blood cells break apart and the tube turns a pinkish color — then the test is negative. If the substance does not change color, however, the test is positive. In Wassermann’s original tests, every known positive control of syphilis had turned positive — it appeared to be immunologically “specific” (a key word here) for the disease. It became clear, often in dramatic fashion, however, that a number of different conditions could cause a positive Wassermann reaction; and similarly, one could have syphilis with a negative Wassermann reaction (which was very important once Salvarsan was introduced, since clearing a Wassermann was initially used as a proxy for cure, before it became clear this would lead to relapses). Similarly, the term “sensitivity” initially referred to the “delicacy” of a reaction — essentially how finicky it was to reproduce, and it slowly began to pick up a secondary meaning: the ability of a serologic test to identify patients who truly had the disease.

 

These terms — sensitivity and specificity — should sound familiar to anyone in medicine, and even in this early period they started to refer to their modern concepts: the ability of a test to correctly identify people with a disease, and people without a disease, respectively. These concepts would have to wait a few more decades to jump from immunology into diagnostics more broadly. Because serologic tests were not the only game-changing diagnostic test at the turn of the 20th century. There was, of course, the chest x-ray. X-rays were accidentally discovered by Wilhelm Roentgen in 1895, and within a few weeks of his reporting his findings, the technology had spread across the world. But chest x-rays as we know them would take a considerable amount of time to catch on.
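In their modern form, sensitivity and specificity are just two ratios read off a two-by-two table that compares a test against a reference standard. The sketch below uses made-up counts for a hypothetical serologic test, not real Wassermann data, and the class name `TwoByTwo` is likewise only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from comparing a test with a reference standard (hypothetical numbers)."""
    true_pos: int   # has the disease, test positive
    false_neg: int  # has the disease, test negative
    true_neg: int   # no disease, test negative
    false_pos: int  # no disease, test positive

    @property
    def sensitivity(self) -> float:
        # Share of people WITH the disease whom the test correctly flags
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        # Share of people WITHOUT the disease whom the test correctly clears
        return self.true_neg / (self.true_neg + self.false_pos)


table = TwoByTwo(true_pos=90, false_neg=10, true_neg=160, false_pos=40)
print(f"sensitivity = {table.sensitivity:.2f}, specificity = {table.specificity:.2f}")
```

The two numbers come from different groups of people, the diseased and the non-diseased, which is why a test can have excellent sensitivity and poor specificity, or the reverse.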

 

I’m getting sidetracked a lot, so first I want to have an aside about appropriate medical terminology and the term “chest x-ray.” There was recently a JAMA piece that really annoyed me, entitled “examples of improper terminology and more appropriate language.” I ranted plenty about it on Twitter, so I’ll focus purely on what I’m talking about in this episode. The authors feel that “chest x-ray” is a misused term because, and I’m quoting here, “an x-ray is a beam of energy that we cannot see. More appropriate language would be a chest radiograph.” This is silly for several different reasons. The first, of course, is that the phrase “chest x-ray” is perfectly clear and commonly used in this country. Would you get angry at me if, while driving, I asked my wife if it’s okay to change the station on the radio? Because, after all, radio waves are actually beams of energy that we cannot see, and more appropriate language would be to say, “Do you mind if I change the station on the radio receiver?”

 

But the second-order silliness is more fundamental, which is that the term “chest x-ray,” meaning an image of the chest taken with that “beam of energy we cannot see,” actually predates the existence of chest radiographs! In the first episode of this series, I quoted from a CPC with Cabot in NEJM in 1923, and throughout they refer to a “chest x-ray” — by which they mean a photofluorogram (from a piece of film developed from a fluorography).

 

As long as I have you here, let me complain about another piece of that article, which was a general broadside against neologisms. For example, doctors will often talk about troponinemia — troponins are protein complexes found in muscles that help them contract; we measure elevations of the cardiac version of these complexes to tell if someone has had a heart attack (in fact, if you want to talk about pedantry, troponin tests are often called “cardiac enzymes” despite the fact that a troponin is not an enzyme; it’s because the test used to be run on creatine kinase, which IS an enzyme, and you know doctors love tradition). In any event, the authors of this piece dislike the term troponinemia because it’s a neologism composed of troponin and “heme” for blood, meaning elevated troponin levels in the blood, similar to “uremia,” meaning high levels of urea nitrogen in the blood.

 

In my opinion, this is also incredibly silly, because at one point, every medical term was a neologism. Take the humble hematocrit. Can you imagine an 1880 version of this article? The authors recommend against using “hematokrit” because it is a neologism of heme, for blood, and “lactokrit” from the dairy industry. We prefer the term “ratio of the volume of red blood corpuscles to the total volume of liquor sanguinis.”

 

Sorry that I interrupted this episode for a rant. Fun fact — I routinely send my residents e-mails with historical literature reviews that I call #AngryAdamRants — my e-mail folder is approaching 80 of them currently. And yes, I was originally going to write a letter to the editor to JAMA, but I missed the deadline because of my son being born, but this is probably more effective anyway. 

 

Okay, so back to chest imaging. X-rays caught on across the globe, but chest imaging — or really, I should say, imaging of the lungs — took a lot longer. Howell has an interesting study showing that at Pennsylvania Hospital, even by 1909, fewer than 10% of patients in the hospital got x-rayed, and even by 1925, only a quarter were. The reasons were varied — but for chest x-rays in particular, they were mostly technical. Exposures took a long time, so anything that moved — like the lungs and the heart — showed very little detail. Early 20th-century x-rays were mostly used to examine broken bones, and understandably so, since the traditional way to test was incredibly painful manual manipulation.

 

And when I say took some time, I’m not kidding. Francis Williams, a doctor at Boston City Hospital, was one of the first physicians in the United States to use x-rays in his clinical practice. Within weeks of hearing of Roentgen’s discovery, he borrowed Crookes tubes and hand-cranked Wimshurst static machines to test his own patients. He would have patients lie chest down on a cot, with the Crookes tube below them. It would take almost 45 minutes to make a chest radiograph — I have an example posted to Twitter, if you want to see.

 

So how did Williams examine the chest? Fluoroscopy! Fluoroscopy is essentially a “live” x-ray — the patient is placed between an x-ray source (the Crookes tube) and a fluorescent screen. The physician then looks at the screen to get a live view of the inside of the chest. Williams developed a portable 40 lb fluoroscopy unit that he would take to the wards — and to house calls! — to examine his patients. And get this — since photofluorography didn’t exist yet, he actually drew his findings on paper, which you can still see reproduced in his textbook (which was the most famous English-language roentgenology textbook).

 

This was the era of the ascendance of pathological anatomy, and Williams was clearly strongly influenced by the tradition. He used fluoroscopy on all of his tuberculosis patients, and he correlated the findings he saw — darkening of the lung apices, restriction of the hemidiaphragms — with autopsies when the patients died. He actually took his fluoroscope up to Trudeau’s sanatorium at Saranac Lake — if you recall from episode 39 — and tracked the progression of recovering TB patients.

 

He also turned his attention to pneumonia, finding, as basically every physician knows, that the x-ray is a far more sensitive tool than the stethoscope could ever be. He wrote, “A pneumonia in its early stages, or even through its whole course, may give no signs by auscultation or percussion, and the physician may find it difficult to make the diagnosis. In some of these cases a doubtful diagnosis may be made a more certain one with the use of the X-Rays.”

 

Spearheaded by the sanatorium movement, chest imaging gradually improved, and by the 1930s it was routinely used both diagnostically and, most importantly for this story, for screening for tuberculosis. The most commonly used method was the photofluorogram, which I’ve already referenced; it was invented in 1936 by Manuel Dias de Abreu, and it was essentially a tiny (50-100 mm) photograph of the screen of a fluoroscope. The technique was cheap and quick, and massive campaigns started to screen for tuberculosis. Mobile vans were launched across the world, and in some countries, such as Brazil, where the technique was invented, the majority of the population was screened.

 

Which brings us to 1944, when the US Army, in the midst of WWII, wanted to know which method of screening for tuberculosis was most cost-effective. Enter the most influential biostatistician you’ve never heard of — Jacob Yerushalmy, who went by “Yak,” from his Hebrew name. Yerushalmy was born in Ekron, outside of Jerusalem, in 1910, then in the Ottoman Empire but soon to be in British Mandate Palestine. But there were no universities in Palestine, so he emigrated to New York City, eventually finding himself at Hopkins completing a PhD in mathematics. During the Great Depression, there wasn’t a ton of demand for mathematicians, and he made do as a poorly paid instructor — he joked that he was willing to beg in downtown Baltimore with a cardboard sign that read “Johns Hopkins PhD in mathematics.” In 1935, his career going nowhere and his finances teetering, he took a job as a biostatistician at the New York Department of Health, where he was involved in the first case-control study on tobacco and lung cancer in the United States, which I’ve talked about at length before. This interest led him to the NIH in 1938, where he started working on the First National Cancer Survey. The First National Cancer Survey was a precursor to the work on lung cancer and smoking that Doll and Bradford Hill would do in the UK, as well as Wynder and Graham in the US. The goal was to describe all “solid tumors” in the United States — that is, leukemia and other blood cancers were intentionally excluded. The team divided the country into three representative regions that included about 10% of the entire population of the country. The study looked only at healthcare facilities — hospitals and physicians’ offices — based on the assumption that diagnoses made there would be far more objective and reliable than what individual patients reported.
In order to increase uptake, if an office or hospital didn’t respond to the request for data, a local medical student would be dispatched to get it in person. 

 

As the statistician on this project, Yerushalmy had an enormous task — 63,555 cases of cancer were identified in the study regions, 4,144 of them from autopsy results. The Survey is probably less historically important for its findings than for some of the methods it pioneered — and one of the most important was setting a “pathological basis” for diagnoses of cancer. So a brief aside on cancer diagnostics in the first half of the 20th century — while cancer was frequently identified on autopsy, many diagnoses were “clinical” — that is, made from the patient’s presentation and imaging, usually x-rays. The reason for this was largely practical — therapeutic options had traditionally been limited to surgery, and only for certain types of cancer. But in the 1920s and 1930s, radiation therapy became more advanced, and in the 1940s, chemotherapy would be developed. This made it much more important to make an accurate diagnosis of cancer through biopsy, usually done under fluoroscopy. This method of diagnosis is largely what is still done today.

 

So when Yerushalmy started to review and analyze all the data, looking specifically at pathological diagnoses of cancer, he made a very important observation — biopsy as a diagnostic tool had significant limitations. For example, a biopsy might miss the tumor, the tissue sampled might be nondiagnostic, there might be heterogeneity within the tumor, and different pathologists reading the biopsy might disagree. In reality, there was likely a spectrum of biopsy results ranging from “definitely cancer” through “maybe cancer,” “unclear,” and “likely not cancer” to “definitely not cancer.” The Survey had set out to classify all cancer in the United States, or at least all cancer in patients who presented to healthcare facilities. But Yerushalmy realized that could likely never be done, given the fundamental limitations of the diagnostic method.


His focus soon shifted away from cancer. After the US entered World War II, the NIH’s priorities changed, and Yerushalmy was transferred to the US Public Health Service Tuberculosis Control Bureau, where he was made Chief Statistician. One of the bureau’s major jobs at this time was to protect American troops going overseas from tuberculosis. He quickly realized that TB diagnosis had the same fundamental problems as cancer diagnosis. Here the standard wasn’t a biopsy but a chest x-ray, interpreted by a physician. His personal views likely played a part here; a later biographer would describe him as being “contemptuous” of physicians and their “pseudo-expertise.” So when the Army tasked his team with testing different methods to diagnose tuberculosis, he decided to put these nascent ideas to the test.

 

Yerushalmy was undoubtedly influenced by wartime statistical developments in radar detection. This is one of those fascinating cases where I wish I actually were a professional historian; the actual work on this subject, much of which happened at the Lincoln Laboratory at MIT here in Massachusetts, was kept classified during the war. I don’t have access to those records, which are presumably now declassified, only to the scholarship produced after the war. So the exact order of things is a bit messy, and if you know more, please reach out, because I’m incredibly curious.

 

RADAR, which stands for radio detection and ranging, was developed in the lead-up to World War II; the Allies’ systems were especially advanced and contributed greatly to their war effort. Essentially, by timing how long a radio pulse took to bounce back, with the echo displayed on an oscilloscope, operators could determine the distance from a radar station to an object, such as a bomber or a fighter plane. During the war, vast networks of radar arrays were set up to detect incoming enemy aircraft.

 

But what did detecting an enemy aircraft actually look like? I think you’ve all seen this in movies. In a dark room somewhere, a radar receiver operator leaned over a cathode ray tube, watching the oscilloscope trace sweep across the screen, with blips updating whenever a wave bounced back. But with the technology of the time, the screen was covered in “snow,” either reflections from the atmosphere or “ghosts” from inside the switches and circuits of the machine. This made it difficult to know whether a given blip was a false positive (actually just interference), or whether there was a false negative (an enemy plane hidden behind the interference). This problem was especially big in the UK, which had built 182 miles of radar arrays constantly scanning the skies as part of the Chain Home system to protect against the German Luftwaffe. An individual radar receiver operator had some tools at their disposal to try to sort out false positives and negatives: they could adjust the gain to amplify or decrease the signal. But of course, this would also amplify or decrease all the noise in the field as well. Initially this was done ad hoc; the individual operator would decide whether or not to adjust the gain. But military statisticians (and again, I’m not entirely clear where, though the first work on this was published by Van Meter and Middleton at MIT in the early 1950s) realized that you could plot the true positive rate (one minus the false negative rate) against the false positive rate at different gain cutoffs, and thereby find the gain setting that best balanced false negatives against false positives. This curve was called the “receiver operating characteristic” curve, or the ROC curve.
And this is probably quite obscure for most of my listeners, but ROC curves are still very important in medicine when evaluating new diagnostic tests — I’ll post some examples to Twitter since it’s hard to describe graphs on the radio — and if you were ever wondering why they had that name, you can thank the humble radar receiver operators in World War II. 
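For anyone who wants to see the idea concretely, here is a minimal sketch in Python. It assumes invented “blip strength” scores (not historical data) for trials where a plane really was present and for trials with only snow, treats a cutoff on blip strength as a stand-in for a gain setting, and uses hypothetical function names. For each cutoff it computes the true positive rate and the false positive rate, the two axes of an ROC curve. Picking the “best” cutoff by maximizing TPR minus FPR (Youden’s J) is one common criterion among several, chosen here for illustration, and not necessarily what the wartime statisticians used.

```python
# Hypothetical blip strengths: invented illustrative numbers, not historical data.
plane_blips = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4]  # an aircraft really was there
noise_blips = [0.7, 0.5, 0.45, 0.3, 0.2, 0.1]   # only "snow" or "ghosts"


def roc_points(positives, negatives):
    """Return (cutoff, false_positive_rate, true_positive_rate) for every cutoff.

    A blip is called "plane" when its strength is at or above the cutoff,
    so lowering the cutoff plays the role of turning up the gain.
    """
    cutoffs = sorted(set(positives + negatives), reverse=True)
    points = []
    for cutoff in cutoffs:
        tpr = sum(s >= cutoff for s in positives) / len(positives)  # planes caught
        fpr = sum(s >= cutoff for s in negatives) / len(negatives)  # snow called a plane
        points.append((cutoff, fpr, tpr))
    return points


def best_cutoff(points):
    """Pick the cutoff maximizing Youden's J = TPR - FPR (one criterion among several)."""
    return max(points, key=lambda p: p[2] - p[1])


if __name__ == "__main__":
    points = roc_points(plane_blips, noise_blips)
    for cutoff, fpr, tpr in points:
        print(f"cutoff {cutoff:.2f}: FPR {fpr:.2f}, TPR {tpr:.2f}")
    cutoff, fpr, tpr = best_cutoff(points)
    print(f"best cutoff {cutoff:.2f}: FPR {fpr:.2f}, TPR {tpr:.2f}")
```

Plotting the (FPR, TPR) pairs traces out the ROC curve. Lowering the cutoff is like turning up the gain: you catch more real planes, but you also start calling more snow a plane.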

 

And with that, we are going to take a break until the next part comes out in a couple of weeks. Fun fact: when I originally finished the working draft of this episode (which was already spun off from the continuation of ANOTHER episode), it was almost two hours long. And this is the extremely edited version!

 

I have been working on this episode for, what, over a year now. And many of you have noticed a significant delay in the release of episodes. The past few months have been very challenging. Everything is okay now, but my family has been dealing with scary health problems. You would think that as a doctor, I’d be painfully aware of how easy it is to take your health for granted, and on an intellectual level, that’s certainly true. Unfortunately, I tell people bad news almost every week. But intellectualizing an experience is very different from going through it with your own child. Thank you all for bearing with me. I don’t have any good advice from all of this, except: make sure to hug your children.

 

And for some happier news, I have some official Bedside Rounds swag, and it is amazing! I commissioned the incredibly talented Sukriti Banthiya to make some humoral-themed artwork for me, and it is cooler than I could have imagined. Like, seriously, it’s a deep dive into 7 years of Bedside Rounds episodes, organized by their humoral theme. And she even included me as Dr. Tulp! I bought a t-shirt with it on it, and yes, I do feel a little ridiculous walking around wearing a picture of myself dressed as a Renaissance physician carrying a bust of the Emperor Hadrian, but I also feel sorta awesome. If you want to get your own, I have a store set up at teepublic.com/stores/bedsiderounds, and I’ll put a link in the shownotes.

 

CME is available for this episode if you’re a member of the American College of Physicians at www.acponline.org/BedsideRounds. All of the episodes are online at www.bedsiderounds.org, or on Apple Podcasts, Spotify, Google Podcasts, or the podcast retrieval method of your choice. The Facebook page is at /BedsideRounds. The show’s Twitter account is @BedsideRounds. If you want amazing Bedside Rounds swag designed by Sukriti Banthiya, the official merchandise store is at www.teepublic.com/stores/bedsiderounds. And I personally am @AdamRodmanMD on Twitter, where you can find me arguing about medical history and epistemology.

 

All of the sources are in the shownotes, and a transcript is available on the website.

 

And finally, while I am actually a doctor and I don’t just play one on the internet, this podcast is intended to be purely for entertainment and informational purposes, and should not be construed as medical advice. If you have any medical concerns, please see your primary care practitioner.