• The Limits of Chatbots in Clinical Decision‑Making
    May 7 2026

    Chatbots and large language models are becoming increasingly common in everyday life, but their growing presence in healthcare has raised an important question: Should probabilistic AI systems be used to help make medical decisions? This episode takes a clear, grounded look at why the answer is far more complicated—and potentially far more dangerous—than many people realize.

    Modern chatbots work by predicting the most statistically likely response based on patterns found in massive amounts of text. That makes them great for conversation, brainstorming, and general information, but not for something as complex and high‑stakes as medical diagnosis. In clinical settings, symptoms like persistent cough and chest pain can point to a wide range of possible conditions. A probabilistic model might default to the most common explanation, but medicine doesn’t work on majority statistics—it works on understanding nuance, context, risk, and rare but critical exceptions.

    This episode explores how relying on “most likely” answers can lead to missed diagnoses, delayed treatments, and dangerous oversights. You’ll hear how serious conditions such as pulmonary embolism or early lung cancer can present with the same symptoms as common respiratory infections, making a simplistic, probability‑driven guess both insufficient and unsafe. We also dive into the accuracy paradox—how an AI system can appear highly accurate while still being clinically untrustworthy, simply because it always chooses the dominant category.

    Beyond the risks, this episode highlights what real medical reasoning involves: integrating visual cues, patient history, audio signals, imaging studies, laboratory data, physiological waveforms, and much more. Human clinicians synthesize all these inputs at once, something a probabilistic chatbot was never designed to do. By understanding this difference, listeners will gain a deeper appreciation for the limitations of current AI tools and why responsible, deterministic models are essential in healthcare.

    Whether you’re a clinician, medical student, AI researcher, or simply curious about how technology intersects with patient care, this episode offers a clear and accessible exploration of why chatbots, despite their impressive capabilities, should not be mistaken for diagnostic tools.

    8 mins
  • Viral AI-Beats-Doctors Study
    May 4 2026

    Another week, another headline declaring AI has officially surpassed physicians. This time, it's a study published in Science on April 30, 2026, claiming that OpenAI's o1 model "outperformed physician baselines" across multiple diagnostic reasoning tasks. The research comes from Harvard, Stanford, and Beth Israel Deaconess Medical Center. It's rigorous. It's peer-reviewed. And it's already being cited as proof that doctors are obsolete.

    But here's what those viral headlines won't tell you: the study tested AI on text alone.

    No images. No audio. No physical exams. No watching a patient walk through the door in distress before they utter a single word. No recognizing the subtle facial asymmetry that suggests stroke. No hearing the quality of a cough. No feeling a mass during examination. No interpreting the fear in a patient's eyes.

    In other words—not real medicine.

    In this episode, we unpack why this study, despite its methodological rigor, may be doing more harm than good. We explore the "headline-to-reality pipeline"—how clickbait economics strips away the authors' own caveats until all that remains is a misleading soundbite. We discuss the real-world consequences: misinformed patients with unrealistic expectations, demoralized clinicians, misallocated healthcare resources, and a generation of medical trainees learning exactly the wrong lessons about AI.

    Perhaps most critically, we address the "chatbot conflation problem." When the public hears "AI in medicine," they picture ChatGPT. But as of late 2025, over 850 AI-enabled medical devices have received FDA clearance—more than 70% related to medical imaging. These task-specific systems detecting pulmonary nodules, identifying intracranial hemorrhages, and flagging diabetic retinopathy are fundamentally different from large language models answering text prompts. Different architecture. Different validation. Different regulatory pathways. Different levels of evidence. Lumping them together under "AI" does a disservice to both.

    We also tackle a question the headlines never ask: What would a fair evaluation of AI in medicine actually look like? Hint—it would require multimodal inputs, messy real-world data, and a fundamentally different benchmark: not "Can AI beat doctors?" but "Do doctors WITH AI outperform doctors WITHOUT AI?"

    Finally, we make the case for why medical education must lead this conversation. If we don't teach our students—and frankly, the broader public—the critical distinctions between AI tools, what happens? Clinicians lose trust not just in overhyped chatbots, but in all medical AI, including the FDA-cleared tools actually saving lives. That erosion of trust could take a generation to repair.

    The technical findings of this study may be sound. But science doesn't exist in a vacuum. It exists in a media ecosystem that rewards sensationalism, in a healthcare system desperate for solutions, and in a culture increasingly willing to believe AI can do anything. The responsible approach is to be louder about limitations than findings.

    Because right now, we're celebrating an AI that aced a written exam—while the actual test, the messy, multimodal, deeply human reality of clinical medicine, remains completely ungraded.

    What You'll Learn:

    • Why text-based AI evaluations fundamentally misrepresent clinical medicine

    • The critical distinction between task-specific medical AI and general chatbots

    • How clickbait economics transforms nuanced research into dangerous misinformation

    • What fair AI evaluation in healthcare would actually require

    • Why medical educators must lead the conversation on AI literacy

    Resources Mentioned:

    • Brodeur PG, et al. "Performance of a large language model on the reasoning tasks of a physician." Science. 2026;392(6797):524-527

    • FDA AI-Enabled Medical Device Database

    • Clinical AI Course (NYIT College of Osteopathic Medicine)

    8 mins
  • Medical Education Must Teach AI Differently
    Apr 14 2026

    Artificial intelligence is rapidly moving into classrooms, clinics, and daily healthcare decision-making, but much of the public conversation is built on a dangerous misunderstanding. Too often, people now treat artificial intelligence as if it simply means chatbots. In this episode, Dr. Milan Toma explains why that confusion matters and why healthcare professionals must learn to distinguish between conversational tools and task-specific medical systems.

    This episode explores the long history of artificial intelligence in medicine, why chatbots are optimized for fluent language rather than true clinical understanding, and why strong performance on text-based clinical vignettes should not be mistaken for real-world diagnostic ability. Dr. Toma also examines the risks of artificial intelligence sycophancy, the danger of overfitting, the limits of accuracy as a metric, and how data leakage or hidden shortcuts can make weak systems look impressive during development.

    Most importantly, this is a conversation about education and patient safety. Healthcare professionals need more than basic exposure to artificial intelligence tools. They need to understand how different systems work, how they fail, how to evaluate claims critically, and why clinicians must work closely with developers before these tools are trusted in practice.

    The goal is not simply to teach people how to use artificial intelligence. It is to teach them how to question it, evaluate it, and apply it responsibly. The future of healthcare will include artificial intelligence, but safe healthcare depends on how well we teach people to understand it.

    37 mins
  • The Overfitting Trap
    Apr 2 2026

    Introduction: A Tale of Two Rounds

    Every attending physician has seen the "Star Student" who can quote the New England Journal of Medicine verbatim but freezes when a patient doesn't follow the script. In this episode, we introduce Student A and Student B.

    • Student A (The Memorizer): They have a mental database of every practice vignette. They are fast, confident, and statistically "perfect" on paper.

    • Student B (The Thinker): They are slower. They visualize the blood flow, the cellular response, and the "why" behind the symptoms.

    We discuss why the current "Gold Rush" of Medical AI is accidentally scaling Student A to an industrial level, creating systems that look like geniuses in a lab but perform like novices in a clinic.

    In machine learning, overfitting is the statistical equivalent of "rote memorization." We break down the mechanics of how a model loses the forest for the trees.

    How do you "interview" an AI to see if it actually knows its stuff? You look at its Learning Curves. We explain how to read these graphs like a clinical EKG.

    • The Divergence Warning: When training accuracy rockets to 100% while validation accuracy (the "real world" test) plateaus or drops, you aren't looking at a breakthrough; you’re looking at a memory bank.

    • The Convergence Goal: A healthy model shows two lines that "hug" each other as they rise. This signifies that what the model learns in the "textbook" is actually applying to the "patient."
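    The divergence warning above can be checked mechanically. A minimal sketch in Python: the gap threshold and the per-epoch accuracy histories are hypothetical illustrations, not values from the episode.

```python
def diverges(train_acc, val_acc, gap_threshold=0.10):
    """Flag the 'divergence warning': training accuracy keeps rising
    while validation accuracy plateaus or drops, opening a wide gap."""
    final_gap = train_acc[-1] - val_acc[-1]
    # Has validation accuracy improved since the midpoint of training?
    val_improving = val_acc[-1] > val_acc[len(val_acc) // 2]
    return final_gap > gap_threshold and not val_improving

# Hypothetical per-epoch accuracies for the two archetypes:
memorizer = diverges([0.70, 0.85, 0.95, 0.99, 1.00],   # training rockets up
                     [0.68, 0.72, 0.73, 0.72, 0.71])   # validation stalls
healthy   = diverges([0.70, 0.78, 0.84, 0.88, 0.90],   # the two curves
                     [0.68, 0.75, 0.82, 0.86, 0.88])   # "hug" as they rise
print(memorizer, healthy)  # True False
```

The memorizer's curves diverge (a 29-point gap with flat validation accuracy), while the healthy model's curves converge, which is the "hug" pattern described above.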

    Why do models overfit? Often, it’s because they found a shortcut. We explore the "Red Flags" that developers—and clinicians—need to watch for:

    1. Spurious Correlations: The model learns that "Patients with X-rays taken on a portable machine are sicker," rather than learning what is in the X-ray.

    2. Data Leakage: Including variables that already "hint" at the answer (e.g., predicting a condition using the medication used to treat it).

    3. Institutional Bias: Memorizing how one specific hospital operates rather than how a disease operates.

    We tackle the most dangerous metric in healthcare: Raw Accuracy.

    "If 95% of your patients are healthy, a model can be 95% accurate by simply predicting 'Healthy' for every person it sees. It has a 0% success rate at finding disease, yet it gets a 95% grade. This isn't just bad math—it's dangerous medicine."
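    The arithmetic in that quote is easy to verify. A minimal sketch of the trivial "everyone is healthy" classifier on a hypothetical 95/5 cohort:

```python
# Hypothetical cohort: 95 healthy patients (0), 5 with disease (1).
labels = [0] * 95 + [1] * 5

# The trivial "model": predict healthy for every patient.
predictions = [0] * 100

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)            # 0.95 -- looks like an A grade

true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
sensitivity = true_positives / sum(labels)  # 0.0 -- finds zero disease cases

print(f"accuracy = {accuracy:.0%}, sensitivity = {sensitivity:.0%}")
```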

    We discuss why Sensitivity and Specificity are the only metrics that truly matter in a clinical setting.

    How do we build "Student B" AI? It requires a fundamental shift in development:

    • External Validation: Testing the model on data from a completely different hospital or geographic region.

    • Patient-Level Splits: Ensuring the model never sees the same patient in training and testing.

    • Clinician-in-the-Loop: Why doctors must be involved in feature selection to spot "leaky" data that a data scientist might miss.
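    As an illustration of the second safeguard, a patient-level split can be sketched in plain Python. The record format and helper name here are hypothetical; in practice, libraries such as scikit-learn provide GroupShuffleSplit for the same purpose.

```python
import random

def patient_level_split(records, test_fraction=0.2, seed=42):
    """Split records so no patient appears in both train and test.

    Each record is a dict with a 'patient_id' key; the split is made
    over unique patients, not over individual records or visits.
    """
    patients = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, int(len(patients) * test_fraction))
    test_ids = set(patients[:n_test])
    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test

# Ten hypothetical patients with three visits each: every patient's
# visits must land entirely on one side of the split.
records = [{"patient_id": pid, "visit": v} for pid in range(10) for v in range(3)]
train, test = patient_level_split(records)
assert not ({r["patient_id"] for r in train} & {r["patient_id"] for r in test})
```

Splitting at the record level instead would let the model see a patient's first visit in training and be "tested" on their second, which is exactly the memorization this safeguard prevents.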

    We wrap up the episode with a practical toolkit. Before you trust an AI system with your family, ask the developers these five questions:

    1. Was data split at the patient level? (Did you prevent the model from memorizing specific individuals?)

    2. Were leaky features identified and removed? (Is the model cheating using "proxy" data?)

    3. What do the training curves show? (Can I see the "EKG" of how this model learned?)

    4. How was class imbalance handled? (What is your Sensitivity for the actual disease cases?)

    5. Was there external validation? (Has this worked at a hospital that isn't yours?)

    Real medicine is messy. It’s atypical symptoms, patients with five comorbidities, and "unusual" presentations. If we want AI to be a partner in the clinic, we need it to be a "Student B." We need it to understand the pathophysiology of the data, not just the answers on the test.

    Join us as we move past the hype and toward a future of robust, reliable, and truly intelligent medical AI.

    Based on the work and research of Dr. Milan Toma and synthesized from over 40 peer-reviewed studies on clinical AI evaluation.

    23 mins
  • Understanding the Trust Gap in Medical AI
    Mar 18 2026

    Have you ever wondered why skepticism about artificial intelligence persists in healthcare, even as new AI tools are rapidly introduced? In this episode, Dr. Milan Toma, Associate Professor of Clinical Sciences at NYIT College of Osteopathic Medicine, explains the roots of distrust in clinical AI systems and what it takes to regain confidence. Drawing on decades of machine learning evolution, real-world case studies, and his own research experience, Dr. Toma discusses the dangers of overfitting, the importance of healthy training dynamics, and the vital role of collaboration between clinicians and developers. Tune in to learn how the healthcare community can move from skepticism to trust and ensure that AI serves the needs of both patients and professionals.

    9 mins
  • Algorithmic Shortcuts That Undermine Medical AI
    Mar 13 2026

    Imagine you are developing an AI system to predict which patients are at risk of becoming obese based on their lifestyle factors. You gather data on diet, exercise habits, sleep patterns, stress levels, and dozens of other variables. You train your model. It achieves 99% accuracy. You celebrate.

    Then someone points out that you included the patients' current weight in your dataset. Your model did not learn anything about lifestyle risk factors. It learned to calculate BMI. It took a shortcut. And that shortcut rendered your entire effort clinically useless.

    This is the problem of algorithmic shortcuts in medical AI, and it is flooding our research literature with impressive-looking results that will crumble the moment they encounter real patients.

    Machine learning models are optimization engines. They will find the easiest path to high accuracy, whether or not that path has any clinical meaning. When your training data contains features that essentially give away the answer, the model will exploit them ruthlessly. This is not a bug. It is exactly what the algorithm is designed to do. The problem is that we, the humans, failed to recognize that we handed the model an answer key along with the exam.

    Consider what happens when you include a "diabetes medication" column in a model designed to predict diabetes. The model quickly learns: if this column says "metformin," predict diabetes. It achieves near-perfect accuracy. But it has learned nothing useful. If you already know the patient is on diabetes medication, you do not need AI to tell you they have diabetes. You need AI to identify patients before they develop the condition, when intervention can still make a difference.

    This is the fundamental paradox: the features that make prediction easiest are often the features that make prediction pointless.
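    The metformin shortcut described above can be demonstrated in a few lines. All data here is hypothetical, and the "model" is reduced to the single rule the optimizer would converge to:

```python
# Hypothetical records in which the 'medication' column leaks the label.
patients = [
    {"bmi": 31, "exercise_hours": 1, "medication": "metformin", "diabetic": True},
    {"bmi": 24, "exercise_hours": 5, "medication": "none",      "diabetic": False},
    {"bmi": 29, "exercise_hours": 2, "medication": "metformin", "diabetic": True},
    {"bmi": 22, "exercise_hours": 6, "medication": "none",      "diabetic": False},
]

# The shortcut the optimizer finds: read the answer key, ignore everything else.
def leaky_predict(patient):
    return patient["medication"] == "metformin"

accuracy = sum(leaky_predict(p) == p["diabetic"] for p in patients) / len(patients)
print(accuracy)  # 1.0 -- perfect on paper, clinically useless
```

The fix is not a better algorithm; it is removing the leaky column before training, which is exactly where clinician review of feature selection earns its keep.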

    18 mins
  • The Accuracy Trap
    Mar 9 2026

    When a ninety-nine percent accurate AI misses every single case of disease, something has gone terribly wrong.

    In this episode, Dr. Milan Toma exposes one of the most dangerous pitfalls in medical artificial intelligence: the accuracy paradox. Discover why impressive accuracy numbers can mask complete clinical failure, and why that four percent drop in accuracy might actually save lives.

    Dr. Toma explains how the fundamental nature of medical data, where the healthy are many and the sick are few, creates conditions where a system can achieve near perfect accuracy while detecting absolutely nothing. He walks through the math, the real world consequences, and the alternative metrics that actually matter for patient care.

    In this episode you will learn:

    • Why a trivial classifier predicting everyone healthy achieves ninety-nine percent accuracy while catching zero disease cases.

    • How conditions like atrial fibrillation, breast cancer, and malignant arrhythmias create severely imbalanced datasets.

    • The cascade of harm that unfolds when AI systems miss diagnoses, from false reassurance through disease progression to preventable patient harm.

    • Why false negatives in medicine carry consequences far exceeding false positives.

    • Which metrics, including sensitivity, specificity, F1 score, Matthews Correlation Coefficient, and balanced accuracy, reveal what accuracy hides.

    • What clinicians, developers, and patients should demand from medical AI before trusting it with diagnosis.
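    Several of the metrics mentioned above fall straight out of the confusion matrix. A minimal sketch, applied to the trivial "everyone healthy" classifier on a hypothetical ninety-nine-to-one cohort:

```python
import math

def clinical_metrics(tp, fp, tn, fn):
    """Metrics that expose what raw accuracy hides, all derived
    from the four cells of the confusion matrix."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall on the sick
    specificity = tn / (tn + fp) if tn + fp else 0.0   # recall on the healthy
    accuracy = (tp + tn) / total
    balanced_accuracy = (sensitivity + specificity) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity,
            "balanced_accuracy": balanced_accuracy, "mcc": mcc}

# "Everyone is healthy" on 99 healthy patients and 1 disease case:
# zero true positives, one missed case.
m = clinical_metrics(tp=0, fp=0, tn=99, fn=1)
print(m)  # accuracy 0.99, sensitivity 0.0, balanced accuracy 0.5, MCC 0.0
```

Accuracy alone awards this classifier 99%; balanced accuracy (0.5, i.e., coin-flip level) and MCC (0.0, no correlation with the truth) tell the real story.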

    Presented by: Dr. Milan Toma, PhD, SMIEEE, Associate Professor of Clinical Sciences, College of Osteopathic Medicine, New York Institute of Technology

    For deeper exploration: Diagnosing AI: Evaluation of AI in Clinical Practice (2026)

    12 mins
  • A Clinical Guide to AI in Medical Diagnostics
    Nov 20 2025

    What can a 2017 colonoscopy study teach us about using AI diagnostics safely in 2025?


    An AI diagnostic tool boasts 99% accuracy. Should you trust it? In this episode, I explain why that number can be dangerously misleading and equip medical professionals with the practical strategies needed to see through the hype and protect their patients.

    As artificial intelligence becomes more integrated into healthcare, the ability to critically evaluate these tools is no longer optional; it's a core clinical skill. This session moves beyond the headlines to uncover the common, often hidden, flaws in AI training that can lead to inflated performance metrics and real-world risk. Learn how to become the essential human-in-the-loop who can distinguish a robust, reliable AI from a brittle and dangerous one.

    In this episode, you will learn:

    • The "Memorizing Student" Problem: A simple analogy for understanding overfitting, one of the most common ways AI models fail in the real world.

    • How to Spot the Flaws: Practical techniques to diagnose unreliable AI, including how to interpret learning curves and why true external validation is the gold standard.

    • The Danger of "Cherry-Picking": How selective reporting creates a false perception of reliability and why demanding transparency is crucial.

    • The Colonoscopy Analogy: A powerful, real-world framework for how clinicians should approach AI results right now. Learn how to use a "positive" AI signal to your advantage and, more importantly, how to handle a "negative" signal to prevent catastrophic errors from automation bias.

    • Your Ultimate Responsibility: Why the physician, not the algorithm, is always accountable, and how to use AI as a tool for support, not an absolution of your clinical judgment.

    If you are a physician, medical student, resident, or healthcare administrator, this presentation provides the foundational knowledge you need to navigate the next wave of medical technology safely and effectively.

    17 mins