This comparative study analyzed 49 otology-related questions posted on the Reddit forum r/AskDocs. It compared responses from three large language models (ChatGPT-4o, ClaudeAI, Google Gemini) against verified physician responses on quality, empathy, and readability.
The authors found that LLM responses were significantly longer than physician responses (mean 145 vs 67 words, p < .05). LLM responses were also rated higher for quality (10.95 vs 9.58, p < .05), empathy (7.26 vs 5.18, p < .05), and readability (4.00 vs 3.73, p < .05). By Flesch-Kincaid grade level, ChatGPT produced the most readable content (mean grade level 7.25), while ClaudeAI responses were more complex (mean grade level 11.86, p < .05). Evaluators correctly identified AI versus physician responses 89.4% of the time, with higher sensitivity for detecting physician responses (93.5%).
The abstract reports no safety data and no study limitations. For practice, the authors suggest that, when appropriately implemented, such systems may enhance access to understandable otologic information and complement clinician-delivered care. The evidence comes from a single comparative study of one online forum and may not generalize to clinical settings.
Objective: The objective of this study is to assess the quality, empathy, and readability of large language model (LLM) responses regarding otologic questions from patients as they compare to verified physician responses in other patient-driven forums. This study aims to predict the potential utility of LLMs in patient-centered communication.
Study Design: Comparative study.
Setting: Internet.
Methods: A sample of 49 otology-related questions posted on Reddit r/AskDocs between January 2020 and June 2025 were selected using search terms including "hearing loss", "ear infection", "tinnitus", "ear pain", and "vertigo". Posts were retrieved using Reddit's "Top" filter. Each question was answered by a verified doctor on Reddit and three AI LLMs (ChatGPT-4o, ClaudeAI, Google Gemini). Responses were scored by five evaluators.
Results: Common otologic concerns posed in patient questions were otalgia (38.7%), vertigo (28.6%), tinnitus (24.5%), hearing loss (22.4%), and aural fullness (20.4%). LLM responses were longer than physician responses (mean 145 vs 67 words; p < .05) and rated higher in quality (10.95 vs 9.58), empathy (7.26 vs 5.18), and readability (4.00 vs 3.73), all p < .05. Evaluators correctly identified AI versus physician responses in 89.4% of cases, with higher sensitivity for detecting physician responses (93.5%). By Flesch-Kincaid grade level, ChatGPT produced the most readable content (mean 7.25), while ClaudeAI responses were more complex (11.86; p < .05).
Conclusion: LLM responses received higher ratings in quality, empathy, and readability than those of physicians in response to a variety of otologic concerns. When appropriately implemented, such systems may enhance access to understandable otologic information and complement clinician-delivered care.
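For readers unfamiliar with the Flesch-Kincaid grade level reported above, the following is a minimal Python sketch of the standard formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The abstract does not say which tool the authors used, so this is not their method. The function names `count_syllables` and `flesch_kincaid_grade` are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed here for illustration; dedicated readability tools count syllables more carefully and may give somewhat different scores.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables with a vowel-group heuristic (an assumption, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent (e.g. "ache"), except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Return the Flesch-Kincaid grade level of `text` using the standard formula."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    simple = "My ear hurts. It rings."
    dense = "Sudden sensorineural hearing loss requires immediate otolaryngologic evaluation."
    print(round(flesch_kincaid_grade(simple), 2))
    print(round(flesch_kincaid_grade(dense), 2))
```

A higher score corresponds to a higher U.S. school grade, so a mean of 7.25 indicates text that is easier to read than a mean of 11.86. Short sentences with short words score lower than long sentences packed with polysyllabic clinical terms.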