Study compares LLM and physician responses to otologic questions on Reddit

AI Answers Ear Questions With More Empathy Than Doctors

medRxiv · Published June 4, 2026

Study authors: Akinniyi, S.; Jain-Poster, K.; Evangelista, E.; Yoshikawa, N.; Rivero, A.

Editorial oversight: Dr. Lars van Dijk, PhD · Surgical, Procedural & Diagnostic

AI-generated summary of the cited source, checked by automated accuracy review.

Key Takeaway

Consider that LLMs may provide longer, higher-rated responses to otologic questions, but clinical integration requires further validation.

This is a comparative study of 49 otology-related questions posted on the Reddit forum r/AskDocs. It compared responses from three large language models (ChatGPT-4o, ClaudeAI, Google Gemini) against verified physician responses on quality, empathy, and readability.

The authors found that LLM responses were significantly longer than physician responses (mean 145 vs 67 words, p < .05). LLM responses were also rated higher for quality (10.95 vs 9.58, p < .05), empathy (7.26 vs 5.18, p < .05), and readability (4.00 vs 3.73, p < .05). By Flesch-Kincaid grade level, ChatGPT produced the most readable content (mean grade level 7.25), while ClaudeAI responses were more complex (mean grade level 11.86, p < .05). Evaluators correctly identified AI versus physician responses 89.4% of the time, with higher sensitivity for detecting physician responses (93.5%).

The abstract reports no safety data and states no limitations. For practice, the authors suggest that, when appropriately implemented, such systems may enhance access to understandable otologic information and complement clinician-delivered care. The evidence comes from a single comparative study on a specific online forum and may not generalize to clinical settings.

Ear problems are very common. Millions of people deal with hearing loss or dizziness every year. Doctors are often too busy to give long answers.

Patients often feel rushed during appointments. They leave with more questions than answers. This creates a gap in care that technology might fill.

The Surprising Shift

We used to trust only human doctors for advice. But new technology is changing how we get health information. This study asks if machines can be kinder.

Humans provide experience and judgment. Machines provide speed and data. The line between them is getting blurrier every day.

Think of AI like a super-fast library. It reads millions of books to find answers. Doctors rely on their own training and memory.

Both try to help, but in different ways. AI can scan medical records instantly. It does not get tired or stressed.

It uses patterns to guess what you need. This is why it might sound more polite. It does not feel the rush of a busy clinic.

Researchers looked at 49 real questions from an online forum. They compared answers from doctors against three popular AI tools. The questions covered pain, hearing, and balance issues.

The questions came from r/AskDocs, a public Reddit forum. Each one was answered by a verified doctor. Three different AI models also gave their own answers.

The AI tools wrote longer answers than the human doctors. Their answers also scored higher on quality, empathy, and readability. One AI tool was even easier to read than the others.

The average AI answer was 145 words long. The doctor answers were only 67 words on average. This means the AI used more than twice as many words to explain things.

ChatGPT was the most readable for patients. Its answers were written at about a seventh-grade reading level. ClaudeAI was harder to read, at close to a twelfth-grade level.

But there is a big warning.

Evaluators could tell which answers came from a computer. They guessed correctly almost nine times out of ten. This means AI answers still read differently from a doctor's.

The study authors describe this as a helpful tool, not a replacement. It may give patients easier access to understandable information. But it should work alongside a real doctor.

The AI cannot examine your ear physically. It does not know your full medical history. It is a guide, not a clinician.

You should not use AI to diagnose yourself. It is still early days for this technology. Always talk to a medical professional for serious issues.

Use AI to learn more about your condition. Do not use it to decide on treatment. Your doctor knows your specific situation best.

This study was small and done online. It did not report whether the advice was medically safe. We need more research before trusting it fully.

The AI might sound confident but be wrong. It can make up facts without knowing it. Safety is the most important part of care.

More studies are needed to test whether AI stays helpful over time. Approval for medical use takes a long time. Patients should wait for official guidelines before relying on AI tools.

Technology moves fast, but safety moves slower. We need to ensure it helps without hurting. The future looks promising but requires caution.

Study Details

Evidence Level: 5

Published: Apr 2026

Original Abstract

Objective: The objective of this study is to assess the quality, empathy, and readability of large language model (LLM) responses regarding otologic questions from patients as they compare to verified physician responses in other patient-driven forums. This study aims to predict the potential utility of LLMs in patient-centered communication.

Study Design: Comparative study

Setting: Internet

Methods: A sample of 49 otology-related questions posted on Reddit r/AskDocs between January 2020 and June 2025 were selected using search terms including "hearing loss", "ear infection", "tinnitus", "ear pain", and "vertigo". Posts were retrieved using Reddit's "Top" filter. Each question was answered by a verified doctor on Reddit and three AI LLMs (ChatGPT-4o, ClaudeAI, Google Gemini). Responses were scored by five evaluators.

Results: Common otologic concerns posed in patient questions were otalgia (38.7%), vertigo (28.6%), tinnitus (24.5%), hearing loss (22.4%), and aural fullness (20.4%). LLM responses were longer than physician responses (mean 145 vs 67 words; p < .05) and rated higher in quality (10.95 vs 9.58), empathy (7.26 vs 5.18), and readability (4.00 vs 3.73) (all p < .05). Evaluators correctly identified AI versus physician responses in 89.4% of cases, with higher sensitivity for detecting physician responses (93.5%). By Flesch-Kincaid grade level, ChatGPT produced the most readable content (mean 7.25), while ClaudeAI responses were more complex (11.86; p < .05).

Conclusion: LLM responses received higher ratings in quality, empathy, and readability than those of physicians in response to a variety of otologic concerns. When appropriately implemented, such systems may enhance access to understandable otologic information and complement clinician-delivered care.
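The abstract reports readability by Flesch-Kincaid grade level. As a rough illustration of how such a score is computed, the sketch below applies the standard published formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The function names and the vowel-group syllable counter are assumptions made for this sketch. The abstract does not say which tool the authors used, so scores from this code would not be expected to match the reported means.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables as runs of vowels (a heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in '-le' endings such as 'table'.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Return the Flesch-Kincaid grade level of a passage of English text."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    # Hypothetical sample sentences, not taken from the study.
    plain = "Your ear may hurt. See a doctor soon."
    dense = ("Persistent unilateral tinnitus accompanied by vertigo "
             "warrants audiometric evaluation.")
    print(round(flesch_kincaid_grade(plain), 2))
    print(round(flesch_kincaid_grade(dense), 2))
```

A lower score means easier text: short sentences and short words pull the grade level down, while long sentences and multi-syllable clinical terms push it up.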
