
Human-written medical histories rated higher than ChatGPT in paediatric orthopaedics

Key Takeaway
Human-written medical histories remain superior to ChatGPT-generated ones in paediatric orthopaedics, particularly for accuracy and consistency.

This randomized, blinded comparative study evaluated the quality of medical history summaries created by humans versus ChatGPT in a paediatric orthopaedic practice. Twenty consecutive paediatric patients (mean age 14.2 ± 2.3 years; 11 males, 9 females) were included. Each patient had two summaries: one human-created and one ChatGPT-generated, both based on the same clinical encounter.

The primary outcome was the overall rating on a 6-point Likert scale. Human summaries scored significantly higher (5.2 ± 0.8) than ChatGPT summaries (4.5 ± 0.8), with a large effect size (Cohen's d = 0.80, p < 0.001). Secondary outcomes, including temporal consistency, spatial consistency, accident description, and overall impression, also significantly favoured human documentation (all p < 0.001). For example, ChatGPT made errors in accident description in 21 of 60 evaluations (35%) and in temporal consistency in 14 of 60 evaluations (23%). No significant differences were found for writing style or documentation of previous interventions.
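For readers unfamiliar with the effect-size metric, Cohen's d is the difference in group means divided by the pooled standard deviation. A minimal sketch using the summary statistics reported above (the means and SDs come from the study; the pooling formula assumes equal group sizes, which holds here with 20 summaries per arm):

```python
import math

def cohens_d(mean1, sd1, mean2, sd2):
    # With equal group sizes, the pooled SD reduces to the
    # root-mean-square of the two group SDs.
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd

# Human summaries: 5.2 +/- 0.8; ChatGPT summaries: 4.5 +/- 0.8
d = cohens_d(5.2, 0.8, 4.5, 0.8)
print(round(d, 2))  # ~0.88 from the rounded summary stats; the paper
                    # reports d = 0.80, presumably from unrounded data
```

The small gap between this back-of-the-envelope value and the published d = 0.80 is expected when recomputing from rounded means and SDs.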

Safety and tolerability were not reported. A key limitation was moderate inter-rater reliability (ICC = 0.64). The study suggests that current large language models are not ready to replace human medical documentation in paediatric orthopaedic practice without careful oversight, supporting hybrid workflows where AI assists but does not replace human clinical judgement.

Study Details

Study type: RCT
Evidence: Level 2
Follow-up: 27.6 mo
Published: May 2026
Original Abstract
PURPOSE: This study evaluated the quality of ChatGPT-generated medical history summaries compared to human-created documentation in a paediatric orthopaedic practice setting.

METHODS: A prospective, randomised, blinded comparative study was conducted involving 20 consecutive paediatric patients (mean age 14.2 ± 2.3 years; 11 males, 9 females) presenting with knee problems. Audio recordings of medical consultations were transcribed and processed by ChatGPT-4o (OpenAI) using standardised prompts. Three independent orthopaedic specialists evaluated both human-generated and AI-generated summaries using eight quality criteria: temporal consistency, spatial consistency, accident description, symptom accuracy, symptom specificity, previous interventions, writing style and overall impression. Each criterion was scored on a 6-point Likert scale.

RESULTS: Human-created summaries received significantly higher overall ratings (5.2 ± 0.8) compared to ChatGPT-generated summaries (4.5 ± 0.8, p < 0.001, Cohen's d = 0.80). After Bonferroni correction for multiple comparisons, statistically significant differences favouring human documentation were confirmed in four of eight criteria: temporal consistency (p < 0.001), spatial consistency (p < 0.001), accident description (p < 0.001) and overall impression (p < 0.001). No significant differences were observed for writing style and documentation of previous interventions. Inter-rater reliability was moderate (ICC = 0.64). ChatGPT demonstrated frequent temporal inconsistencies (14 of 60 evaluations, 23%) and omission of relevant accident details (21 of 60 evaluations, 35%).

CONCLUSION: While AI-generated summaries showed acceptable stylistic quality, human documentation significantly outperformed ChatGPT in critical clinical dimensions, including temporal consistency and accuracy of complex orthopaedic presentations. Current large language models are not ready to replace human medical documentation in paediatric orthopaedic practice without careful oversight. The findings support the implementation of hybrid workflows where AI assists but does not replace human clinical judgement.

LEVEL OF EVIDENCE: Level I.
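The Bonferroni correction mentioned in the results simply divides the significance threshold by the number of comparisons. A minimal sketch, assuming the conventional alpha of 0.05 (the study does not state its alpha) across the eight quality criteria:

```python
# Bonferroni correction: per-test threshold = alpha / number of tests.
alpha = 0.05          # assumed conventional significance level
n_comparisons = 8     # eight quality criteria evaluated in the study

adjusted_alpha = alpha / n_comparisons
print(adjusted_alpha)  # 0.00625
```

The four criteria reported at p < 0.001 all fall well below this adjusted threshold, which is why they remain significant after correction.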