
Review finds fine-tuned AI model matches larger model in simulated lung cancer screening conversations

Key Takeaway
A fine-tuned open-source model matched a larger frontier model in simulated lung cancer screening conversations, but results from synthetic data should not be read as evidence of clinical efficacy.

This publication is an evaluation review focusing on the use of a fine-tuned open-source AI model (Google's Gemma 2 9B) for simulated lung cancer screening conversations. The model was trained on 5,086 synthetic conversations and tested against a larger frontier model (Google's Gemini 2.5 Flash) and an unmodified baseline, using 300 multi-turn conversations based on 100 patient personas. The scope includes assessing simulated patient experience, safety, and adaptation in clinical communication scenarios, with human clinician review ongoing.
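The review states that the model was fine-tuned with QLoRA but does not give the training configuration. As a minimal sketch of what a typical QLoRA setup involves, the dictionaries below mirror the usual quantisation and low-rank-adapter settings (expressed as plain values rather than actual `peft`/`bitsandbytes` calls); every number here is an illustrative assumption, not a setting from the study.

```python
# Hypothetical QLoRA configuration sketch -- all values are illustrative
# assumptions, not the settings used in the reviewed study.

# 4-bit quantisation of the frozen base model (the "Q" in QLoRA).
quant_config = {
    "load_in_4bit": True,          # store base weights in 4-bit
    "quant_type": "nf4",           # NormalFloat4 quantisation
    "double_quant": True,          # also quantise the quantisation constants
    "compute_dtype": "bfloat16",   # dtype used for the actual matmuls
}

# Low-rank adapter settings (the "LoRA" part): only small rank-r update
# matrices are trained; the quantised base model stays frozen.
lora_config = {
    "r": 16,                 # adapter rank (assumed)
    "alpha": 32,             # scaling factor (effective scale = alpha / r)
    "dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
}

def trainable_fraction(base_params: int, hidden: int, n_layers: int, cfg: dict) -> float:
    """Rough estimate of the fraction of parameters LoRA trains: each targeted
    projection gains two rank-r matrices (hidden x r and r x hidden)."""
    per_layer = len(cfg["target_modules"]) * 2 * hidden * cfg["r"]
    return (n_layers * per_layer) / base_params

# For a ~9B model with hidden size ~3584 over 42 layers (approximate
# Gemma 2 9B-like shapes), only a tiny fraction of weights is trained:
frac = trainable_fraction(9_000_000_000, 3584, 42, lora_config)
print(f"trainable fraction ~ {frac:.4%}")
```

The point of the sketch is the motivation the review gives for fine-tuning a small open model: with QLoRA, well under one percent of the parameters are updated, which is what makes adapting a 9B model feasible under tight data-governance and hardware constraints.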

Key findings from the review indicate that the fine-tuned model achieved a simulated patient experience score of 3.71 out of 5, slightly higher than the frontier model's 3.65 out of 5 (a difference of 0.06). It also showed zero boundary violations after clinician review of all flagged instances, and a Patient Adaptation Index of 0.37, compared to 0.35 for both the frontier model and unmodified baseline. The authors argue that targeted fine-tuning can yield communication quality comparable to larger proprietary systems, with potential advantages in safety-critical scenarios and suitability for data governance constraints like those in the NHS.
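The review does not spell out how the composite Patient Adaptation Index is computed. One plausible construction, assuming it is a weighted mean of per-component scores normalised to [0, 1], is sketched below. The two clinically specific components (empathy calibration, selective smoking-cessation signposting) are named in the abstract; the other component names, the weights, and the illustrative scores (chosen to land at the reported 0.37) are all hypothetical.

```python
# Hypothetical sketch of a composite Patient Adaptation Index (PAI).
# Component names beyond the two mentioned in the abstract, the equal
# weighting, and the scores themselves are assumptions for illustration.

def patient_adaptation_index(components: dict[str, float],
                             weights: dict[str, float]) -> float:
    """Weighted mean of per-component scores, each normalised to [0, 1]."""
    if set(components) != set(weights):
        raise ValueError("components and weights must cover the same keys")
    total_weight = sum(weights.values())
    return sum(components[k] * weights[k] for k in components) / total_weight

# Illustrative per-model component scores (made-up values):
scores = {
    "empathy_calibration": 0.45,     # empathy matched to patient distress
    "cessation_signposting": 0.40,   # cessation advice only when appropriate
    "readability": 0.30,             # hypothetical additional component
    "information_coverage": 0.33,    # hypothetical additional component
}
weights = {k: 1.0 for k in scores}   # equal weighting, also an assumption

pai = patient_adaptation_index(scores, weights)
print(round(pai, 2))  # -> 0.37
```

A design note implicit in such an index: averaging normalised components lets a model that is markedly better on two components (as the fine-tuned model reportedly was on the clinically specific ones) still show only a small overall margin, which matches the narrow 0.37 vs 0.35 gap reported.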

Limitations noted by the authors include that human clinician review of the conversations is ongoing, and the results are based on synthetic conversations and patient personas, which may affect generalizability. The review explicitly cautions against inferring clinical efficacy in real-world screening uptake from these simulations. Practice relevance is restrained, suggesting that while the approach shows promise for AI-assisted communication in settings with data governance needs, it remains preliminary and requires further validation in real clinical environments.

Study Details

Evidence Level: 5
Published: Apr 2026
Original Abstract
Lung cancer screening saves lives, yet uptake remains suboptimal and inequitable. Personalised communication can improve attendance and reduce anxiety, but scaling such support is a workforce challenge. We fine-tuned Google's Gemma 2 9B using QLoRA on 5,086 synthetic screening conversations and compared it against Google's Gemini 2.5 Flash (a larger frontier model) and an unmodified baseline across 300 multi-turn conversations with 100 patient personas spanning ten clinical categories. Evaluation combined automated natural language processing metrics with independent language model judgement in two complementary modes: structured clinical rubric and simulated patient persona. The fine-tuned model achieved the highest simulated patient experience score (3.71/5 vs 3.65 for the frontier model), recorded zero boundary violations after clinician review of all flagged instances, and led on the four most safety-critical categories. A composite Patient Adaptation Index showed that the fine-tuned model led overall (0.37 vs 0.35 vs 0.35), with its clearest advantage on the two clinically specific components: empathy calibration to patient distress and selective smoking cessation signposting. These findings suggest that targeted fine-tuning of open-source models can yield clinical communication quality comparable to larger proprietary systems, with advantages in safety-critical scenarios and suitability for NHS data governance constraints. Human clinician review of these conversations is ongoing.