Review finds fine-tuned AI model matches larger model in simulated lung cancer screening conversations
This publication is an evaluation review of a fine-tuned open-source AI model (Google's Gemma 2 9B) used for simulated lung cancer screening conversations. The model was fine-tuned on 5,086 synthetic conversations and evaluated against a larger frontier model (Google's Gemini 2.5 Flash) and an unmodified baseline across 300 multi-turn conversations drawn from 100 patient personas. The evaluation covers simulated patient experience, safety, and patient adaptation in clinical communication scenarios; human clinician review of the conversations is ongoing.
Key findings indicate that the fine-tuned model achieved a simulated patient experience score of 3.71 out of 5, slightly higher than the frontier model's 3.65 (a difference of 0.06). After clinician review of all flagged instances, it showed zero boundary violations, and its Patient Adaptation Index was 0.37, versus 0.35 for both the frontier model and the unmodified baseline. The authors argue that targeted fine-tuning can yield communication quality comparable to larger proprietary systems, with potential advantages in safety-critical scenarios and better suitability for data governance constraints such as those in the NHS.
Limitations noted by the authors include the still-ongoing human clinician review and the reliance on synthetic conversations and patient personas, which may limit generalizability. The review explicitly cautions against inferring clinical efficacy for real-world screening uptake from these simulations. Practice relevance is framed cautiously: while the approach shows promise for AI-assisted communication in settings with strict data governance requirements, it remains preliminary and requires validation in real clinical environments.