Review of LLM-generated synthetic Parkinson's patients for in silico trials


medRxiv · Published April 29, 2026 · Medically reviewed April 29, 2026
Study authors: Merlo Pich, E.; Pomponio, O.; Magno, M. A.; Berti, M.; Li Lu, L.; Coser, A.; Cani, I.; Calandra Boun…
By Dr. Ji-eun Park, MD · Brain, Mind & Pain

Key Takeaway

Treat this methodological work as a foundation for digital twins, not as evidence of clinical utility.

This is a review and synthesis of a methodological proof-of-concept study. The scope is the generation of synthetic Parkinson's disease patients using the Qwen3-8B-Base model with a relational, tree-structured representation and domain-specific fine-tuning, applied in an in silico digital twin paradigm.

The authors synthesize findings on fidelity, which was assessed through distributional similarity, correlation structure, and neurologist review. Utility was tested by training diagnostic classifiers, reproducing a published pharmacometric disease progression model applied to in silico trials, and extracting a dopamine-motor impairment relationship at early PD stages. Privacy was evaluated via identical match share, distance-to-closest-record, and membership inference attacks.
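Two of the privacy metrics named above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Euclidean distance, and the toy records are all assumptions made for illustration, and the features are assumed to be numeric and already scaled to comparable ranges.

```python
import math

def distance_to_closest_record(synthetic, real):
    """For each synthetic row, the Euclidean distance to its nearest real row.

    Assumption: rows are equal-length numeric tuples on comparable scales.
    """
    return [min(math.dist(s, r) for r in real) for s in synthetic]

def identical_match_share(synthetic, real):
    """Fraction of synthetic rows that exactly duplicate some real row."""
    real_set = {tuple(r) for r in real}
    return sum(tuple(s) in real_set for s in synthetic) / len(synthetic)

# Toy, made-up records (hypothetical two-feature patients, not PPMI data).
real = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
synthetic = [(0.0, 0.0), (1.0, 2.0), (3.0, 0.5)]

dcr = distance_to_closest_record(synthetic, real)
ims = identical_match_share(synthetic, real)
print(dcr, ims)
```

In this toy case the first synthetic record copies a real one, so its distance is zero and it counts toward the identical match share. In general, a high identical match share or many near-zero distances suggest the generator may be memorizing training records rather than sampling new ones.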

The study population consisted of synthetic patients generated from the Parkinson's Progression Markers Initiative (PPMI) dataset. No sample size, effect sizes, p-values, or confidence intervals are reported. The work contributes to the foundations of digital twins for PD in silico trials.

Acknowledged limitations include the need to further validate synthetic patients before use in real-world clinical trials and the possibility that this model's performance does not generalize to other LLMs. Practice relevance is limited to foundational methodological development.

Study Details

Evidence Level: 5

Published: Apr 2026

Original Abstract

Heterogeneity in sporadic Parkinson's Disease (PD) is a critical problem that drives variable rates of progression and treatment response and complicates clinical trials. Access to large PD datasets that may help in clustering this heterogeneity is restricted by privacy and regulatory constraints. Simulated patients or digital twins may offer a solution. We developed a large language model (LLM)-framework to generate high-fidelity synthetic PD patients from the Parkinson's Progression Markers Initiative (PPMI) dataset based on the open-source Qwen3-8B-Base model. Using a relational, tree-structured representation and domain-specific fine-tuning, the model produces patient-level records with longitudinal clinical, imaging, and biomarker data. Fidelity was assessed through distributional similarity, correlation structure, and neurologist review. Utility was tested by training diagnostic classifiers, reproducing a published pharmacometric disease progression model applied to in silico trials, and by extracting a stringent dopamine-motor impairment relationship at early PD stages. Privacy was evaluated via identical match share, distance-to-closest-record, and membership inference attacks. These findings support the use of a dedicated LLM framework for patient simulation, contributing to the foundations of digital twins for PD in silico trials.
