Systematic evaluation of harmonisation methods reduces variability in multisite MRI volumetric imaging-derived phenotypes

New Scan Fix Helps Compare Brain Changes Across Clinics

AI-generated summary of the cited source, checked by automated accuracy review.

Key Takeaway
The benefits of harmonisation in multisite MRI studies are context dependent and remain insufficiently understood in small, heterogeneous datasets.

This systematic evaluation examines image-based and statistical harmonisation methods within a clinically realistic, multisite, multiscanner structural T1-weighted (T1w) MRI test-retest dataset. The scope covers variability in volumetric imaging-derived phenotypes (IDPs) and rank consistency under repeatability, intra-scanner, and inter-scanner reproducibility scenarios. No specific sample size was reported for this dataset.

Key findings show that, before harmonisation, variability was lowest in the repeatability scenario, with median variability of 0.6% to 2.7% and rank consistency (rho) of at least 0.9. Intra-scanner reproducibility showed modest increases, with variability of 0.5% to 3.2% and rho between 0.5 and 1.0. Inter-scanner reproducibility showed substantially greater variability, from 1.7% to 19.2%, with rho between -0.1 and 0.9. Harmonisation benefits were clearest in the inter-scanner scenario, where both reduced variability and improved subject-level consistency were observed. In pooled data, approaches that modelled site as the batch and accounted for repeated-measure structure mitigated batch effects more consistently across IDPs and more accurately reflected underlying biological variation.
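To make the two reported metrics concrete, here is a minimal Python sketch of how test-retest variability and rank consistency could be computed for a single IDP. It rests on assumptions that the abstract does not confirm: variability is taken as the absolute test-retest difference as a percentage of the pair mean, rho is taken to be Spearman's rank correlation, and the volumes and function names are invented for illustration. It is not the authors' pipeline.

```python
import numpy as np


def percent_variability(test, retest):
    """Absolute test-retest difference as a percentage of the pair mean.

    Assumed definition; the abstract does not state the exact formula.
    """
    test = np.asarray(test, dtype=float)
    retest = np.asarray(retest, dtype=float)
    return 100.0 * np.abs(test - retest) / ((test + retest) / 2.0)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rho(test, retest):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    return float(np.corrcoef(average_ranks(test), average_ranks(retest))[0, 1])


# Made-up volumes (mm^3) for five hypothetical subjects scanned twice.
scan_a = [3500.0, 3900.0, 4200.0, 3100.0, 3700.0]
scan_b = [3535.0, 3861.0, 4242.0, 3131.0, 3663.0]

median_var = float(np.median(percent_variability(scan_a, scan_b)))
rho = spearman_rho(scan_a, scan_b)
print(f"median variability = {median_var:.2f}%, rho = {rho:.2f}")
```

In this toy example the second scan shifts each volume by about 1% but preserves the ordering of subjects, so variability is small and rank consistency is perfect. A low or negative rho, as reported at the lower end of the inter-scanner range, would instead mean that subjects swap places between scans.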

The authors note that the effectiveness of harmonisation in small, heterogeneous clinical datasets remains insufficiently understood and that performance was strongly context dependent. Consequently, harmonisation cannot be treated as a one-size-fits-all solution. This information is important to consider for multisite study design, including sample size calculation in clinical trials.

Why Scan Differences Matter

For people with dementia, doctors need to see tiny changes over time. They track memory loss and brain shrinkage carefully. If the machine changes the picture, the doctor cannot tell if the brain is changing.

This makes it hard to test new medicines. Doctors need to know if a drug works. They need to see real progress in the brain.

The Old Way vs The New Way

Researchers used to hope these differences would cancel out. They often did not. This study tested new math tools to fix those differences.

But here is the twist. One tool does not fix everything. The best method depends on how the data was collected.

Think of it like translating a book into another language. If you translate poorly, the story changes. If you translate well, the meaning stays.

These tools adjust the numbers from the scan. They make them match up across different sites. It is like calibrating a scale to weigh the same object.

What The Study Tested

The team looked at brain scans from different sites and scanners. They checked how well repeat scans of the same person matched in three situations: a straightforward repeat scan, a repeat on the same scanner, and a repeat on a different scanner.

They focused on structural images. These show the shape and size of brain parts. This is key for aging and dementia research.

The differences were largest when different scanners were used. Before any correction, some measurements varied by as much as 19.2%. The math tools helped most in this situation: they reduced the differences and made each person's results more consistent.

The best-performing methods kept more of the real brain differences while removing machine noise. This could make the data more reliable for researchers.

These tools are research methods for analysing scans, not a treatment.

What Experts Say

Experts say this helps design better studies for new drugs. It ensures that if a drug works, the data shows it clearly.

It may also help researchers plan clinical trials. Knowing how much scans vary helps them work out how many people a study needs.

Patients do not need to do anything right now. This helps the science behind the treatments you might get later.

Your doctor might use this data in the future. It ensures they get the most accurate picture of your health.

The Catch

The study used limited data. It was not a full clinical trial. They tested specific scenarios to see what worked best.

Some methods worked well for one thing but not another. You cannot use a single fix for every situation.

The team noted that sample sizes were small. This means the results might change with more people. It is a strong start, but not the final word.

Real-world data is often messier than test data. Future studies need to handle that complexity.

More work is needed before this is standard. Researchers will test these tools in larger groups.

This helps ensure future treatments are safe and effective. We are moving closer to better care for brain health.

Study Details

Evidence: Level 5
Published: Apr 2026
Original Abstract
Harmonisation is widely used to mitigate site- and scanner-related batch variability in multisite neuroimaging studies and is particularly critical in longitudinal clinical trials, where detection of subtle biological or treatment-related changes depends on reliable measurement across scanners and timepoints. However, the effectiveness of harmonisation in small, heterogeneous clinical datasets remains insufficiently understood, particularly in relation to subject-level variability and consistency across acquisition settings, and its impact on both removal of technical variability and preservation of biological variation in pooled multisite analyses. We systematically evaluated a range of image-based and statistical harmonisation methods using a clinically realistic multisite, multiscanner structural T1-weighted (T1w) MRI test-retest dataset comprising three controlled acquisition scenarios: repeatability, intra-scanner reproducibility and inter-scanner reproducibility. Methods were applied under different batch specifications (site, scanner, or both) and performance was assessed within each scenario and in pooled data using a multi-metric framework capturing both technical and biological variability in volumetric imaging-derived phenotypes (IDPs) relevant to aging and dementia research. Across IDPs, before harmonisation variability was lowest in the repeatability scenario (median variability = 0.6 to 2.7%, rank consistency ρ ≥ 0.9), with modest increases under intra-scanner reproducibility (0.5 to 3.2%, ρ = 0.5 to 1.0) and substantially greater variability under inter-scanner reproducibility conditions (1.7 to 19.2%, ρ = -0.1 to 0.9). These results offer important information to consider for multisite study design, including sample size calculation in clinical trials.
Harmonisation performance was strongly context dependent, with clearer benefits emerging in inter-scanner scenarios where both variability reduction and improvements in subject-level consistency were observed. In pooled data, approaches that explicitly modelled site as batch and accounted for repeated-measure structure showed greater consistency across IDPs in batch effect mitigation and more accurately reflected underlying biological variation. Our evaluation metrics enabled disentangling the removal of global batch effect while highlighting residual variability at the phenotype-specific or multivariate levels. These findings demonstrate that harmonisation cannot be treated as a one-size-fits-all solution and must be interpreted relative to the acquisition context, dataset structure, and downstream analytic goals. Multi-metric evaluation under realistic clinical constraints is essential to support reliable and translatable neuroimaging inference by ensuring appropriate correction of batch effects while preserving longitudinal biological signals and sensitivity to clinically meaningful change in multisite studies.