Systematic review and meta-analysis evaluates ML and DL models for diagnosing MASH and fibrosis
AI models help doctors spot liver disease with high accuracy

AI-generated summary of the cited source, checked by automated accuracy review.

Key Takeaway
Consider AI models as promising but not yet validated for routine MASH or fibrosis diagnosis.

This systematic review and meta-analysis included 106 studies, with 35 providing data for analysis (28 for ML, 7 for DL). The review evaluated ML and DL models for diagnosing MASH and liver fibrosis in patients with metabolic dysfunction-associated steatotic liver disease. The primary outcome was pooled area under the receiver operator characteristic curve (AUROC).

For diagnosing MASH, ML models showed a pooled AUROC of 0.833 (95% CI: 0.806-0.860), while DL models had a pooled AUROC of 0.841 (95% CI: 0.782-0.900). The best-performing ML model (LightGBM) achieved an AUROC of 0.920 (95% CI: 0.916-0.924), and the best-performing DL model (ResNet50) achieved an AUROC of 0.960 (95% CI: 0.951-0.969). For diagnosing fibrosis, ML models had a pooled AUROC of 0.826 (95% CI: 0.792-0.860), with CatBoost achieving the highest AUROC of 0.960 (95% CI: 0.950-0.970). DL models for fibrosis had a pooled AUROC of 0.875 (95% CI: 0.816-0.934).
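To put the gap between the pooled ML and DL values in context, the reported 95% confidence intervals can be converted back to approximate standard errors and compared. The sketch below is a rough reader-side check, not the review's own method. It assumes the intervals are symmetric normal-approximation intervals and that the ML and DL pools are independent. The function names are illustrative.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper):
    """Approximate standard error from a symmetric 95% CI (assumption)."""
    return (upper - lower) / (2 * Z_95)


def z_for_difference(est_a, ci_a, est_b, ci_b):
    """z statistic for est_b - est_a, assuming independent estimates."""
    se_a = se_from_ci(*ci_a)
    se_b = se_from_ci(*ci_b)
    return (est_b - est_a) / math.sqrt(se_a ** 2 + se_b ** 2)


# Pooled AUROCs and 95% CIs as reported in the review (ML first, then DL).
z_mash = z_for_difference(0.833, (0.806, 0.860), 0.841, (0.782, 0.900))
z_fibrosis = z_for_difference(0.826, (0.792, 0.860), 0.875, (0.816, 0.934))

print(f"MASH:     z = {z_mash:.2f}")
print(f"Fibrosis: z = {z_fibrosis:.2f}")
```

Under these assumptions both z values fall below 1.96, so the pooled DL advantage is within what sampling uncertainty alone could produce. This is consistent with the abstract's wording that DL was only marginally higher.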

The authors did not report specific limitations or safety data. The review highlights the potential of AI-driven approaches in MASLD management, but the absence of reported limitations and the variability in model performance warrant cautious interpretation. Further validation in diverse clinical settings is needed before widespread adoption.

Doctors often struggle to see early signs of fatty liver disease before it becomes serious. A new look at computer tools shows they might help solve this problem. Researchers reviewed 106 studies and combined results from 35 of them to test how well these programs work.

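The analysis focused on two types of software. For spotting MASH, an inflamed and more serious form of fatty liver disease, machine learning models had a pooled score of 0.833. Deep learning models scored slightly higher at 0.841, although the uncertainty ranges of the two overlap. The score, called AUROC, measures how well a model separates patients who have the condition from those who do not: 0.5 is no better than chance and 1.0 is perfect.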
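An AUROC can be read as the probability that a randomly chosen patient with the condition gets a higher model score than a randomly chosen patient without it. The sketch below computes it by counting such pairs. The scores are made up for illustration and do not come from the review, and the function name is hypothetical.

```python
def auroc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model scores (illustrative only).
mash_scores = [0.9, 0.8, 0.55, 0.4]      # patients with MASH
non_mash_scores = [0.7, 0.5, 0.3, 0.2]   # patients without MASH

print(auroc(mash_scores, non_mash_scores))  # 13 of 16 pairs ranked correctly
```

In this toy case 13 of the 16 pairs are ordered correctly, which gives 0.8125, close to the pooled values in the review. A score like this describes ranking ability. It is not the share of diagnoses the model gets right.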

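The tools also performed well at finding liver scarring (fibrosis), with pooled scores of 0.826 for machine learning and 0.875 for deep learning. The best machine learning program for scarring, CatBoost, reached 0.960. For MASH, the best deep learning program, ResNet50, also reached 0.960, and the best machine learning program, LightGBM, reached 0.920. These results are promising, but they do not show that the technology is ready for routine use.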

The review did not report specific limitations or safety data, so its results should be read with caution. Performance also varied from one model to another. These tools need further testing in diverse clinical settings before they are widely adopted.

What this means for you:
Computer models show promise for diagnosing MASH and liver scarring, but they are not yet validated for routine use.

Study Details

Study type: Meta-analysis
Evidence: Level 1
Published: May 2026
Original Abstract
INTRODUCTION: Metabolic dysfunction-associated steatotic liver disease (MASLD) can progress to metabolic dysfunction-associated steatohepatitis (MASH) and liver fibrosis, contributing to a heavier global health burden. Non-invasive diagnostic tools developed using machine learning (ML) and deep learning (DL), two representative artificial intelligence algorithms, are increasingly being explored for MASH and its related fibrosis assessment.
OBJECTIVES: This study aimed to compare the diagnostic performance of different ML and DL models and identify the top-performing models for diagnosing MASH and associated liver fibrosis.
METHODS: A systematic review and meta-analysis were conducted across PubMed, Web of Science, Embase and Cochrane Library from inception to May 18, 2025. Pooled area under the receiver operator characteristic curve (AUROC) values with 95% confidence interval (CI) were calculated. Accuracy, specificity, sensitivity, positive predictive values, and negative predictive values were also recorded.
RESULTS: Of 4,314 studies initially identified, 106 met the inclusion criteria, with 35 studies (ML: n = 28; DL: n = 7) providing data for analysis. Logistic Regression and Neural Network are the most commonly applied algorithms in ML and DL, respectively. The pooled AUROCs for diagnosing MASH were 0.833 (95% CI: 0.806-0.860) for ML models and 0.841 (95% CI: 0.782-0.900) for DL models. Light Gradient Boosting Machine (LightGBM) and ResNet50 were the best-performing models for diagnosing MASH within ML and DL algorithms, respectively, achieving corresponding AUROCs of 0.920 (95% CI: 0.916-0.924) and 0.960 (95% CI: 0.951-0.969). For fibrosis diagnosis, ML models had a pooled AUROC of 0.826 (95% CI: 0.792-0.860), with Categorical Boosting (CatBoost) achieving the highest AUROC of 0.960 (95% CI: 0.950-0.970). DL models yielded a pooled AUROC of 0.875 (95% CI: 0.816-0.934) for fibrosis diagnosis.
CONCLUSIONS: Both ML and DL models demonstrated strong diagnostic performance for MASH and liver fibrosis, with DL achieving marginally higher AUROCs. AI-driven approaches show promise in MASLD management.