This observational study combined transcriptomic and single-cell analysis with an artificial neural network to develop a gene signature distinguishing simple steatosis (MASL) from metabolic dysfunction-associated steatohepatitis (MASH), the two ends of the metabolic dysfunction-associated steatotic liver disease (MASLD) spectrum. Data were derived from Gene Expression Omnibus datasets and clinical liver tissue samples. The training cohort included 149 MASL and 158 MASH samples; the validation cohort comprised 51 MASL and 155 MASH samples; clinical validation used 60 liver tissue samples.
In the validation cohort, the artificial neural network model achieved an area under the curve (AUC) of 0.893 (95% CI 0.854 to 0.925) for distinguishing MASL from MASH. Differential expression analysis identified 656 differentially expressed genes between MASL and MASH. Network integration and machine learning selection narrowed these to six key genes: five (MMP9, FABP5, TREM2, CTSD, UBD) were upregulated in MASH, and MAP2K1 was downregulated. Immune infiltration analysis indicated increased monocytes, M0 and M1 macrophages, and activated dendritic cells in MASH.
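To make the reported AUC and its 95% confidence interval concrete, here is a minimal sketch of how such numbers can be computed from classifier scores. The source does not state how the interval was obtained, so the percentile bootstrap used here is an assumption, and the scores are synthetic values generated only for illustration (only the class sizes, 51 MASL and 155 MASH, come from the study). The function names are hypothetical.

```python
import random


def auc_mann_whitney(labels, scores):
    """AUC as the probability that a random positive (MASH = 1) scores
    higher than a random negative (MASL = 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method,
    not necessarily the one used in the study)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains one class only; AUC is undefined
        stats.append(auc_mann_whitney(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic scores: class sizes mirror the validation cohort,
# but the score distributions are invented for this demo.
demo_rng = random.Random(42)
labels = [0] * 51 + [1] * 155
scores = [demo_rng.gauss(0.0, 1.0) for _ in range(51)] + \
         [demo_rng.gauss(1.5, 1.0) for _ in range(155)]

auc = auc_mann_whitney(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
print(f"AUC = {auc:.3f}, 95% CI {ci_low:.3f} to {ci_high:.3f}")
```

With only 51 MASL samples, the width of the interval is driven largely by the smaller class, which is one reason the limited MASL sample size matters when reading the reported AUC.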
Safety and adverse events were not reported, as this was a transcriptomic analysis. Key limitations include the observational design, the relatively small number of MASL samples in the validation cohort (51, versus 155 MASH), and the absence of long-term clinical outcomes. Clinical validation was restricted to quantitative real-time PCR in 60 samples.
These findings provide a molecular model for early discrimination between MASL and MASH, potentially aiding noninvasive diagnostic approaches. However, the gene signature is not validated for clinical use, and causality cannot be inferred from gene expression changes. Diagnostic accuracy should not be overstated beyond the reported AUC.
Original Abstract
Background: Metabolic dysfunction-associated steatotic liver disease (MASLD) is the most prevalent chronic liver disease, ranging from simple steatosis (MASL) to metabolic dysfunction-associated steatohepatitis (MASH). However, reliable noninvasive strategies for accurately distinguishing MASL from MASH at an early stage remain limited. We therefore aimed to develop a robust molecular model to improve early identification of disease progression and subtype discrimination.
Methods: Five datasets from the Gene Expression Omnibus were integrated as a training cohort comprising 149 MASL and 158 MASH samples, while another dataset, GSE135251, served as the validation cohort, including 51 MASL and 155 MASH samples. Differential expression analysis and weighted gene co-expression network analysis were conducted to identify gene modules. Overlapping genes were subjected to protein interaction network construction and topological ranking. Least absolute shrinkage and selection operator regression, support vector machine recursive feature elimination, and random forest algorithms were jointly applied to derive robust diagnostic candidates. An artificial neural network classifier was established based on the final gene set and evaluated in both cohorts. Immune cell composition was estimated using CIBERSORT. Single-cell RNA sequencing data from GSE136103 were analyzed to determine cell-type-specific expression patterns. Quantitative real-time PCR validation was conducted in 60 clinical liver tissue samples.
Results: A total of 656 differentially expressed genes were identified between MASL and MASH. Network integration and machine learning intersection analysis consistently yielded six key genes: MMP9, FABP5, TREM2, CTSD, UBD, and MAP2K1. Five genes were upregulated in MASH, whereas MAP2K1 was downregulated. Individual genes demonstrated moderate diagnostic performance, with area under the curve values ranging from 0.692 to 0.822 in the training cohort. The artificial neural network model achieved an area under the curve of 0.893 (95% CI 0.854 to 0.925) in the validation cohort. Immune infiltration analysis revealed increased monocytes, M0 and M1 macrophages, and activated dendritic cells in MASH. Single-cell analysis localized key genes predominantly to myeloid populations, and quantitative PCR confirmed consistent differential expression in clinical samples.
Conclusion: This study establishes a multicohort machine learning-based gene signature with high diagnostic accuracy for distinguishing MASL from MASH and provides insight into immune metabolic mechanisms underlying disease progression.
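The abstract describes three feature selection methods being jointly applied and their intersection yielding six key genes. A minimal sketch of that intersection step follows. The six gene names come from the abstract; the per-method candidate lists and the `PLACEHOLDER_*` entries are invented purely for illustration, since the abstract does not report what each method selected individually. The helper name `consensus_genes` is hypothetical.

```python
# Six key genes named in the abstract.
FINAL_GENES = {"MMP9", "FABP5", "TREM2", "CTSD", "UBD", "MAP2K1"}

# Hypothetical per-method selections. By definition of an intersection,
# the six final genes appear in every list; the PLACEHOLDER_* names
# stand in for method-specific candidates and are not real results.
selected = {
    "lasso": FINAL_GENES | {"PLACEHOLDER_A", "PLACEHOLDER_B"},
    "svm_rfe": FINAL_GENES | {"PLACEHOLDER_B", "PLACEHOLDER_C"},
    "random_forest": FINAL_GENES | {"PLACEHOLDER_A", "PLACEHOLDER_D"},
}


def consensus_genes(selections):
    """Return the genes chosen by every method, sorted for stable output."""
    gene_sets = [set(genes) for genes in selections.values()]
    if not gene_sets:
        return []
    return sorted(set.intersection(*gene_sets))


consensus = consensus_genes(selected)
print(consensus)
```

Requiring agreement across all methods is a conservative choice: a candidate picked by only one or two methods (such as the placeholders here) is dropped, which trades sensitivity for robustness of the final signature.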