Machine Learning-Based Identification of Potentially Novel Non-Alcoholic Fatty Liver Disease Biomarkers

Non-alcoholic fatty liver disease (NAFLD) is a chronic liver disease that presents a great challenge for treatment and prevention. This study aims to implement a machine learning approach that employs such datasets to identify potential biomarker targets. We developed a pipeline to identify potenti...

Bibliographic Details
Main authors: Roshan Shafiha, Basak Bahcivanci, Georgios V. Gkoutos, Animesh Acharjee
Format: article
Language: EN
Published: MDPI AG 2021
Subjects: NAFLD; biomarker; machine learning; transcriptomics; lipidomics; Biology (General)
Online access: https://doaj.org/article/bea0a8a742a8492d9389289df0e22e20
id oai:doaj.org-article:bea0a8a742a8492d9389289df0e22e20
record_format dspace
spelling oai:doaj.org-article:bea0a8a742a8492d9389289df0e22e20
2021-11-25T16:49:54Z
Machine Learning-Based Identification of Potentially Novel Non-Alcoholic Fatty Liver Disease Biomarkers
10.3390/biomedicines9111636
2227-9059
https://doaj.org/article/bea0a8a742a8492d9389289df0e22e20
2021-11-01T00:00:00Z
https://www.mdpi.com/2227-9059/9/11/1636
https://doaj.org/toc/2227-9059
Biomedicines, Vol 9, Iss 1636, p 1636 (2021)
institution DOAJ
collection DOAJ
language EN
topic NAFLD
biomarker
machine learning
transcriptomics
lipidomics
Biology (General)
QH301-705.5
description Non-alcoholic fatty liver disease (NAFLD) is a chronic liver disease that presents a great challenge for treatment and prevention. This study aims to implement a machine learning approach that employs such datasets to identify potential biomarker targets. We developed a pipeline to identify potential biomarkers for NAFLD that comprises five major steps: pre-processing, feature selection, generation of a random forest model, downstream feature analysis and biological interpretation. The pre-processing step includes data normalisation and variable extraction accompanied by appropriate annotations. Feature selection based on a differential gene expression analysis is then conducted to identify significant features, which are used to generate a random forest model whose performance is assessed using a receiver operating characteristic curve. Next, the features are subjected to downstream analyses, such as univariate analysis, pathway enrichment analysis, network analysis and the generation of correlation plots, boxplots and heatmaps. Once the results are obtained, biological interpretation and literature validation are conducted over the identified features and results. We applied this pipeline to transcriptomic and lipidomic datasets and concluded that the C4BPA gene could play a role in the development of NAFLD. The activation of the complement pathway, due to the downregulation of the C4BPA gene, leads to an increase in triglyceride content, which might further render the lipid metabolism. This approach identified the C4BPA gene, an inhibitor of the complement pathway, as a potential biomarker for the development of NAFLD.
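The feature selection and ROC assessment steps named in the description can be illustrated with a minimal, standard-library Python sketch. It is not the authors' pipeline: a Welch t-statistic ranking stands in for the differential gene expression analysis, the random forest step is omitted, and the function names (`welch_t`, `select_features`, `roc_auc`) and the toy expression matrix are hypothetical, chosen only for illustration.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (assumes non-zero variance)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def select_features(X, y, top_k):
    """Rank columns of X by |t| between class 1 and class 0; return the top_k column indices."""
    scores = []
    for j in range(len(X[0])):
        cases = [row[j] for row, label in zip(X, y) if label == 1]
        controls = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(welch_t(cases, controls)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:top_k]]


def roc_auc(y_true, y_score):
    """Area under the ROC curve, computed as the probability that a random
    positive sample scores higher than a random negative one (ties count 0.5)."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: rows are samples, columns are three made-up features.
# Column 1 is clearly lower in cases (label 1) than in controls (label 0).
X = [
    [5.0, 9.1, 3.0],
    [4.8, 8.9, 3.4],
    [5.2, 9.3, 2.9],
    [5.1, 6.0, 3.3],
    [4.9, 6.2, 3.6],
    [5.0, 5.8, 3.1],
]
y = [0, 0, 0, 1, 1, 1]

print(select_features(X, y, top_k=1))        # [1]
# Lower expression means higher risk here, so the negated value serves as the score.
print(roc_auc(y, [-row[1] for row in X]))    # 1.0
```

In a real analysis the selected features would be passed to a classifier such as a random forest, and the ROC curve would be computed on held-out predictions rather than on a single raw feature.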
format article
author Roshan Shafiha
Basak Bahcivanci
Georgios V. Gkoutos
Animesh Acharjee
title Machine Learning-Based Identification of Potentially Novel Non-Alcoholic Fatty Liver Disease Biomarkers
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/bea0a8a742a8492d9389289df0e22e20