Random forest-integrated analysis in AD and LATE brain transcriptome-wide data to identify disease-specific gene expression.

Alzheimer's disease (AD) is a complex neurodegenerative disorder that affects thinking, memory, and behavior. Limbic-predominant age-related TDP-43 encephalopathy (LATE) is a recently identified common neurodegenerative disease that mimics the clinical symptoms of AD. The development of drugs t...

Bibliographic Details
Main authors:	Xinxing Wu, Chong Peng, Peter T Nelson, Qiang Cheng
Format:	article
Language:	EN
Published:	Public Library of Science (PLoS) 2021
Subjects:	Medicine (R), Science (Q)
Online access:	https://doaj.org/article/3e34440b4d514e9aaeb13cfe382ee3ec

id	oai:doaj.org-article:3e34440b4d514e9aaeb13cfe382ee3ec
record_format	dspace
spelling	oai:doaj.org-article:3e34440b4d514e9aaeb13cfe382ee3ec; 2021-12-02T20:08:27Z; 1932-6203; 10.1371/journal.pone.0256648; https://doaj.org/article/3e34440b4d514e9aaeb13cfe382ee3ec; 2021-01-01T00:00:00Z; https://doi.org/10.1371/journal.pone.0256648; https://doaj.org/toc/1932-6203; PLoS ONE, Vol 16, Iss 9, p e0256648 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	Medicine R Science Q
description	Alzheimer's disease (AD) is a complex neurodegenerative disorder that affects thinking, memory, and behavior. Limbic-predominant age-related TDP-43 encephalopathy (LATE) is a recently identified common neurodegenerative disease that mimics the clinical symptoms of AD. The development of drugs to prevent or treat these neurodegenerative diseases has been slow, partly because the genes associated with these diseases are incompletely understood. A notable hindrance from a data analysis perspective is that the clinical samples for patients and controls are usually highly imbalanced, which makes it challenging to apply most existing machine learning algorithms directly to such datasets. Meeting this data analysis challenge is critical, as more specific disease-associated gene identification may enable new insights into underlying disease-driving mechanisms and help find biomarkers and, in turn, improve prospects for effective treatment strategies. In order to detect disease-associated genes based on imbalanced transcriptome-wide data, we proposed an integrated multiple random forests (IMRF) algorithm. IMRF is effective in differentiating putative genes associated with subjects having LATE and/or AD from controls based on transcriptome-wide data, thereby enabling effective discrimination between these samples. Various forms of validation, such as cross-domain verification of our method over other datasets, improved and competitive classification performance by using identified genes, effectiveness of testing data with a classifier that is completely independent from decision trees and random forests, and relationships with prior AD and LATE studies on the genes linked to neurodegeneration, all testify to the effectiveness of IMRF in identifying genes with altered expression in LATE and/or AD. We conclude that IMRF, as an effective feature selection algorithm for imbalanced data, shows promise for facilitating the development of new gene biomarkers as well as targets for effective strategies of disease prevention and treatment.
format	article
author	Xinxing Wu Chong Peng Peter T Nelson Qiang Cheng
author_sort	Xinxing Wu
title	Random forest-integrated analysis in AD and LATE brain transcriptome-wide data to identify disease-specific gene expression.
publisher	Public Library of Science (PLoS)
publishDate	2021
url	https://doaj.org/article/3e34440b4d514e9aaeb13cfe382ee3ec
