Random forest classification for predicting lifespan-extending chemical compounds

Abstract Ageing is a major risk factor for many conditions including cancer, cardiovascular and neurodegenerative diseases. Pharmaceutical interventions that slow down ageing and delay the onset of age-related diseases are a growing research area. The aim of this study was to build a machine learnin...

Bibliographic Details
Main authors: Sofia Kapsiani, Brendan J. Howlin
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/4cb24d92d0de41d5869a47b418347c1d
id oai:doaj.org-article:4cb24d92d0de41d5869a47b418347c1d
record_format dspace
spelling oai:doaj.org-article:4cb24d92d0de41d5869a47b418347c1d
2021-12-02T15:22:57Z
Random forest classification for predicting lifespan-extending chemical compounds
10.1038/s41598-021-93070-6
2045-2322
https://doaj.org/article/4cb24d92d0de41d5869a47b418347c1d
2021-07-01T00:00:00Z
https://doi.org/10.1038/s41598-021-93070-6
https://doaj.org/toc/2045-2322
Abstract Ageing is a major risk factor for many conditions including cancer, cardiovascular and neurodegenerative diseases. Pharmaceutical interventions that slow down ageing and delay the onset of age-related diseases are a growing research area. The aim of this study was to build a machine learning model based on the data of the DrugAge database to predict whether a chemical compound will extend the lifespan of Caenorhabditis elegans. Five predictive models were built using the random forest algorithm with molecular fingerprints and/or molecular descriptors as features. The best performing classifier, built using molecular descriptors, achieved an area under the curve score (AUC) of 0.815 for classifying the compounds in the test set. The features of the model were ranked using the Gini importance measure of the random forest algorithm. The top 30 features included descriptors related to atom and bond counts, topological and partial charge properties. The model was applied to predict the class of compounds in an external database, consisting of 1738 small-molecules. The chemical compounds of the screening database with a predictive probability of ≥ 0.80 for increasing the lifespan of Caenorhabditis elegans were broadly separated into (1) flavonoids, (2) fatty acids and conjugates, and (3) organooxygen compounds.
Sofia Kapsiani
Brendan J. Howlin
Nature Portfolio
article
Medicine
R
Science
Q
EN
Scientific Reports, Vol 11, Iss 1, Pp 1-13 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
spellingShingle Medicine
R
Science
Q
Sofia Kapsiani
Brendan J. Howlin
Random forest classification for predicting lifespan-extending chemical compounds
description Abstract Ageing is a major risk factor for many conditions including cancer, cardiovascular and neurodegenerative diseases. Pharmaceutical interventions that slow down ageing and delay the onset of age-related diseases are a growing research area. The aim of this study was to build a machine learning model based on the data of the DrugAge database to predict whether a chemical compound will extend the lifespan of Caenorhabditis elegans. Five predictive models were built using the random forest algorithm with molecular fingerprints and/or molecular descriptors as features. The best performing classifier, built using molecular descriptors, achieved an area under the curve score (AUC) of 0.815 for classifying the compounds in the test set. The features of the model were ranked using the Gini importance measure of the random forest algorithm. The top 30 features included descriptors related to atom and bond counts, topological and partial charge properties. The model was applied to predict the class of compounds in an external database, consisting of 1738 small-molecules. The chemical compounds of the screening database with a predictive probability of ≥ 0.80 for increasing the lifespan of Caenorhabditis elegans were broadly separated into (1) flavonoids, (2) fatty acids and conjugates, and (3) organooxygen compounds.
format article
author Sofia Kapsiani
Brendan J. Howlin
author_facet Sofia Kapsiani
Brendan J. Howlin
author_sort Sofia Kapsiani
title Random forest classification for predicting lifespan-extending chemical compounds
title_short Random forest classification for predicting lifespan-extending chemical compounds
title_full Random forest classification for predicting lifespan-extending chemical compounds
title_fullStr Random forest classification for predicting lifespan-extending chemical compounds
title_full_unstemmed Random forest classification for predicting lifespan-extending chemical compounds
title_sort random forest classification for predicting lifespan-extending chemical compounds
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/4cb24d92d0de41d5869a47b418347c1d
work_keys_str_mv AT sofiakapsiani randomforestclassificationforpredictinglifespanextendingchemicalcompounds
AT brendanjhowlin randomforestclassificationforpredictinglifespanextendingchemicalcompounds
_version_ 1718387381501952000
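
The abstract reports two numeric steps that are easy to misread: scoring a classifier by AUC on a test set, and keeping screening compounds whose predicted probability is at least 0.80. The Python sketch below illustrates both on made-up numbers. It does not train a random forest and is not the authors' code; the function names (`roc_auc`, `select_hits`), the compound names, and every label and probability are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def select_hits(probabilities: dict[str, float], threshold: float = 0.80) -> list[str]:
    """Return compounds with predicted probability >= threshold, highest first."""
    hits = [(name, p) for name, p in probabilities.items() if p >= threshold]
    return [name for name, _ in sorted(hits, key=lambda item: item[1], reverse=True)]


# Hypothetical test-set labels (1 = extends lifespan) and classifier scores.
test_labels = [1, 1, 1, 0, 0, 0]
test_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(test_labels, test_scores)

# Hypothetical screening compounds with predicted probabilities.
screening = {"compound_A": 0.91, "compound_B": 0.55, "compound_C": 0.80}
hits = select_hits(screening)

print(f"AUC = {auc:.3f}")
print("hits:", hits)
```

The pairwise definition used here is quadratic in the number of compounds, which is fine for a small illustration; a library routine would be the usual choice for a real test set. The threshold comparison is inclusive, matching the "≥ 0.80" wording in the abstract.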