Random forest classification for predicting lifespan-extending chemical compounds
Abstract Ageing is a major risk factor for many conditions including cancer, cardiovascular and neurodegenerative diseases. Pharmaceutical interventions that slow down ageing and delay the onset of age-related diseases are a growing research area. The aim of this study was to build a machine learnin...
Main authors: | Sofia Kapsiani, Brendan J. Howlin |
---|---|
Format: | article |
Language: | EN |
Published: | Nature Portfolio, 2021 |
Subjects: | Medicine (R), Science (Q) |
Online access: | https://doaj.org/article/4cb24d92d0de41d5869a47b418347c1d |
id |
oai:doaj.org-article:4cb24d92d0de41d5869a47b418347c1d |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:4cb24d92d0de41d5869a47b418347c1d; 2021-12-02T15:22:57Z; Random forest classification for predicting lifespan-extending chemical compounds; 10.1038/s41598-021-93070-6; 2045-2322; https://doaj.org/article/4cb24d92d0de41d5869a47b418347c1d; 2021-07-01T00:00:00Z; https://doi.org/10.1038/s41598-021-93070-6; https://doaj.org/toc/2045-2322; Sofia Kapsiani; Brendan J. Howlin; Nature Portfolio; article; Medicine (R); Science (Q); EN; Scientific Reports, Vol 11, Iss 1, Pp 1-13 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Medicine (R); Science (Q) |
description |
Abstract Ageing is a major risk factor for many conditions including cancer, cardiovascular and neurodegenerative diseases. Pharmaceutical interventions that slow down ageing and delay the onset of age-related diseases are a growing research area. The aim of this study was to build a machine learning model based on the data of the DrugAge database to predict whether a chemical compound will extend the lifespan of Caenorhabditis elegans. Five predictive models were built using the random forest algorithm with molecular fingerprints and/or molecular descriptors as features. The best performing classifier, built using molecular descriptors, achieved an area under the curve score (AUC) of 0.815 for classifying the compounds in the test set. The features of the model were ranked using the Gini importance measure of the random forest algorithm. The top 30 features included descriptors related to atom and bond counts, topological and partial charge properties. The model was applied to predict the class of compounds in an external database, consisting of 1738 small-molecules. The chemical compounds of the screening database with a predictive probability of ≥ 0.80 for increasing the lifespan of Caenorhabditis elegans were broadly separated into (1) flavonoids, (2) fatty acids and conjugates, and (3) organooxygen compounds. |
format |
article |
author |
Sofia Kapsiani Brendan J. Howlin |
title |
Random forest classification for predicting lifespan-extending chemical compounds |
publisher |
Nature Portfolio |
publishDate |
2021 |
url |
https://doaj.org/article/4cb24d92d0de41d5869a47b418347c1d |
_version_ |
1718387381501952000 |
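
The workflow summarised in the record's abstract (a random forest classifier, an AUC score on a held-out test set, a Gini importance ranking of the features, and screening of an external library at a predicted probability of ≥ 0.80) can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn and NumPy, uses synthetic data in place of DrugAge compounds and their molecular descriptors, and the hyperparameters, array shapes, and variable names are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 600 "compounds" x 50 "descriptors".
# A real study would compute descriptors or fingerprints from DrugAge entries.
X, y = make_classification(
    n_samples=600, n_features=50, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Illustrative hyperparameters, not the settings reported in the paper.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# AUC on the test set, computed from the predicted probability of class 1.
test_proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, test_proba)

# Gini importance: scikit-learn's feature_importances_ is the impurity-based
# (mean decrease in Gini impurity) measure, normalised to sum to 1.
importances = model.feature_importances_
top30 = np.argsort(importances)[::-1][:30]

# Screening step: score a hypothetical external library of 1738 compounds
# (random numbers here) and keep those with predicted probability >= 0.80.
rng = np.random.default_rng(1)
X_screen = rng.normal(size=(1738, 50))
screen_proba = model.predict_proba(X_screen)[:, 1]
hits = np.flatnonzero(screen_proba >= 0.80)

print(f"Test AUC: {auc:.3f}")
print("Indices of the 30 most important features:", top30.tolist())
print(f"Screened compounds at or above 0.80: {hits.size} of {X_screen.shape[0]}")
```

Because the data are synthetic, the printed AUC and hit count say nothing about the published model; only the sequence of steps mirrors the abstract.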