CarcinoPred-EL: Novel models for predicting the carcinogenicity of chemicals using molecular fingerprints and ensemble learning methods

Abstract Carcinogenicity refers to a highly toxic end point of certain chemicals, and has become an important issue in the drug development process. In this study, three novel ensemble classification models, namely Ensemble SVM, Ensemble RF, and Ensemble XGBoost, were developed to predict carcinogen...

Bibliographic Details
Main authors: Li Zhang, Haixin Ai, Wen Chen, Zimo Yin, Huan Hu, Junfeng Zhu, Jian Zhao, Qi Zhao, Hongsheng Liu
Format: article
Language: EN
Published: Nature Portfolio 2017
Subjects:
R
Q
Online access: https://doaj.org/article/6f9317f3cd354de29a7d7ea285141a8b
id oai:doaj.org-article:6f9317f3cd354de29a7d7ea285141a8b
record_format dspace
spelling oai:doaj.org-article:6f9317f3cd354de29a7d7ea285141a8b
2021-12-02T11:52:25Z
10.1038/s41598-017-02365-0
2045-2322
2017-05-01T00:00:00Z
https://doi.org/10.1038/s41598-017-02365-0
https://doaj.org/toc/2045-2322
Scientific Reports, Vol 7, Iss 1, Pp 1-14 (2017)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract Carcinogenicity refers to a highly toxic end point of certain chemicals, and has become an important issue in the drug development process. In this study, three novel ensemble classification models, namely Ensemble SVM, Ensemble RF, and Ensemble XGBoost, were developed to predict carcinogenicity of chemicals using seven types of molecular fingerprints and three machine learning methods based on a dataset containing 1003 diverse compounds with rat carcinogenicity. Among these three models, Ensemble XGBoost is found to be the best, giving an average accuracy of 70.1 ± 2.9%, sensitivity of 67.0 ± 5.0%, and specificity of 73.1 ± 4.4% in five-fold cross-validation and an accuracy of 70.0%, sensitivity of 65.2%, and specificity of 76.5% in external validation. In comparison with some recent methods, the ensemble models outperform some machine learning-based approaches and yield equal accuracy and higher specificity but lower sensitivity than rule-based expert systems. It is also found that the ensemble models could be further improved if more data were available. As an application, the ensemble models are employed to discover potential carcinogens in the DrugBank database. The results indicate that the proposed models are helpful in predicting the carcinogenicity of chemicals. A web server called CarcinoPred-EL has been built for these models ( http://ccsipb.lnu.edu.cn/toxicity/CarcinoPred-EL/ ).
format article
author Li Zhang
Haixin Ai
Wen Chen
Zimo Yin
Huan Hu
Junfeng Zhu
Jian Zhao
Qi Zhao
Hongsheng Liu
author_sort Li Zhang
title CarcinoPred-EL: Novel models for predicting the carcinogenicity of chemicals using molecular fingerprints and ensemble learning methods
publisher Nature Portfolio
publishDate 2017
url https://doaj.org/article/6f9317f3cd354de29a7d7ea285141a8b
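
The abstract reports accuracy, sensitivity, and specificity as mean ± standard deviation over five-fold cross-validation. The following is a minimal sketch of how such figures are commonly computed from binary predictions; it is not the authors' code. The function names, the toy labels, and the choice of the sample standard deviation are assumptions made for illustration, and the record does not say which standard deviation the paper uses.

```python
from statistics import mean, stdev


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels.

    Assumes 1 = carcinogen (positive class) and 0 = non-carcinogen.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: fraction of actual positives that were predicted positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: fraction of actual negatives that were predicted negative.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


def summarize_folds(fold_metrics):
    """Return (mean, sample standard deviation) per metric across folds."""
    names = fold_metrics[0].keys()
    return {
        name: (
            mean(m[name] for m in fold_metrics),
            stdev(m[name] for m in fold_metrics),
        )
        for name in names
    }


# Toy labels for a single fold (hypothetical data, not taken from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
single_fold = classification_metrics(y_true, y_pred)

# Toy per-fold results (hypothetical), summarized as mean and standard deviation.
folds = [
    {"accuracy": 0.70, "sensitivity": 0.65, "specificity": 0.75},
    {"accuracy": 0.72, "sensitivity": 0.67, "specificity": 0.77},
    {"accuracy": 0.68, "sensitivity": 0.63, "specificity": 0.73},
]
summary = summarize_folds(folds)
```

Sensitivity and specificity are reported separately because a carcinogenicity classifier can trade one for the other: the abstract notes that the ensemble models reach higher specificity but lower sensitivity than rule-based expert systems at equal accuracy.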