Support Vector Machine model for hERG inhibitory activities based on the integrated hERG database using descriptor selection by NSGA-II

Abstract Assessing hERG liability in the early stages of drug discovery programs is important. The recent increase in hERG-related information in public databases has enabled various successful applications of machine learning techniques to predict hERG inhibition. However, most of these studies...

Bibliographic Details
Main authors: Keiji Ogura, Tomohiro Sato, Hitomi Yuki, Teruki Honma
Format: article
Language: EN
Published: Nature Portfolio 2019
Subjects:
R
Q
Online access: https://doaj.org/article/095e0068fd624717acf73f01c55aecca
id oai:doaj.org-article:095e0068fd624717acf73f01c55aecca
record_format dspace
spelling oai:doaj.org-article:095e0068fd624717acf73f01c55aecca
2021-12-02T15:08:30Z
10.1038/s41598-019-47536-3
2045-2322
2019-08-01T00:00:00Z
https://doi.org/10.1038/s41598-019-47536-3
https://doaj.org/toc/2045-2322
Scientific Reports, Vol 9, Iss 1, Pp 1-12 (2019)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract Assessing hERG liability in the early stages of drug discovery programs is important. The recent increase in hERG-related information in public databases has enabled various successful applications of machine learning techniques to predict hERG inhibition. However, most of these studies constructed their datasets from only one database, limiting the predictive ability and scope of the models. In this study, a hERG classification model was constructed using the largest dataset for hERG inhibition, built by integrating multiple databases. The integrated dataset consisted of more than 291,000 structurally diverse compounds derived from ChEMBL, GOSTAR, PubChem, and hERGCentral. The prediction model was built using a support vector machine (SVM) with descriptor selection based on the Non-dominated Sorting Genetic Algorithm-II (NSGA-II) to optimize the descriptor set for maximum prediction performance with a minimal number of descriptors. The SVM classification model using 72 selected descriptors and ECFP_4 structural fingerprints recorded a kappa statistic of 0.733 and an accuracy of 0.984 for the test set, substantially outperforming current commercial applications for hERG prediction. Finally, the applicability domain of the prediction model was assessed based on the molecular similarity between the training set and test set compounds.
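The abstract reports both a kappa statistic and an accuracy for the test set. As a minimal sketch of why the two can differ widely on imbalanced data, the Python below computes Cohen's kappa and accuracy from a binary confusion matrix. The function names and all counts are hypothetical illustrations chosen for this note; they are not taken from the article or its dataset.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of compounds classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement: product of predicted and actual class frequencies,
    # summed over the positive and negative classes.
    p_pos = ((tp + fp) / n) * ((tp + fn) / n)
    p_neg = ((fn + tn) / n) * ((fp + tn) / n)
    expected = p_pos + p_neg
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 40 inhibitors among 1,000 compounds.
model = dict(tp=30, fp=10, fn=10, tn=950)
# A trivial classifier that labels every compound as a non-inhibitor.
always_negative = dict(tp=0, fp=0, fn=40, tn=960)

for name, counts in (("model", model), ("always negative", always_negative)):
    print(name, round(accuracy(**counts), 3), round(cohen_kappa(**counts), 3))
```

With these made-up counts, the trivial classifier still reaches an accuracy of 0.96 but a kappa of 0, while the hypothetical model scores about 0.74 on kappa. This is one reason kappa is often reported alongside accuracy when inactive compounds greatly outnumber active ones.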
format article
author Keiji Ogura
Tomohiro Sato
Hitomi Yuki
Teruki Honma
title Support Vector Machine model for hERG inhibitory activities based on the integrated hERG database using descriptor selection by NSGA-II
publisher Nature Portfolio
publishDate 2019
url https://doaj.org/article/095e0068fd624717acf73f01c55aecca