Ensemble Machine Learning Model to Predict SARS-CoV-2 T-Cell Epitopes as Potential Vaccine Targets

An ongoing outbreak of coronavirus disease 2019 (COVID-19), caused by a single-stranded RNA virus called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has caused a worldwide pandemic that continues to date. Vaccination has proven to be the most effective technique, by far, for the tr...

Bibliographic Details
Main Authors: Syed Nisar Hussain Bukhari, Amit Jain, Ehtishamul Haq, Abolfazl Mehbodniya, Julian Webber
Format: article
Language: EN
Published: MDPI AG 2021
Subjects: COVID-19, SARS-CoV-2, T-cell epitope, peptide-based vaccines, machine learning, random forest, Medicine (General), R5-920
Online Access: https://doaj.org/article/d1984a828cd54d1c86ea96782cdcf48f
id oai:doaj.org-article:d1984a828cd54d1c86ea96782cdcf48f
record_format dspace
spelling oai:doaj.org-article:d1984a828cd54d1c86ea96782cdcf48f
2021-11-25T17:20:33Z
10.3390/diagnostics11111990
2075-4418
2021-10-01T00:00:00Z
https://www.mdpi.com/2075-4418/11/11/1990
https://doaj.org/toc/2075-4418
Diagnostics, Vol 11, Iss 11, p 1990 (2021)
institution DOAJ
collection DOAJ
language EN
topic COVID-19
SARS-CoV-2
T-cell epitope
peptide-based vaccines
machine learning
random forest
Medicine (General)
R5-920
description The ongoing outbreak of coronavirus disease 2019 (COVID-19), caused by a single-stranded RNA virus called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a worldwide pandemic that continues to date. Vaccination has proven to be by far the most effective technique for the treatment of COVID-19 and for combating the outbreak. Among all vaccine types, epitope-based peptide vaccines have received comparatively little attention and hold large untapped potential for boosting vaccine safety and immunogenicity. Peptides used in such vaccine technology are chemically synthesized based on the amino acid sequences of antigenic proteins (T-cell epitopes) of the target pathogen. Identifying antigenic proteins through wet-lab experiments is difficult, expensive, and time-consuming. We propose an ensemble machine learning (ML) model for the prediction of T-cell epitopes (also known as immune relevant determinants or antigenic determinants) against SARS-CoV-2, utilizing physicochemical properties of amino acids. To train the model, we retrieved the experimentally determined SARS-CoV-2 T-cell epitopes from the Immune Epitope Database and Analysis Resource (IEDB) repository. The resulting model achieved accuracy, AUC (area under the ROC curve), Gini, specificity, sensitivity, F-score, and precision of 98.20%, 0.991, 0.994, 0.971, 0.982, 0.990, and 0.981, respectively, on a test set consisting of SARS-CoV-2 peptides (T-cell epitopes and non-epitopes) obtained from IEDB. An average accuracy of 97.98% was recorded in repeated 5-fold cross-validation. Comparison with five robust machine learning classifiers and with existing T-cell epitope prediction techniques, such as NetMHC and CTLpred, suggests that the proposed model performs better. The epitopes predicted by the current model have a high probability of acting as potential peptide vaccine candidates, subject to in vitro and in vivo scientific assessment. The model would help the scientific community working in vaccine development save time when screening active T-cell epitope candidates of SARS-CoV-2 against inactive ones.
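The threshold-based metrics named in the description (accuracy, sensitivity, specificity, precision, F-score) all follow from the four counts of a binary confusion matrix. The sketch below is a minimal illustration using the standard textbook definitions, not the authors' code. The class name `Confusion`, the function `classification_metrics`, and the example counts are hypothetical and are not taken from the article. AUC and Gini are omitted because they require ranked prediction scores, not counts.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts for a binary epitope / non-epitope classifier (hypothetical helper)."""
    tp: int  # epitopes correctly predicted as epitopes
    fp: int  # non-epitopes wrongly predicted as epitopes
    tn: int  # non-epitopes correctly predicted as non-epitopes
    fn: int  # epitopes wrongly predicted as non-epitopes


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: Confusion) -> dict:
    """Compute the standard metrics from confusion-matrix counts."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # true positive rate (recall)
    specificity = _ratio(c.tn, c.tn + c.fp)  # true negative rate
    precision = _ratio(c.tp, c.tp + c.fp)    # positive predictive value
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F-score (F1): harmonic mean of precision and sensitivity.
    f_score = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
    }


# Made-up counts for illustration only; they are not results from the article.
example = Confusion(tp=90, fp=5, tn=95, fn=10)
example_metrics = classification_metrics(example)

if __name__ == "__main__":
    for name, value in example_metrics.items():
        print(f"{name}: {value:.3f}")
```

In repeated k-fold cross-validation, a function like this would be applied to each held-out fold and the per-fold values averaged.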
format article
author Syed Nisar Hussain Bukhari
Amit Jain
Ehtishamul Haq
Abolfazl Mehbodniya
Julian Webber
title Ensemble Machine Learning Model to Predict SARS-CoV-2 T-Cell Epitopes as Potential Vaccine Targets
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/d1984a828cd54d1c86ea96782cdcf48f