Identification of stress response proteins through fusion of machine learning models and statistical paradigms

Abstract Proteins are a vital component of cells that perform physiological functions to ensure smooth operations of bodily functions. Identification of a protein's function involves a detailed understanding of the structure of proteins. Stress proteins are essential mediators of several respon...

Bibliographic Details
Main authors: Ebraheem Alzahrani, Wajdi Alghamdi, Malik Zaka Ullah, Yaser Daanial Khan
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects: Medicine (R), Science (Q)
Online access: https://doaj.org/article/1de75277602c4732bc46a4bb3a5045e8
id oai:doaj.org-article:1de75277602c4732bc46a4bb3a5045e8
record_format dspace
spelling oai:doaj.org-article:1de75277602c4732bc46a4bb3a5045e8
2021-11-08T10:52:05Z
10.1038/s41598-021-99083-5
2045-2322
2021-11-01T00:00:00Z
https://doi.org/10.1038/s41598-021-99083-5
https://doaj.org/toc/2045-2322
Scientific Reports, Vol 11, Iss 1, Pp 1-15 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract Proteins are vital components of cells that carry out the physiological functions on which the body depends. Identifying a protein's function requires a detailed understanding of its structure. Stress proteins are essential mediators of several responses to cellular stress and are categorized by their structural characteristics. They are conserved across many eukaryotic and prokaryotic lineages and perform varied, crucial functions inside the cell. In vivo, ex vivo, and in vitro identification of stress proteins is time-consuming and costly. This study aims to identify stress protein sequences using mathematical modelling and machine learning methods, to supplement these wet-lab methods. The model built with Random Forest achieved 91.1% accuracy, while models based on a neural network and a support vector machine achieved 87.7% and 47.0% accuracy, respectively. Based on the evaluation results, it was concluded that the random-forest classifier outperformed all other predictors and is suitable for practical use in identifying stress proteins. A live web server is available at http://biopred.org/stressprotiens, and the web server code is available at https://github.com/abdullah5naveed/SRP_WebServer.git
format article
author Ebraheem Alzahrani
Wajdi Alghamdi
Malik Zaka Ullah
Yaser Daanial Khan
author_sort Ebraheem Alzahrani
title Identification of stress response proteins through fusion of machine learning models and statistical paradigms
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/1de75277602c4732bc46a4bb3a5045e8