Identification of Enzymes-specific Protein Domain Based on DDE, and Convolutional Neural Network

Predicting the protein sequence information of enzymes and non-enzymes is an important but very challenging task. Existing methods use either protein geometric structures alone or protein sequences alone to predict enzymatic functions, so their prediction results are unsatisfactory. In this paper, we p...

Bibliographic Details
Main authors: Rahu Sikander, Yuping Wang, Ali Ghulam, Xianjuan Wu
Format: article
Language: EN
Published: Frontiers Media S.A. 2021
Subjects: enzyme, function, sequence, protein, machine learning, Genetics, QH426-470
Online access: https://doaj.org/article/f6758d809764473f90fb9a753313bc26
id oai:doaj.org-article:f6758d809764473f90fb9a753313bc26
record_format dspace
spelling oai:doaj.org-article:f6758d809764473f90fb9a753313bc26
2021-12-01T19:44:45Z
1664-8021
10.3389/fgene.2021.759384
2021-11-01T00:00:00Z
https://www.frontiersin.org/articles/10.3389/fgene.2021.759384/full
https://doaj.org/toc/1664-8021
Frontiers in Genetics, Vol 12 (2021)
institution DOAJ
collection DOAJ
language EN
topic enzyme
function
sequence
protein
machine learning
Genetics
QH426-470
description Predicting the protein sequence information of enzymes and non-enzymes is an important but very challenging task. Existing methods use either protein geometric structures alone or protein sequences alone to predict enzymatic functions, so their prediction results are unsatisfactory. In this paper, we propose a novel approach for classifying amino acid sequences as enzymes or non-enzymes with a Convolutional Neural Network (CNN). The CNN predicts the roles of enzymes from multiple kinds of biological information, including sequence and structure information. We feed two-dimensional data to a 2D CNN to classify proteins as enzymes or non-enzymes under fivefold cross-validation. We also test the model on an independent dataset, and the results demonstrate that we are able to address the overfitting problem. We demonstrate the superiority of the proposed CNN model across filter settings of 32, 64, and 128, using both the fivefold cross-validation set and the independent test set. The Dipeptide Deviation from Expected Mean (DDE) matrix extracts mutation information from amino acid sequences and conveys structural information about the distances and angles of amino acids. The derived feature maps are then encoded using DDE. On the independent dataset, the model is compared with two other methods, GRU and XGBoost. All analyses of the proposed CNN method were conducted with 32, 64, and 128 filters. The model achieved an accuracy of 0.8762 on the cross-validation datasets and 0.7621 on the independent datasets. In addition, the ROC AUC score under fivefold cross-validation was 0.95. The performance of our model and that of other models was compared in terms of sensitivity (0.9028) and specificity (0.8497). The overall accuracy of our model was 0.9133, compared with 0.8310 for the other model.
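
The abstract above names the Dipeptide Deviation from Expected Mean (DDE) encoding but does not spell out how it is computed. The following is a minimal Python sketch of the commonly cited DDE definition, in which each of the 400 dipeptide frequencies is standardized against a theoretical mean derived from codon counts. It is an assumption that the article uses exactly this formulation; the function names `dde_features` and `dde_matrix`, and the 20 x 20 reshaping for a 2D CNN input, are hypothetical and chosen only for illustration.

```python
import math
from itertools import product

# Number of codons encoding each amino acid in the standard genetic code
# (the three stop codons are excluded, leaving 61 sense codons).
CODONS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
    "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
    "T": 4, "V": 4, "W": 1, "Y": 2,
}
TOTAL_CODONS = sum(CODONS.values())  # 61
AMINO_ACIDS = sorted(CODONS)
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dde_features(sequence):
    """Return the 400 DDE values for a protein sequence (assumed definition)."""
    seq = sequence.upper()
    unknown = set(seq) - set(CODONS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    n_pairs = len(seq) - 1  # number of overlapping dipeptides
    if n_pairs < 1:
        raise ValueError("sequence must contain at least two residues")

    counts = dict.fromkeys(DIPEPTIDES, 0)
    for first, second in zip(seq, seq[1:]):
        counts[first + second] += 1

    features = []
    for dipeptide in DIPEPTIDES:
        # Observed dipeptide composition.
        dc = counts[dipeptide] / n_pairs
        # Theoretical mean from codon usage of the two residues.
        tm = (CODONS[dipeptide[0]] / TOTAL_CODONS) * (CODONS[dipeptide[1]] / TOTAL_CODONS)
        # Theoretical variance of a binomial proportion over n_pairs trials.
        tv = tm * (1.0 - tm) / n_pairs
        features.append((dc - tm) / math.sqrt(tv))
    return features


def dde_matrix(sequence):
    """Reshape the 400 values into 20 rows of 20 (hypothetical 2D CNN input)."""
    flat = dde_features(sequence)
    return [flat[row * 20:(row + 1) * 20] for row in range(20)]
```

For example, `dde_matrix("MKVLAAGIVGLSA")` yields a 20 x 20 grid in which row i and column j hold the standardized frequency of the dipeptide formed by the i-th and j-th amino acids in alphabetical one-letter order.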
format article
author Rahu Sikander
Yuping Wang
Ali Ghulam
Xianjuan Wu
title Identification of Enzymes-specific Protein Domain Based on DDE, and Convolutional Neural Network
publisher Frontiers Media S.A.
publishDate 2021
url https://doaj.org/article/f6758d809764473f90fb9a753313bc26