Predicting S-nitrosylation proteins and sites by fusing multiple features
Protein S-nitrosylation is one of the most important post-translational modifications, and a well-grounded understanding of it matters because it plays a key role in a variety of biological processes. For an uncharacterized protein sequence, it is a very meaningful problem for bot...
Main authors: | Wang-Ren Qiu, Qian-Kun Wang, Meng-Yue Guan, Jian-Hua Jia, Xuan Xiao |
---|---|
Format: | article |
Language: | EN |
Published: | AIMS Press, 2021 |
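Subjects: | s-nitrosylation; random forest; post-translational modification; multiple features; identification; Biotechnology; Mathematics |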
Online access: | https://doaj.org/article/8125e5cdbe6d4b8e9153fb38707c867a |
id |
oai:doaj.org-article:8125e5cdbe6d4b8e9153fb38707c867a |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:8125e5cdbe6d4b8e9153fb38707c867a; 2021-11-29T05:40:51Z; Predicting S-nitrosylation proteins and sites by fusing multiple features; DOI 10.3934/mbe.2021450; ISSN 1551-0018; https://doaj.org/article/8125e5cdbe6d4b8e9153fb38707c867a; 2021-10-01T00:00:00Z; https://www.aimspress.com/article/doi/10.3934/mbe.2021450?viewType=HTML; https://doaj.org/toc/1551-0018; Wang-Ren Qiu; Qian-Kun Wang; Meng-Yue Guan; Jian-Hua Jia; Xuan Xiao; AIMS Press; article; s-nitrosylation; random forest; post-translational modification; multiple features; identification; Biotechnology; TP248.13-248.65; Mathematics; QA1-939; EN; Mathematical Biosciences and Engineering, Vol 18, Iss 6, Pp 9132-9147 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
s-nitrosylation; random forest; post-translational modification; multiple features; identification; Biotechnology; TP248.13-248.65; Mathematics; QA1-939 |
description |
Protein S-nitrosylation is one of the most important post-translational modifications, and a well-grounded understanding of it matters because it plays a key role in a variety of biological processes. For an uncharacterized protein sequence, it is valuable for both basic research and drug development to first identify whether it is an S-nitrosylation protein and then predict the specific S-nitrosylation site(s). This work proposes two models, one for identifying S-nitrosylation proteins and one for identifying their PTM sites. First, three kinds of features are extracted from the protein sequence: KNN scoring of functional domain annotation, PseAAC, and bag-of-words based on the physical and chemical properties of amino acids. Second, the synthetic minority oversampling technique (SMOTE) is used to balance the data sets, and several state-of-the-art classifiers and feature fusion strategies are applied to the balanced data sets. In five-fold cross-validation for predicting S-nitrosylation proteins, the accuracy (ACC), Matthews correlation coefficient (MCC), and area under the ROC curve (AUC) are 81.84%, 0.5178, and 0.8635, respectively. Finally, a model for predicting S-nitrosylation sites is constructed on the basis of tripeptide composition (TPC) and the composition of k-spaced amino acid pairs (CKSAAP). To eliminate redundant information and improve efficiency, elastic nets are employed for feature selection. The five-fold cross-validation tests indicate promising success rates for the proposed model. For the convenience of researchers, a web server named "RF-SNOPS" has been established at http://www.jci-bioinfo.cn/RF-SNOPS |
format |
article |
author |
Wang-Ren Qiu; Qian-Kun Wang; Meng-Yue Guan; Jian-Hua Jia; Xuan Xiao |
author_sort |
Wang-Ren Qiu |
title |
Predicting S-nitrosylation proteins and sites by fusing multiple features |
publisher |
AIMS Press |
publishDate |
2021 |
url |
https://doaj.org/article/8125e5cdbe6d4b8e9153fb38707c867a |
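The abstract names tripeptide composition (TPC) and the composition of k-spaced amino acid pairs (CKSAAP) as the features behind the site-prediction model. The following is a minimal Python sketch of how those two encodings are commonly computed and concatenated. It is not the authors' code: the function names `tpc` and `cksaap`, the gap range `k_max=2`, the per-gap normalisation, and the example peptide are all assumptions made for illustration, since the record does not state the window size or parameters used in the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def tpc(seq):
    """Tripeptide composition: relative frequency of each of the 8000 3-mers."""
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:  # skip windows containing non-standard symbols
            counts[tri] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in keys]


def cksaap(seq, k_max=2):
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    For each gap k, count the pairs (seq[i], seq[i + k + 1]) and divide by
    the number of valid pairs found for that gap (an assumed normalisation).
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features


if __name__ == "__main__":
    window = "MKTAYCAKQRC"  # made-up peptide, not taken from the paper's data
    fused = tpc(window) + cksaap(window, k_max=2)  # simple concatenation
    print(len(fused))  # 8000 TPC values + 3 * 400 CKSAAP values
```

The concatenated vector is high-dimensional and sparse for short peptides, which is consistent with the abstract's mention of a feature selection step (elastic nets) before classification.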