Characterization and identification of lysine crotonylation sites based on machine learning method on both plant and mammalian
Abstract Lysine crotonylation (Kcr) is a type of protein post-translational modification (PTM) that plays important roles in a variety of cellular regulatory processes. Several methods have been proposed for identifying crotonylation sites. However, most of these methods can predict effici...
Saved in:
Main authors: | Rulan Wang, Zhuo Wang, Hongfei Wang, Yuxuan Pang, Tzong-Yi Lee |
---|---|
Format: | article |
Language: | EN |
Published: | Nature Portfolio, 2020 |
Subjects: | Medicine (R); Science (Q) |
Online access: | https://doaj.org/article/a853eda2aeaf4504982c8c920cfc2d05 |
id |
oai:doaj.org-article:a853eda2aeaf4504982c8c920cfc2d05 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:a853eda2aeaf4504982c8c920cfc2d05; 2021-12-02T12:33:46Z; Characterization and identification of lysine crotonylation sites based on machine learning method on both plant and mammalian; 10.1038/s41598-020-77173-0; 2045-2322; https://doaj.org/article/a853eda2aeaf4504982c8c920cfc2d05; 2020-11-01T00:00:00Z; https://doi.org/10.1038/s41598-020-77173-0; https://doaj.org/toc/2045-2322; Abstract Lysine crotonylation (Kcr) is a type of protein post-translational modification (PTM) that plays important roles in a variety of cellular regulatory processes. Several methods have been proposed for identifying crotonylation sites. However, most of these methods predict efficiently only on either histone or non-histone proteins. This work therefore aims for a more balanced performance across species, covering plant (non-histone) and mammalian (histone) proteins. SVM (support vector machine) and RF (random forest) classifiers were employed. According to the cross-validation results, the RF classifier based on the EGAAC attribute achieved the best predictive performance; it is competitive with existing methods and more robust when dealing with imbalanced datasets. An independent test was also carried out, comparing this study with existing methods based on the same features or the same classifier. The SVM and RF classifiers achieved their best performances with 92% sensitivity, 88% specificity, 90% accuracy, and an MCC of 0.80 in the mammalian dataset, and 77% sensitivity, 83% specificity, 70% accuracy, and an MCC of 0.54 in a relatively small mammalian dataset and a large-scale plant dataset, respectively. Finally, cross-species independent testing was carried out, which demonstrated the species diversity between plant and mammalian.; Rulan Wang; Zhuo Wang; Hongfei Wang; Yuxuan Pang; Tzong-Yi Lee; Nature Portfolio; article; Medicine; R; Science; Q; EN; Scientific Reports, Vol 10, Iss 1, Pp 1-12 (2020) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Medicine (R); Science (Q) |
description |
Abstract Lysine crotonylation (Kcr) is a type of protein post-translational modification (PTM) that plays important roles in a variety of cellular regulatory processes. Several methods have been proposed for identifying crotonylation sites. However, most of these methods predict efficiently only on either histone or non-histone proteins. This work therefore aims for a more balanced performance across species, covering plant (non-histone) and mammalian (histone) proteins. SVM (support vector machine) and RF (random forest) classifiers were employed. According to the cross-validation results, the RF classifier based on the EGAAC attribute achieved the best predictive performance; it is competitive with existing methods and more robust when dealing with imbalanced datasets. An independent test was also carried out, comparing this study with existing methods based on the same features or the same classifier. The SVM and RF classifiers achieved their best performances with 92% sensitivity, 88% specificity, 90% accuracy, and an MCC of 0.80 in the mammalian dataset, and 77% sensitivity, 83% specificity, 70% accuracy, and an MCC of 0.54 in a relatively small mammalian dataset and a large-scale plant dataset, respectively. Finally, cross-species independent testing was carried out, which demonstrated the species diversity between plant and mammalian. |
format |
article |
author |
Rulan Wang; Zhuo Wang; Hongfei Wang; Yuxuan Pang; Tzong-Yi Lee |
author_sort |
Rulan Wang |
title |
Characterization and identification of lysine crotonylation sites based on machine learning method on both plant and mammalian |
publisher |
Nature Portfolio |
publishDate |
2020 |
url |
https://doaj.org/article/a853eda2aeaf4504982c8c920cfc2d05 |
_version_ |
1718393877686124544 |
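
The sensitivity, specificity, accuracy, and MCC (Matthews correlation coefficient) figures quoted in the description field are standard confusion-matrix metrics. As a minimal sketch, they can be computed as shown below. The function name `binary_metrics` and the counts used in the example are illustrative assumptions, not code or data from the article.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positive (e.g. crotonylated) sites predicted correctly / missed.
    tn/fp: negative sites predicted correctly / wrongly flagged as positive.
    """
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for a balanced test set of 100 positive and
    # 100 negative sites; these are NOT taken from the article.
    for name, value in binary_metrics(tp=92, fn=8, tn=88, fp=12).items():
        print(f"{name}: {value:.2f}")
```

With these hypothetical counts the sketch yields 0.92 sensitivity, 0.88 specificity, 0.90 accuracy, and an MCC of about 0.80. The actual counts behind the reported figures are not given in this record.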