PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm

Background: Pseudouridine (Ψ) is a common ribonucleotide modification that plays a significant role in many biological processes. The identification of Ψ modification sites is of great significance for disease mechanism and biological processes research in which machine learning algorithms are desir...

Bibliographic Details
Main authors: Jujuan Zhuang, Danyang Liu, Meng Lin, Wenjing Qiu, Jinyang Liu, Size Chen
Format: article
Language: EN
Published: Frontiers Media S.A. 2021
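Subjects: RNA modification; pseudouridine site prediction; feature extraction; deep learning; capsule network; Genetics; QH426-470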
Online access: https://doaj.org/article/893710d42b024260b70e1335792ffaf9
id oai:doaj.org-article:893710d42b024260b70e1335792ffaf9
record_format dspace
spelling oai:doaj.org-article:893710d42b024260b70e1335792ffaf9
2021-11-18T07:35:15Z
PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm
1664-8021
10.3389/fgene.2021.773882
https://doaj.org/article/893710d42b024260b70e1335792ffaf9
2021-11-01T00:00:00Z
https://www.frontiersin.org/articles/10.3389/fgene.2021.773882/full
https://doaj.org/toc/1664-8021
Background: Pseudouridine (Ψ) is a common ribonucleotide modification that plays a significant role in many biological processes. Identifying Ψ modification sites is important for research into disease mechanisms and biological processes, and machine learning algorithms are desirable for this task because laboratory techniques are expensive and time-consuming. Results: We propose a deep learning framework, PseUdeep, to identify Ψ sites in three species: H. sapiens, S. cerevisiae, and M. musculus. Three encoding methods extract features from RNA sequences: one-hot encoding, K-tuple nucleotide frequency pattern, and position-specific nucleotide composition. The three feature matrices are passed through two convolutional layers and fed into a capsule neural network and a bidirectional gated recurrent unit network with a self-attention mechanism for classification. Conclusion: Compared with other state-of-the-art methods, our model achieves the highest prediction accuracy on the independent testing data set S-200, where accuracy improves by 12.38%; on the independent testing data set H-200, accuracy improves by 0.68%. Moreover, the features we derive from the RNA sequences have only 109, 109, and 119 dimensions for H. sapiens, M. musculus, and S. cerevisiae, respectively, far fewer than those used in traditional algorithms. On evaluation via tenfold cross-validation and two independent testing data sets, PseUdeep outperforms the best traditional machine learning model available. PseUdeep source code and data sets are available at https://github.com/dan111262/PseUdeep.
Jujuan Zhuang
Danyang Liu
Meng Lin
Wenjing Qiu
Jinyang Liu
Size Chen
Frontiers Media S.A.
article
RNA modification
pseudouridine site prediction
feature extraction
deep learning
capsule network
Genetics
QH426-470
EN
Frontiers in Genetics, Vol 12 (2021)
institution DOAJ
collection DOAJ
language EN
topic RNA modification
pseudouridine site prediction
feature extraction
deep learning
capsule network
Genetics
QH426-470
spellingShingle RNA modification
pseudouridine site prediction
feature extraction
deep learning
capsule network
Genetics
QH426-470
Jujuan Zhuang
Danyang Liu
Meng Lin
Wenjing Qiu
Jinyang Liu
Size Chen
PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm
description Background: Pseudouridine (Ψ) is a common ribonucleotide modification that plays a significant role in many biological processes. Identifying Ψ modification sites is important for research into disease mechanisms and biological processes, and machine learning algorithms are desirable for this task because laboratory techniques are expensive and time-consuming. Results: We propose a deep learning framework, PseUdeep, to identify Ψ sites in three species: H. sapiens, S. cerevisiae, and M. musculus. Three encoding methods extract features from RNA sequences: one-hot encoding, K-tuple nucleotide frequency pattern, and position-specific nucleotide composition. The three feature matrices are passed through two convolutional layers and fed into a capsule neural network and a bidirectional gated recurrent unit network with a self-attention mechanism for classification. Conclusion: Compared with other state-of-the-art methods, our model achieves the highest prediction accuracy on the independent testing data set S-200, where accuracy improves by 12.38%; on the independent testing data set H-200, accuracy improves by 0.68%. Moreover, the features we derive from the RNA sequences have only 109, 109, and 119 dimensions for H. sapiens, M. musculus, and S. cerevisiae, respectively, far fewer than those used in traditional algorithms. On evaluation via tenfold cross-validation and two independent testing data sets, PseUdeep outperforms the best traditional machine learning model available. PseUdeep source code and data sets are available at https://github.com/dan111262/PseUdeep.
format article
author Jujuan Zhuang
Danyang Liu
Meng Lin
Wenjing Qiu
Jinyang Liu
Size Chen
author_facet Jujuan Zhuang
Danyang Liu
Meng Lin
Wenjing Qiu
Jinyang Liu
Size Chen
author_sort Jujuan Zhuang
title PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm
title_short PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm
title_full PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm
title_fullStr PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm
title_full_unstemmed PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm
title_sort pseudeep: rna pseudouridine site identification with deep learning algorithm
publisher Frontiers Media S.A.
publishDate 2021
url https://doaj.org/article/893710d42b024260b70e1335792ffaf9
work_keys_str_mv AT jujuanzhuang pseudeeprnapseudouridinesiteidentificationwithdeeplearningalgorithm
AT danyangliu pseudeeprnapseudouridinesiteidentificationwithdeeplearningalgorithm
AT menglin pseudeeprnapseudouridinesiteidentificationwithdeeplearningalgorithm
AT wenjingqiu pseudeeprnapseudouridinesiteidentificationwithdeeplearningalgorithm
AT jinyangliu pseudeeprnapseudouridinesiteidentificationwithdeeplearningalgorithm
AT sizechen pseudeeprnapseudouridinesiteidentificationwithdeeplearningalgorithm
_version_ 1718423234208071680
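
Illustrative note on the description field: the abstract names one-hot encoding and the K-tuple nucleotide frequency pattern as two of the three feature encodings. The following is a minimal Python sketch of what such encodings commonly look like, not the PseUdeep implementation. The function names, the A/C/G/U column order, and the toy sequence are assumptions made for illustration; the authors' actual code is in the repository linked above.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumed column order; the paper's ordering may differ


def one_hot(seq):
    """Return one 4-element 0/1 row per nucleotide, in A, C, G, U order.

    Characters outside ALPHABET (for example N) yield an all-zero row.
    """
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


def k_tuple_frequencies(seq, k):
    """Return the relative frequency of every possible k-mer over ALPHABET.

    Frequencies are counts divided by the number of overlapping windows.
    """
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(total))
    return {
        "".join(kmer): counts["".join(kmer)] / total
        for kmer in product(ALPHABET, repeat=k)
    }


# Hypothetical toy sequence, not taken from the PseUdeep data sets.
example = "GAUCU"
encoded = one_hot(example)
freqs = k_tuple_frequencies(example, 2)
```

Here `encoded` holds one row per nucleotide and `freqs` holds 16 dinucleotide frequencies. In a pipeline like the one the abstract describes, matrices of this kind would be the inputs to the convolutional layers.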