Prediction of functional microexons by transfer learning

Abstract Background Microexons are a particular kind of exon of less than 30 nucleotides in length. More than 60% of annotated human microexons were found to have high levels of sequence conservation, suggesting their potential functions. There is thus a need to develop a method for predicting funct...

Descripción completa

Guardado en:
Detalles Bibliográficos
Autores principales: Qi Cheng, Bo He, Chengkui Zhao, Hongyuan Bi, Duojiao Chen, Shuangze Han, Haikuan Gao, Weixing Feng
Formato: article
Lenguaje:EN
Publicado: BMC 2021
Materias:
Acceso en línea:https://doaj.org/article/a41ebdf243e64952b53255a68065586c
Etiquetas: Agregar Etiqueta
Sin Etiquetas, Sea el primero en etiquetar este registro!
Descripción
Sumario:Abstract Background Microexons are a particular kind of exon of less than 30 nucleotides in length. More than 60% of annotated human microexons were found to have high levels of sequence conservation, suggesting their potential functions. There is thus a need to develop a method for predicting functional microexons. Results Given the lack of a publicly available functional label for microexons, we employed a transfer learning skill called Transfer Component Analysis (TCA) to transfer the knowledge obtained from feature mapping for the prediction of functional microexons. To provide reference knowledge, microindels were chosen because of their similarities to microexons. Then, Support Vector Machine (SVM) was used to train a classification model in the newly built feature space for the functional microindels. With the trained model, functional microexons were predicted. We also built a tool based on this model to predict other functional microexons. We then used this tool to predict a total of 19 functional microexons reported in the literature. This approach successfully predicted 16 out of 19 samples, giving accuracy greater than 80%. Conclusions In this study, we proposed a method for predicting functional microexons and applied it, with the predictive results being largely consistent with records in the literature.