A Deep Learning Approach with Data Augmentation to Predict Novel Spider Neurotoxic Peptides

As major components of spider venoms, neurotoxic peptides exhibit structural diversity, target specificity, and have great pharmaceutical potential. Deep learning may be an alternative to the laborious and time-consuming methods for identifying these peptides. However, the major hurdle in developing...

Bibliographic Details
Main authors: Byungjo Lee, Min Kyoung Shin, In-Wook Hwang, Junghyun Jung, Yu Jeong Shim, Go Woon Kim, Seung Tae Kim, Wonhee Jang, Jung-Suk Sung
Format: article
Language: EN
Published: MDPI AG 2021
Subjects:
Online access: https://doaj.org/article/87bd9b8690694363bfe6e4a9362bb36f
id oai:doaj.org-article:87bd9b8690694363bfe6e4a9362bb36f
record_format dspace
spelling oai:doaj.org-article:87bd9b8690694363bfe6e4a9362bb36f
2021-11-25T17:55:06Z
10.3390/ijms222212291
1422-0067
1661-6596
https://doaj.org/article/87bd9b8690694363bfe6e4a9362bb36f
2021-11-01T00:00:00Z
https://www.mdpi.com/1422-0067/22/22/12291
https://doaj.org/toc/1661-6596
https://doaj.org/toc/1422-0067
International Journal of Molecular Sciences, Vol 22, Iss 12291, p 12291 (2021)
institution DOAJ
collection DOAJ
language EN
topic deep learning
data augmentation
convolutional neural network
neurotoxic peptide prediction
spider transcriptome
Biology (General)
QH301-705.5
Chemistry
QD1-999
description As major components of spider venoms, neurotoxic peptides exhibit structural diversity and target specificity and have great pharmaceutical potential. Deep learning may be an alternative to the laborious and time-consuming methods for identifying these peptides. However, the major hurdle in developing a deep learning model is the limited data on neurotoxic peptides. Here, we present a peptide data augmentation method that improves the recognition of neurotoxic peptides via a convolutional neural network model. The neurotoxic peptides were augmented with the known neurotoxic peptides from the UniProt database, and the models were trained using a training set with or without the generated sequences to verify the augmented data. The model trained with the augmented dataset outperformed the one trained with the unaugmented dataset, achieving an accuracy of 0.9953, precision of 0.9922, recall of 0.9984, and F1 score of 0.9953 on the simulation dataset. Using the model, we screened the set of all RNA transcripts of the spider Callobius koreanus for neurotoxic peptides, which yielded 275 putative peptides, of which 252 were novel sequences and only 23 showed homology with known peptides according to the Basic Local Alignment Search Tool. Among these 275 peptides, four were selected and shown to have neuromodulatory effects on the human neuroblastoma cell line SH-SY5Y. The augmentation method presented here may be applied to the identification of other functional peptides from biological resources with insufficient data.
format article
author Byungjo Lee
Min Kyoung Shin
In-Wook Hwang
Junghyun Jung
Yu Jeong Shim
Go Woon Kim
Seung Tae Kim
Wonhee Jang
Jung-Suk Sung
author_sort Byungjo Lee
title A Deep Learning Approach with Data Augmentation to Predict Novel Spider Neurotoxic Peptides
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/87bd9b8690694363bfe6e4a9362bb36f
_version_ 1718411873084243968
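
The description above reports precision, recall, and F1 score for the model trained on the augmented dataset. As a reading aid, the short sketch below recomputes F1 from the reported precision and recall using the standard definition (the harmonic mean of the two). The function name `f1_from_precision_recall` is a hypothetical helper introduced here for illustration; it is not taken from the article, and the record does not say how the authors computed their metrics.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Standard F1 definition: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the record's description field.
reported_precision = 0.9922
reported_recall = 0.9984
reported_f1 = 0.9953

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(round(f1, 4))  # prints 0.9953
```

Under that definition, the recomputed value matches the reported F1 score to four decimal places, so the three figures in the description are mutually consistent.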