A convolution based computational approach towards DNA N6-methyladenine site identification and motif extraction in rice genome

Abstract: DNA N6-methylation (6mA) of the adenine nucleotide is a post-replication modification responsible for many biological functions. Automated and accurate computational methods can help identify 6mA sites in long genomes, saving significant time and money. Our study develops a convolutional neur...

Bibliographic Details
Main authors: Chowdhury Rafeed Rahman, Ruhul Amin, Swakkhar Shatabda, Md. Sadrul Islam Toaha
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/8b9fbe0d9f8d4fb8bbcb8712e9e6303b
id oai:doaj.org-article:8b9fbe0d9f8d4fb8bbcb8712e9e6303b
record_format dspace
spelling oai:doaj.org-article:8b9fbe0d9f8d4fb8bbcb8712e9e6303b; 2021-12-02T16:50:17Z; DOI 10.1038/s41598-021-89850-9; ISSN 2045-2322; https://doaj.org/article/8b9fbe0d9f8d4fb8bbcb8712e9e6303b; 2021-05-01T00:00:00Z; https://doi.org/10.1038/s41598-021-89850-9; https://doaj.org/toc/2045-2322; Scientific Reports, Vol 11, Iss 1, Pp 1-13 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract: DNA N6-methylation (6mA) of the adenine nucleotide is a post-replication modification responsible for many biological functions. Automated and accurate computational methods can help identify 6mA sites in long genomes, saving significant time and money. Our study develops i6mA-CNN, a convolutional neural network (CNN) based tool capable of identifying 6mA sites in the rice genome. Our model combines multiple types of features, such as a customized feature vector inspired by PseAAC (Pseudo Amino Acid Composition), multiple one-hot representations, and dinucleotide physicochemical properties. It achieves an auROC (area under the Receiver Operating Characteristic curve) score of 0.98 with an overall accuracy of 93.97% using fivefold cross-validation on a benchmark dataset. Finally, we evaluate our model on three other plant genome 6mA site identification test datasets. The results suggest that the proposed tool generalizes its 6mA site identification ability across plant genomes irrespective of species. An algorithm for potential motif extraction and a feature importance analysis procedure are two by-products of this research. The web tool for this research is available at https://cutt.ly/dgp3QTR.
format article
author Chowdhury Rafeed Rahman
Ruhul Amin
Swakkhar Shatabda
Md. Sadrul Islam Toaha
author_sort Chowdhury Rafeed Rahman
title A convolution based computational approach towards DNA N6-methyladenine site identification and motif extraction in rice genome
title_sort convolution based computational approach towards dna n6-methyladenine site identification and motif extraction in rice genome
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/8b9fbe0d9f8d4fb8bbcb8712e9e6303b
_version_ 1718383050961715200