Machine learning model for sequence-driven DNA G-quadruplex formation

Abstract We describe a sequence-based computational model to predict DNA G-quadruplex (G4) formation. The model was developed using large-scale machine learning from an extensive experimental G4-formation dataset, recently obtained for the human genome via G4-seq methodology. Our model differentiate...

Bibliographic Details
Main authors: Aleksandr B. Sahakyan, Vicki S. Chambers, Giovanni Marsico, Tobias Santner, Marco Di Antonio, Shankar Balasubramanian
Format: article
Language: EN
Published: Nature Portfolio 2017
Subjects:
R
Q
Online access: https://doaj.org/article/289341f558e943b6bbdbc39c89dcf616
id oai:doaj.org-article:289341f558e943b6bbdbc39c89dcf616
record_format dspace
spelling oai:doaj.org-article:289341f558e943b6bbdbc39c89dcf616
2021-12-02T15:05:14Z
Machine learning model for sequence-driven DNA G-quadruplex formation
10.1038/s41598-017-14017-4
2045-2322
https://doaj.org/article/289341f558e943b6bbdbc39c89dcf616
2017-11-01T00:00:00Z
https://doi.org/10.1038/s41598-017-14017-4
https://doaj.org/toc/2045-2322
Abstract We describe a sequence-based computational model to predict DNA G-quadruplex (G4) formation. The model was developed using large-scale machine learning from an extensive experimental G4-formation dataset, recently obtained for the human genome via G4-seq methodology. Our model differentiates many widely accepted putative quadruplex sequences that do not actually form stable genomic G4 structures, correctly assessing the G4 folding potential of over 700,000 such sequences in the human genome. Moreover, our approach reveals the relative importance of sequence-based features coming from both within the G4 motifs and their flanking regions. The developed model can be applied to any DNA sequence or genome to characterise sequence-driven intramolecular G4 formation propensities.
Aleksandr B. Sahakyan
Vicki S. Chambers
Giovanni Marsico
Tobias Santner
Marco Di Antonio
Shankar Balasubramanian
Nature Portfolio
article
Medicine
R
Science
Q
EN
Scientific Reports, Vol 7, Iss 1, Pp 1-11 (2017)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract We describe a sequence-based computational model to predict DNA G-quadruplex (G4) formation. The model was developed using large-scale machine learning from an extensive experimental G4-formation dataset, recently obtained for the human genome via G4-seq methodology. Our model differentiates many widely accepted putative quadruplex sequences that do not actually form stable genomic G4 structures, correctly assessing the G4 folding potential of over 700,000 such sequences in the human genome. Moreover, our approach reveals the relative importance of sequence-based features coming from both within the G4 motifs and their flanking regions. The developed model can be applied to any DNA sequence or genome to characterise sequence-driven intramolecular G4 formation propensities.
format article
author Aleksandr B. Sahakyan
Vicki S. Chambers
Giovanni Marsico
Tobias Santner
Marco Di Antonio
Shankar Balasubramanian
author_sort Aleksandr B. Sahakyan
title Machine learning model for sequence-driven DNA G-quadruplex formation
publisher Nature Portfolio
publishDate 2017
url https://doaj.org/article/289341f558e943b6bbdbc39c89dcf616