DNA word analysis based on the distribution of the distances between symmetric words

Abstract We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented. For this purpose, we developed new procedures to identify symmetric word pairs with uncommon empirical distance di...
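As a rough illustration of the idea in the abstract, the sketch below builds the reversed complement of a word and counts, for each occurrence of the word, the distance to the next occurrence of its reversed complement. The distance definition (start-to-start offset to the nearest following partner), the function names, and the toy sequence are assumptions made for this example only; the abstract does not specify the authors' exact procedure or their test for overrepresentation.

```python
import bisect
from collections import Counter

# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Return the reversed complement of a DNA word (uppercase A, C, G, T)."""
    return word.translate(_COMPLEMENT)[::-1]


def find_positions(sequence: str, word: str) -> list[int]:
    """Return the start index of every (possibly overlapping) occurrence."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        start = sequence.find(word, start + 1)
    return positions


def symmetric_distance_counts(sequence: str, word: str) -> Counter:
    """Count distances from each occurrence of `word` to the next occurrence
    of its reversed complement (assumed definition, for illustration only)."""
    partner_positions = find_positions(sequence, reverse_complement(word))
    counts = Counter()
    for p in find_positions(sequence, word):
        # First partner occurrence that starts strictly after position p.
        idx = bisect.bisect_right(partner_positions, p)
        if idx < len(partner_positions):
            counts[partner_positions[idx] - p] += 1
    return counts


if __name__ == "__main__":
    toy = "ACGTTCGTACGCGT"  # hypothetical toy sequence, not real genomic data
    print(reverse_complement("ACG"))
    print(symmetric_distance_counts(toy, "ACG"))
```

Comparing such empirical counts against an expected distribution would be the next step in finding overrepresented distances; that comparison is not shown here because the record does not describe it.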

Bibliographic Details
Main Authors: Ana H. M. P. Tavares, Armando J. Pinho, Raquel M. Silva, João M. O. S. Rodrigues, Carlos A. C. Bastos, Paulo J. S. G. Ferreira, Vera Afreixo
Format: article
Language: EN
Published: Nature Portfolio 2017
Subjects:
R
Q
Online Access: https://doaj.org/article/51c7b6a8aff94a299990323f68956312
id oai:doaj.org-article:51c7b6a8aff94a299990323f68956312
record_format dspace
spelling oai:doaj.org-article:51c7b6a8aff94a299990323f68956312; 2021-12-02T12:32:56Z; DNA word analysis based on the distribution of the distances between symmetric words; 10.1038/s41598-017-00646-2; 2045-2322; https://doaj.org/article/51c7b6a8aff94a299990323f68956312; 2017-04-01T00:00:00Z; https://doi.org/10.1038/s41598-017-00646-2; https://doaj.org/toc/2045-2322; Abstract We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented. For this purpose, we developed new procedures to identify symmetric word pairs with uncommon empirical distance distribution and with clusters of overrepresented short distances. We speculate that patterns of overrepresentation of short distances between symmetric word pairs may allow the occurrence of non-standard DNA conformations, such as hairpin/cruciform structures. We focused on the human genome, and analysed both the complete genome as well as a version with known repetitive sequences masked out. We reported several well-defined features in the distributions of distances, which can be classified into three different profiles, showing enrichment in distinct distance ranges. We analysed in greater detail certain pairs of symmetric words of length seven, found by our procedure, characterised by the surprising fact that they occur at single distances more frequently than expected.; Ana H. M. P. Tavares; Armando J. Pinho; Raquel M. Silva; João M. O. S. Rodrigues; Carlos A. C. Bastos; Paulo J. S. G. Ferreira; Vera Afreixo; Nature Portfolio; article; Medicine; R; Science; Q; EN; Scientific Reports, Vol 7, Iss 1, Pp 1-11 (2017)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
spellingShingle Medicine
R
Science
Q
Ana H. M. P. Tavares
Armando J. Pinho
Raquel M. Silva
João M. O. S. Rodrigues
Carlos A. C. Bastos
Paulo J. S. G. Ferreira
Vera Afreixo
DNA word analysis based on the distribution of the distances between symmetric words
description Abstract We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented. For this purpose, we developed new procedures to identify symmetric word pairs with uncommon empirical distance distribution and with clusters of overrepresented short distances. We speculate that patterns of overrepresentation of short distances between symmetric word pairs may allow the occurrence of non-standard DNA conformations, such as hairpin/cruciform structures. We focused on the human genome, and analysed both the complete genome as well as a version with known repetitive sequences masked out. We reported several well-defined features in the distributions of distances, which can be classified into three different profiles, showing enrichment in distinct distance ranges. We analysed in greater detail certain pairs of symmetric words of length seven, found by our procedure, characterised by the surprising fact that they occur at single distances more frequently than expected.
format article
author Ana H. M. P. Tavares
Armando J. Pinho
Raquel M. Silva
João M. O. S. Rodrigues
Carlos A. C. Bastos
Paulo J. S. G. Ferreira
Vera Afreixo
author_facet Ana H. M. P. Tavares
Armando J. Pinho
Raquel M. Silva
João M. O. S. Rodrigues
Carlos A. C. Bastos
Paulo J. S. G. Ferreira
Vera Afreixo
author_sort Ana H. M. P. Tavares
title DNA word analysis based on the distribution of the distances between symmetric words
title_short DNA word analysis based on the distribution of the distances between symmetric words
title_full DNA word analysis based on the distribution of the distances between symmetric words
title_fullStr DNA word analysis based on the distribution of the distances between symmetric words
title_full_unstemmed DNA word analysis based on the distribution of the distances between symmetric words
title_sort dna word analysis based on the distribution of the distances between symmetric words
publisher Nature Portfolio
publishDate 2017
url https://doaj.org/article/51c7b6a8aff94a299990323f68956312
work_keys_str_mv AT anahmptavares dnawordanalysisbasedonthedistributionofthedistancesbetweensymmetricwords
AT armandojpinho dnawordanalysisbasedonthedistributionofthedistancesbetweensymmetricwords
AT raquelmsilva dnawordanalysisbasedonthedistributionofthedistancesbetweensymmetricwords
AT joaomosrodrigues dnawordanalysisbasedonthedistributionofthedistancesbetweensymmetricwords
AT carlosacbastos dnawordanalysisbasedonthedistributionofthedistancesbetweensymmetricwords
AT paulojsgferreira dnawordanalysisbasedonthedistributionofthedistancesbetweensymmetricwords
AT veraafreixo dnawordanalysisbasedonthedistributionofthedistancesbetweensymmetricwords
_version_ 1718393918496702464