DNA word analysis based on the distribution of the distances between symmetric words
Abstract We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented. For this purpose, we developed new procedures to identify symmetric word pairs with uncommon empirical distance di...
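Main authors: | Ana H. M. P. Tavares, Armando J. Pinho, Raquel M. Silva, João M. O. S. Rodrigues, Carlos A. C. Bastos, Paulo J. S. G. Ferreira, Vera Afreixo |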
---|---|
Format: | article |
Language: | EN |
Published: | Nature Portfolio, 2017 |
Subjects: | Medicine; Science |
Online access: | https://doaj.org/article/51c7b6a8aff94a299990323f68956312 |
id |
oai:doaj.org-article:51c7b6a8aff94a299990323f68956312 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:51c7b6a8aff94a299990323f68956312; 2021-12-02T12:32:56Z; DNA word analysis based on the distribution of the distances between symmetric words; 10.1038/s41598-017-00646-2; 2045-2322; https://doaj.org/article/51c7b6a8aff94a299990323f68956312; 2017-04-01T00:00:00Z; https://doi.org/10.1038/s41598-017-00646-2; https://doaj.org/toc/2045-2322; Abstract We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented. For this purpose, we developed new procedures to identify symmetric word pairs with uncommon empirical distance distribution and with clusters of overrepresented short distances. We speculate that patterns of overrepresentation of short distances between symmetric word pairs may allow the occurrence of non-standard DNA conformations, such as hairpin/cruciform structures. We focused on the human genome, and analysed both the complete genome as well as a version with known repetitive sequences masked out. We reported several well-defined features in the distributions of distances, which can be classified into three different profiles, showing enrichment in distinct distance ranges. We analysed in greater detail certain pairs of symmetric words of length seven, found by our procedure, characterised by the surprising fact that they occur at single distances more frequently than expected.; Ana H. M. P. Tavares; Armando J. Pinho; Raquel M. Silva; João M. O. S. Rodrigues; Carlos A. C. Bastos; Paulo J. S. G. Ferreira; Vera Afreixo; Nature Portfolio; article; Medicine; R; Science; Q; EN; Scientific Reports, Vol 7, Iss 1, Pp 1-11 (2017) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Medicine (R); Science (Q) |
description |
Abstract We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented. For this purpose, we developed new procedures to identify symmetric word pairs with uncommon empirical distance distribution and with clusters of overrepresented short distances. We speculate that patterns of overrepresentation of short distances between symmetric word pairs may allow the occurrence of non-standard DNA conformations, such as hairpin/cruciform structures. We focused on the human genome, and analysed both the complete genome as well as a version with known repetitive sequences masked out. We reported several well-defined features in the distributions of distances, which can be classified into three different profiles, showing enrichment in distinct distance ranges. We analysed in greater detail certain pairs of symmetric words of length seven, found by our procedure, characterised by the surprising fact that they occur at single distances more frequently than expected. |
format |
article |
author |
Ana H. M. P. Tavares; Armando J. Pinho; Raquel M. Silva; João M. O. S. Rodrigues; Carlos A. C. Bastos; Paulo J. S. G. Ferreira; Vera Afreixo |
author_facet |
Ana H. M. P. Tavares; Armando J. Pinho; Raquel M. Silva; João M. O. S. Rodrigues; Carlos A. C. Bastos; Paulo J. S. G. Ferreira; Vera Afreixo |
author_sort |
Ana H. M. P. Tavares |
title |
DNA word analysis based on the distribution of the distances between symmetric words |
publisher |
Nature Portfolio |
publishDate |
2017 |
url |
https://doaj.org/article/51c7b6a8aff94a299990323f68956312 |
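
The abstract in this record describes studying the distribution of distances between a genomic word and its reversed complement. As a rough illustration only, the following Python sketch counts such distances in a short sequence. It assumes one plausible reading of "distance": for each occurrence of a word, the offset to the nearest later occurrence of its reversed complement. The function names (`reversed_complement`, `find_all`, `symmetric_distance_counts`) and the distance definition are assumptions made for this example and are not taken from the article.

```python
import bisect
from collections import Counter

# Translation table for the four DNA nucleotides.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reversed_complement(word: str) -> str:
    """Return the reversed complement of a DNA word, e.g. AAC -> GTT."""
    return word.translate(_COMPLEMENT)[::-1]


def find_all(seq: str, word: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of word."""
    positions = []
    i = seq.find(word)
    while i != -1:
        positions.append(i)
        i = seq.find(word, i + 1)
    return positions


def symmetric_distance_counts(seq: str, word: str) -> Counter:
    """Count distances from each occurrence of `word` to the nearest
    later occurrence of its reversed complement.

    Assumption: this distance definition is chosen for illustration
    and may differ from the one used in the article.
    """
    word_positions = find_all(seq, word)
    rc_positions = find_all(seq, reversed_complement(word))  # sorted ascending
    counts = Counter()
    for p in word_positions:
        # Index of the first reversed-complement start strictly after p.
        j = bisect.bisect_right(rc_positions, p)
        if j < len(rc_positions):
            counts[rc_positions[j] - p] += 1
    return counts
```

Calling `symmetric_distance_counts(seq, word)` on a sequence string returns a `Counter` mapping each observed distance to its frequency. Comparing those empirical frequencies with an expected distribution, which is the step the abstract refers to when it mentions overrepresented distances, is not covered by this sketch.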