Systematic clustering of transcription start site landscapes.

Genome-wide, high-throughput methods for transcription start site (TSS) detection have shown that most promoters have an array of neighboring TSSs where some are used more than others, forming a distribution of initiation propensities. TSS distributions (TSSDs) vary widely between promoters and earl...

Bibliographic Details
Main authors: Xiaobei Zhao, Eivind Valen, Brian J Parker, Albin Sandelin
Format: article
Language: EN
Published: Public Library of Science (PLoS), 2011
Subjects:
R
Q
Online access: https://doaj.org/article/8143a7e7bf684957be8cdc3c01b7aa82
id oai:doaj.org-article:8143a7e7bf684957be8cdc3c01b7aa82
record_format dspace
spelling oai:doaj.org-article:8143a7e7bf684957be8cdc3c01b7aa82
2021-11-18T06:47:24Z
Systematic clustering of transcription start site landscapes.
1932-6203
10.1371/journal.pone.0023409
https://doaj.org/article/8143a7e7bf684957be8cdc3c01b7aa82
2011-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/21887249/pdf/?tool=EBI
https://doaj.org/toc/1932-6203
PLoS ONE, Vol 6, Iss 8, p e23409 (2011)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Genome-wide, high-throughput methods for transcription start site (TSS) detection have shown that most promoters have an array of neighboring TSSs where some are used more than others, forming a distribution of initiation propensities. TSS distributions (TSSDs) vary widely between promoters and earlier studies have shown that the TSSDs have biological implications in both regulation and function. However, no systematic study has been made to explore how many types of TSSDs and by extension core promoters exist and to understand which biological features distinguish them. In this study, we developed a new non-parametric dissimilarity measure and clustering approach to explore the similarities and stabilities of clusters of TSSDs. Previous studies have used arbitrary thresholds to arrive at two general classes: broad and sharp. We demonstrated that in addition to the previous broad/sharp dichotomy an additional category of promoters exists. Unlike typical TATA-driven sharp TSSDs where the TSS position can vary a few nucleotides, in this category virtually all TSSs originate from the same genomic position. These promoters lack epigenetic signatures of typical mRNA promoters and a substantial subset of them are mapping upstream of ribosomal protein pseudogenes. We present evidence that these are likely mapping errors, which have confounded earlier analyses, due to the high similarity of ribosomal gene promoters in combination with known G addition bias in the CAGE libraries. Thus, previous two-class separations of promoter based on TSS distributions are motivated, but the ultra-sharp TSS distributions will confound downstream analyses if not removed.
format article
author Xiaobei Zhao
Eivind Valen
Brian J Parker
Albin Sandelin
title Systematic clustering of transcription start site landscapes.
publisher Public Library of Science (PLoS)
publishDate 2011
url https://doaj.org/article/8143a7e7bf684957be8cdc3c01b7aa82