Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data.

Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The sta...

Bibliographic Details
Main Authors: Dongjun Chung, Pei Fen Kuan, Bo Li, Rajendran Sanalkumar, Kun Liang, Emery H Bresnick, Colin Dewey, Sündüz Keleş
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2011
Subjects: Biology (General)
Online Access: https://doaj.org/article/557b1a0a846f4972a5cd394478af23a7
id oai:doaj.org-article:557b1a0a846f4972a5cd394478af23a7
record_format dspace
spelling oai:doaj.org-article:557b1a0a846f4972a5cd394478af23a7
2021-11-18T05:50:25Z
Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data.
1553-734X
1553-7358
10.1371/journal.pcbi.1002111
https://doaj.org/article/557b1a0a846f4972a5cd394478af23a7
2011-07-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/21779159/pdf/?tool=EBI
https://doaj.org/toc/1553-734X
https://doaj.org/toc/1553-7358
Dongjun Chung
Pei Fen Kuan
Bo Li
Rajendran Sanalkumar
Kun Liang
Emery H Bresnick
Colin Dewey
Sündüz Keleş
Public Library of Science (PLoS)
article
Biology (General)
QH301-705.5
EN
PLoS Computational Biology, Vol 7, Iss 7, p e1002111 (2011)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
description Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
format article
author Dongjun Chung
Pei Fen Kuan
Bo Li
Rajendran Sanalkumar
Kun Liang
Emery H Bresnick
Colin Dewey
Sündüz Keleş
title Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data.
publisher Public Library of Science (PLoS)
publishDate 2011
url https://doaj.org/article/557b1a0a846f4972a5cd394478af23a7
_version_ 1718424785557389312
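
The description above says that multi-reads are allocated as fractional counts using a weighted alignment scheme. The record does not say how the weights are computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each read arrives with a list of candidate alignments and a non-negative weight per alignment. The function name, the input layout, and the example positions and weights are all hypothetical.

```python
from collections import defaultdict


def allocate_fractional_counts(reads):
    """Spread each read over its candidate alignments in proportion to weight.

    `reads` is an iterable of lists of (position, weight) pairs. This input
    layout is an assumption made for illustration. A uni-read has exactly
    one pair; a multi-read has several.
    Returns a dict mapping position -> fractional read count.
    """
    counts = defaultdict(float)
    for alignments in reads:
        total = sum(weight for _, weight in alignments)
        if total <= 0:
            continue  # no usable alignment weight, so the read is skipped
        for position, weight in alignments:
            # Each read contributes a total of exactly 1.0 across its positions.
            counts[position] += weight / total
    return dict(counts)


# Hypothetical example: one uni-read and two multi-reads.
reads = [
    [("chr1:100", 1.0)],                     # uni-read
    [("chr1:100", 3.0), ("chr7:500", 1.0)],  # multi-read, favours chr1:100
    [("chr1:100", 1.0), ("chr7:500", 1.0)],  # multi-read, evenly split
]
counts = allocate_fractional_counts(reads)
```

Because each read's weights are normalised to sum to one, the fractional counts add up to the number of reads allocated, so sequencing depth rises without any read being counted more than once.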