On the value of intra-motif dependencies of human insulator protein CTCF.

The binding affinity of DNA-binding proteins such as transcription factors is mainly determined by the base composition of the corresponding binding site on the DNA strand. Most proteins do not bind only a single sequence, but rather a set of sequences, which may be modeled by a sequence motif. Algo...

Bibliographic Details
Main authors: Ralf Eggeling, André Gohr, Jens Keilwagen, Michaela Mohr, Stefan Posch, Andrew D Smith, Ivo Grosse
Format: article
Language: EN
Published: Public Library of Science (PLoS), 2014
Subjects:
R
Q
Online access: https://doaj.org/article/de9f98e2c7434908ae45c71a18e090e9
id oai:doaj.org-article:de9f98e2c7434908ae45c71a18e090e9
record_format dspace
spelling oai:doaj.org-article:de9f98e2c7434908ae45c71a18e090e9
2021-11-18T08:36:35Z
1932-6203
10.1371/journal.pone.0085629
2014-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/24465627/?tool=EBI
https://doaj.org/toc/1932-6203
PLoS ONE, Vol 9, Iss 1, p e85629 (2014)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description The binding affinity of DNA-binding proteins such as transcription factors is mainly determined by the base composition of the corresponding binding site on the DNA strand. Most proteins do not bind only a single sequence, but rather a set of sequences, which may be modeled by a sequence motif. Algorithms for de novo motif discovery differ in their promoter models, learning approaches, and other aspects, but typically use the statistically simple position weight matrix model for the motif, which assumes statistical independence among all nucleotides. However, there is no clear justification for that assumption, leading to an ongoing debate about the importance of modeling dependencies between nucleotides within binding sites. In the past, modeling statistical dependencies within binding sites has been hampered by the problem of limited data. With the rise of high-throughput technologies such as ChIP-seq, this situation has now changed, making it possible to make use of statistical dependencies effectively. In this work, we investigate the presence of statistical dependencies in binding sites of the human enhancer-blocking insulator protein CTCF by using the recently developed model class of inhomogeneous parsimonious Markov models, which is capable of modeling complex dependencies while avoiding overfitting. These findings lead to a more detailed characterization of the CTCF binding motif, which is only poorly represented by independent nucleotide frequencies at several positions, predominantly at the 3' end.
format article
author Ralf Eggeling
André Gohr
Jens Keilwagen
Michaela Mohr
Stefan Posch
Andrew D Smith
Ivo Grosse
title On the value of intra-motif dependencies of human insulator protein CTCF.
publisher Public Library of Science (PLoS)
publishDate 2014
url https://doaj.org/article/de9f98e2c7434908ae45c71a18e090e9