Normalizing RNA-sequencing data by modeling hidden covariates with prior knowledge.

Transcriptomic assays that measure expression levels are widely used to study the manifestation of environmental or genetic variations in cellular processes. RNA-sequencing in particular has the potential to considerably improve such understanding because of its capacity to assay the entire transcri...


Bibliographic Details
Main authors: Sara Mostafavi, Alexis Battle, Xiaowei Zhu, Alexander E Urban, Douglas Levinson, Stephen B Montgomery, Daphne Koller
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2013
Subjects:
R
Q
Online access: https://doaj.org/article/b2924a71dadf489e8607ad370e407a38
id oai:doaj.org-article:b2924a71dadf489e8607ad370e407a38
record_format dspace
spelling oai:doaj.org-article:b2924a71dadf489e8607ad370e407a38
2021-11-18T07:37:18Z
1932-6203
10.1371/journal.pone.0068141
2013-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/23874524/?tool=EBI
https://doaj.org/toc/1932-6203
PLoS ONE, Vol 8, Iss 7, p e68141 (2013)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Transcriptomic assays that measure expression levels are widely used to study the manifestation of environmental or genetic variations in cellular processes. RNA-sequencing in particular has the potential to considerably improve such understanding because of its capacity to assay the entire transcriptome, including novel transcriptional events. However, as with earlier expression assays, analysis of RNA-sequencing data requires carefully accounting for factors that may introduce systematic, confounding variability in the expression measurements, resulting in spurious correlations. Here, we consider the problem of modeling and removing the effects of known and hidden confounding factors from RNA-sequencing data. We describe a unified residual framework that encapsulates existing approaches, and using this framework, present a novel method, HCP (Hidden Covariates with Prior). HCP uses a more informed assumption about the confounding factors, and performs as well or better than existing approaches while having a much lower computational cost. Our experiments demonstrate that accounting for known and hidden factors with appropriate models improves the quality of RNA-sequencing data in two very different tasks: detecting genetic variations that are associated with nearby expression variations (cis-eQTLs), and constructing accurate co-expression networks.
format article
author Sara Mostafavi
Alexis Battle
Xiaowei Zhu
Alexander E Urban
Douglas Levinson
Stephen B Montgomery
Daphne Koller
author_sort Sara Mostafavi
title Normalizing RNA-sequencing data by modeling hidden covariates with prior knowledge.
publisher Public Library of Science (PLoS)
publishDate 2013
url https://doaj.org/article/b2924a71dadf489e8607ad370e407a38