RNA-Seq gene profiling--a systematic empirical comparison.

Bibliographic Details
Main Authors: Nuno A Fonseca, John Marioni, Alvis Brazma
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2014
Subjects:
Medicine (R)
Science (Q)
Online Access: https://doaj.org/article/743d84ee5e334099ab33721196808220
Source: PLoS ONE, Vol 9, Iss 9, p e107026 (2014)
ISSN: 1932-6203
DOI: https://doi.org/10.1371/journal.pone.0107026

Abstract:
Accurately quantifying gene expression levels is a key goal of experiments that use RNA sequencing (RNA-seq) to assay the transcriptome. This typically requires aligning the generated short reads to the genome or transcriptome before quantifying the expression of pre-defined sets of genes. Differences between alignment and quantification tools can have a major effect on the inferred expression levels, with important consequences for biological interpretation. Here we address two main questions: do different analysis pipelines affect the gene expression levels inferred from RNA-seq data, and how close are the inferred expression levels to the "true" expression levels? We evaluate fifty gene profiling pipelines on experimental and simulated data sets with different characteristics (e.g., read length and sequencing depth). Because the "ground truth" is unknown in real RNA-seq data sets, we used simulated data to assess the differences between the "true" expression levels and those reconstructed by the analysis pipelines. Although this approach does not account for all known biases present in RNA-seq data, it still allows us to estimate the accuracy of the gene expression values inferred by different analysis pipelines. The results show that (i) overall, there is a high correlation between the expression levels inferred by the best pipelines and the true quantification values; (ii) the error in the estimated gene expression values can vary considerably across genes; and (iii) a small set of genes has expression estimates with consistently high error across data sets and methods. Finally, although the mapping software is important, the quantification method makes a greater difference to the results.
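The evaluation idea summarized in the abstract (correlating each pipeline's estimates with simulated "true" values, then examining per-gene error and genes that are wrong everywhere) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the log2(x + 1) transform, the Pearson correlation, the absolute log-scale error, the threshold of 1.0, and all gene names, pipeline names, and values are hypothetical choices made only for the example.

```python
import math
from statistics import mean


def log2p(x: float) -> float:
    """log2(x + 1): keeps zero expression values defined on the log scale."""
    return math.log2(x + 1.0)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def correlation_with_truth(truth: dict[str, float], est: dict[str, float]) -> float:
    """Correlate estimated vs. simulated 'true' expression on the log2(x + 1) scale."""
    genes = sorted(truth)
    return pearson([log2p(truth[g]) for g in genes], [log2p(est[g]) for g in genes])


def gene_errors(truth: dict[str, float], est: dict[str, float]) -> dict[str, float]:
    """Per-gene absolute error on the log2(x + 1) scale (an assumed metric)."""
    return {g: abs(log2p(est[g]) - log2p(truth[g])) for g in truth}


def consistently_high_error(
    truth: dict[str, float],
    pipelines: dict[str, dict[str, float]],
    threshold: float = 1.0,
) -> set[str]:
    """Genes whose error exceeds `threshold` in every pipeline."""
    per_pipeline = [gene_errors(truth, est) for est in pipelines.values()]
    return {g for g in truth if all(errs[g] > threshold for errs in per_pipeline)}


# Hypothetical simulated "true" values and two hypothetical pipelines.
true_expr = {"geneA": 100.0, "geneB": 10.0, "geneC": 0.0, "geneD": 50.0}
pipelines = {
    "pipeline_1": {"geneA": 95.0, "geneB": 12.0, "geneC": 7.0, "geneD": 48.0},
    "pipeline_2": {"geneA": 110.0, "geneB": 9.0, "geneC": 15.0, "geneD": 20.0},
}

for name, est in pipelines.items():
    print(name, round(correlation_with_truth(true_expr, est), 3))
print(consistently_high_error(true_expr, pipelines))  # {'geneC'}
```

Requiring the threshold to be exceeded in every pipeline, rather than on average, is one way to express genes with "consistently high error": in this toy data geneD is badly estimated by pipeline_2 alone and is therefore not reported, while geneC is wrong in both.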