Coverage bias and sensitivity of variant calling for four whole-genome sequencing technologies.

The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina's HiSeq2000, Life Technol...

Bibliographic Details
Main authors: Nora Rieber, Marc Zapatka, Bärbel Lasitschka, David Jones, Paul Northcott, Barbara Hutter, Natalie Jäger, Marcel Kool, Michael Taylor, Peter Lichter, Stefan Pfister, Stephan Wolf, Benedikt Brors, Roland Eils
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2013
Subjects:
R
Q
Online access: https://doaj.org/article/408b0b951a5e4349acf88afcb63b782b
id oai:doaj.org-article:408b0b951a5e4349acf88afcb63b782b
record_format dspace
spelling oai:doaj.org-article:408b0b951a5e4349acf88afcb63b782b
2021-11-18T07:42:11Z
Coverage bias and sensitivity of variant calling for four whole-genome sequencing technologies.
1932-6203
10.1371/journal.pone.0066621
https://doaj.org/article/408b0b951a5e4349acf88afcb63b782b
2013-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/23776689/pdf/?tool=EBI
https://doaj.org/toc/1932-6203
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 8, Iss 6, p e66621 (2013)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina's HiSeq2000, Life Technologies' SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics' technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole genome studies. Here we present a detailed comparison of the performance of all currently available whole genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies' platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. They indicate application areas that call for a specific sequencing platform and disallow other platforms. This helps to identify the proper sequencing platform for whole genome studies with different application scopes.
format article
author Nora Rieber
Marc Zapatka
Bärbel Lasitschka
David Jones
Paul Northcott
Barbara Hutter
Natalie Jäger
Marcel Kool
Michael Taylor
Peter Lichter
Stefan Pfister
Stephan Wolf
Benedikt Brors
Roland Eils
title Coverage bias and sensitivity of variant calling for four whole-genome sequencing technologies.
publisher Public Library of Science (PLoS)
publishDate 2013
url https://doaj.org/article/408b0b951a5e4349acf88afcb63b782b