Compression of FASTQ and SAM format sequencing data.

Storage and transmission of the data produced by modern DNA sequencing instruments has become a major concern, which prompted the Pistoia Alliance to pose the SequenceSqueeze contest for compression of FASTQ files. We present several compression entries from the competition, Fastqz and Samcomp/Fqzco...

Bibliographic Details
Main authors: James K Bonfield, Matthew V Mahoney
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2013
Subjects:
R
Q
Online access: https://doaj.org/article/730ede695c1e456db83ad09e7326ce9c
id oai:doaj.org-article:730ede695c1e456db83ad09e7326ce9c
record_format dspace
spelling oai:doaj.org-article:730ede695c1e456db83ad09e7326ce9c
2021-11-18T07:52:19Z
Compression of FASTQ and SAM format sequencing data.
1932-6203
10.1371/journal.pone.0059190
https://doaj.org/article/730ede695c1e456db83ad09e7326ce9c
2013-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/23533605/?tool=EBI
https://doaj.org/toc/1932-6203
Storage and transmission of the data produced by modern DNA sequencing instruments has become a major concern, which prompted the Pistoia Alliance to pose the SequenceSqueeze contest for compression of FASTQ files. We present several compression entries from the competition, Fastqz and Samcomp/Fqzcomp, including the winning entry. These are compared against existing algorithms for both reference based compression (CRAM, Goby) and non-reference based compression (DSRC, BAM) and other recently published competition entries (Quip, SCALCE). The tools are shown to be the new Pareto frontier for FASTQ compression, offering state of the art ratios at affordable CPU costs. All programs are freely available on SourceForge. Fastqz: https://sourceforge.net/projects/fastqz/, fqzcomp: https://sourceforge.net/projects/fqzcomp/, and samcomp: https://sourceforge.net/projects/samcomp/.
James K Bonfield
Matthew V Mahoney
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 8, Iss 3, p e59190 (2013)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
spellingShingle Medicine
R
Science
Q
James K Bonfield
Matthew V Mahoney
Compression of FASTQ and SAM format sequencing data.
description Storage and transmission of the data produced by modern DNA sequencing instruments has become a major concern, which prompted the Pistoia Alliance to pose the SequenceSqueeze contest for compression of FASTQ files. We present several compression entries from the competition, Fastqz and Samcomp/Fqzcomp, including the winning entry. These are compared against existing algorithms for both reference based compression (CRAM, Goby) and non-reference based compression (DSRC, BAM) and other recently published competition entries (Quip, SCALCE). The tools are shown to be the new Pareto frontier for FASTQ compression, offering state of the art ratios at affordable CPU costs. All programs are freely available on SourceForge. Fastqz: https://sourceforge.net/projects/fastqz/, fqzcomp: https://sourceforge.net/projects/fqzcomp/, and samcomp: https://sourceforge.net/projects/samcomp/.
format article
author James K Bonfield
Matthew V Mahoney
author_facet James K Bonfield
Matthew V Mahoney
author_sort James K Bonfield
title Compression of FASTQ and SAM format sequencing data.
title_short Compression of FASTQ and SAM format sequencing data.
title_full Compression of FASTQ and SAM format sequencing data.
title_fullStr Compression of FASTQ and SAM format sequencing data.
title_full_unstemmed Compression of FASTQ and SAM format sequencing data.
title_sort compression of fastq and sam format sequencing data.
publisher Public Library of Science (PLoS)
publishDate 2013
url https://doaj.org/article/730ede695c1e456db83ad09e7326ce9c
work_keys_str_mv AT jameskbonfield compressionoffastqandsamformatsequencingdata
AT matthewvmahoney compressionoffastqandsamformatsequencingdata
_version_ 1718422859822399488