MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping.

MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 4...

Bibliographic Details
Main authors: Wan-Ping Lee, Michael P Stromberg, Alistair Ward, Chip Stewart, Erik P Garrison, Gabor T Marth
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2014
Subjects:
R
Q
Online access: https://doaj.org/article/adf62c12502548b7afdddbe7d83ada01
id oai:doaj.org-article:adf62c12502548b7afdddbe7d83ada01
record_format dspace
spelling oai:doaj.org-article:adf62c12502548b7afdddbe7d83ada01
2021-11-18T08:29:33Z
1932-6203
10.1371/journal.pone.0090581
2014-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/24599324/pdf/?tool=EBI
https://doaj.org/toc/1932-6203
PLoS ONE, Vol 9, Iss 3, p e90581 (2014)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs) as well as generating outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient between MOSAIK assigned and actual mapping qualities greater than 0.98. In order to ensure that studies of any genome are supported, a training pipeline is provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me).
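The description pairs hash-based seeding with Smith-Waterman alignment. The sketch below is a toy illustration of that general idea only; it is not MOSAIK's implementation. All function names, the k-mer size, the padding, and the scoring values (match, mismatch, linear gap penalty) are assumptions chosen for readability, and simple voting on alignment diagonals stands in for the hash clustering step.

```python
from collections import defaultdict


def build_hash_index(reference, k):
    """Map each k-mer in the reference to every position where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def smith_waterman(read, window, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score (linear gap penalty, assumed values)."""
    prev = [0] * (len(window) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * (len(window) + 1)
        for j in range(1, len(window) + 1):
            step = match if read[i - 1] == window[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def map_read(read, reference, index, k=4, pad=3):
    """Seed with k-mer hits, pick the best-supported diagonal, then align locally."""
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], ()):
            # Hits that agree on (ref_pos - offset) imply the same read placement.
            votes[ref_pos - offset] += 1
    if not votes:
        return None  # no seed found; the read stays unmapped in this sketch
    best_diag = max(votes, key=votes.get)
    # Pad the window so short insertions and deletions still fit inside it.
    start = max(0, best_diag - pad)
    end = min(len(reference), best_diag + len(read) + pad)
    return start, end, smith_waterman(read, reference[start:end])


reference = "ACGTTGCAAGGCTTAACCGGATCGATTACA"
index = build_hash_index(reference, k=4)
exact_hit = map_read(reference[10:22], reference, index)
one_mismatch_hit = map_read("GCTTAAGCGGAT", reference, index)
```

Restricting Smith-Waterman to a small window around the seeded location is what keeps the dynamic programming step affordable; a production aligner would also handle reverse-complement reads, affine gap penalties, and multiple candidate locations.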
format article
author Wan-Ping Lee
Michael P Stromberg
Alistair Ward
Chip Stewart
Erik P Garrison
Gabor T Marth
title MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping.
publisher Public Library of Science (PLoS)
publishDate 2014
url https://doaj.org/article/adf62c12502548b7afdddbe7d83ada01