Fast mapping of short sequences with mismatches, insertions and deletions using index structures.

With few exceptions, current methods for short read mapping make use of simple seed heuristics to speed up the search. Most of the underlying matching models neglect the necessity to allow not only mismatches, but also insertions and deletions. Current evaluations indicate, however, that very differ...

Bibliographic Details
Main Authors:	Steve Hoffmann, Christian Otto, Stefan Kurtz, Cynthia M Sharma, Philipp Khaitovich, Jörg Vogel, Peter F Stadler, Jörg Hackermüller
Format:	article
Language:	EN
Published:	Public Library of Science (PLoS) 2009
Subjects:	Biology (General) QH301-705.5
Online Access:	https://doaj.org/article/cfd4781c4e6c4148862bdfe485b4cc03

id	oai:doaj.org-article:cfd4781c4e6c4148862bdfe485b4cc03
record_format	dspace
spelling	oai:doaj.org-article:cfd4781c4e6c4148862bdfe485b4cc03 | 2021-11-25T05:42:10Z | Fast mapping of short sequences with mismatches, insertions and deletions using index structures. | 1553-734X | 1553-7358 | 10.1371/journal.pcbi.1000502 | https://doaj.org/article/cfd4781c4e6c4148862bdfe485b4cc03 | 2009-09-01T00:00:00Z | https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/19750212/pdf/?tool=EBI | https://doaj.org/toc/1553-734X | https://doaj.org/toc/1553-7358 | Steve Hoffmann | Christian Otto | Stefan Kurtz | Cynthia M Sharma | Philipp Khaitovich | Jörg Vogel | Peter F Stadler | Jörg Hackermüller | Public Library of Science (PLoS) | article | Biology (General) | QH301-705.5 | EN | PLoS Computational Biology, Vol 5, Iss 9, p e1000502 (2009)
institution	DOAJ
collection	DOAJ
language	EN
topic	Biology (General) QH301-705.5
description	With few exceptions, current methods for short read mapping make use of simple seed heuristics to speed up the search. Most of the underlying matching models neglect the necessity to allow not only mismatches, but also insertions and deletions. Current evaluations indicate, however, that very different error models apply to the novel high-throughput sequencing methods. While the most frequent error-type in Illumina reads are mismatches, reads produced by 454's GS FLX predominantly contain insertions and deletions (indels). Even though 454 sequencers are able to produce longer reads, the method is frequently applied to small RNA (miRNA and siRNA) sequencing. Fast and accurate matching in particular of short reads with diverse errors is therefore a pressing practical problem. We introduce a matching model for short reads that can, besides mismatches, also cope with indels. It addresses different error models. For example, it can handle the problem of leading and trailing contaminations caused by primers and poly-A tails in transcriptomics or the length-dependent increase of error rates. In these contexts, it thus simplifies the tedious and error-prone trimming step. For efficient searches, our method utilizes index structures in the form of enhanced suffix arrays. In a comparison with current methods for short read mapping, the presented approach shows significantly increased performance not only for 454 reads, but also for Illumina reads. Our approach is implemented in the software segemehl available at http://www.bioinf.uni-leipzig.de/Software/segemehl/.
format	article
author	Steve Hoffmann Christian Otto Stefan Kurtz Cynthia M Sharma Philipp Khaitovich Jörg Vogel Peter F Stadler Jörg Hackermüller
author_sort	Steve Hoffmann
title	Fast mapping of short sequences with mismatches, insertions and deletions using index structures.
publisher	Public Library of Science (PLoS)
publishDate	2009
url	https://doaj.org/article/cfd4781c4e6c4148862bdfe485b4cc03
_version_	1718414545864622080
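
The description above mentions two ingredients: an index structure for efficient searches and a matching model that tolerates mismatches, insertions and deletions. The following is a minimal illustrative sketch of those two ideas, not the segemehl algorithm. It assumes a plain suffix array built by naive sorting (not an enhanced suffix array) and swaps in the classic dynamic-programming approach to approximate substring matching for the error-tolerant part. The function names `build_suffix_array`, `find_exact` and `best_approx_match`, and the example sequences, are hypothetical and chosen only for illustration.

```python
def build_suffix_array(text):
    """Return suffix start positions in lexicographic order of the suffixes.

    Naive O(n^2 log n) construction; adequate only for small examples.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text, sa, pattern):
    """Return all start positions of exact occurrences of `pattern`.

    Uses two binary searches over the suffix array to find the block of
    suffixes whose first len(pattern) characters equal the pattern.
    """
    m = len(pattern)

    # First index whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First index whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def best_approx_match(reference, read):
    """Return (distance, end) for the best match of `read` in `reference`.

    `distance` is the minimum unit-cost edit distance (mismatches,
    insertions, deletions) between the read and any substring of the
    reference; `end` is the exclusive end position of the first such
    substring. The match may start anywhere in the reference at no cost.
    """
    m = len(read)
    prev = list(range(m + 1))      # read prefix vs. empty reference
    best = (prev[m], 0)
    for j in range(1, len(reference) + 1):
        cur = [0] * (m + 1)        # cur[0] = 0: free start in the reference
        for i in range(1, m + 1):
            substitution = prev[i - 1] + (read[i - 1] != reference[j - 1])
            cur[i] = min(substitution, prev[i] + 1, cur[i - 1] + 1)
        if cur[m] < best[0]:
            best = (cur[m], j)
        prev = cur
    return best


reference = "ACGTTAGCCATGGA"
sa = build_suffix_array(reference)
exact_hits = find_exact(reference, sa, "TAGC")          # [4]
approx_hit = best_approx_match(reference, "TAGCAT")     # (1, 11)
```

In this toy example the read `TAGCAT` has no exact occurrence, but it matches the reference substring `TAGCCAT` with a single indel, which a mismatch-only model would not report at distance 1.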
