lra: A long read aligner for sequences and contigs.

It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the geno...
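The chaining idea in the abstract can be illustrated with a small sketch. It is not lra's implementation: it uses a plain quadratic-time dynamic program rather than a sparse one. The anchor format, the scoring (anchor length minus gap cost), and both gap-cost functions with their constants are assumptions chosen only to show how a cost that is linear in gap length differs from a concave one.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical anchor format: (query_start, target_start, length) of an exact match.
Anchor = Tuple[int, int, int]


def linear_gap(d: int) -> float:
    """Gap cost proportional to gap length (constant is illustrative)."""
    return 0.5 * d


def concave_gap(d: int) -> float:
    """Gap cost that grows ever more slowly with gap length (illustrative choice)."""
    return 2.0 * math.log2(1 + d)


def best_chain(
    anchors: List[Anchor], gap_cost: Callable[[int], float]
) -> Tuple[float, List[Anchor]]:
    """Return (score, chain) for the best colinear chain, using a naive O(n^2) DP."""
    if not anchors:
        return 0.0, []
    anchors = sorted(anchors)
    score = [float(a[2]) for a in anchors]  # best chain ending at each anchor
    prev = [-1] * len(anchors)              # back-pointers for traceback
    for j, (qj, tj, lj) in enumerate(anchors):
        for i in range(j):
            qi, ti, li = anchors[i]
            # Anchor i must end before anchor j starts on both sequences.
            if qi + li > qj or ti + li > tj:
                continue
            # Change of diagonal between the anchors = implied indel length.
            d = abs((qj - qi) - (tj - ti))
            cand = score[i] + lj - gap_cost(d)
            if cand > score[j]:
                score[j] = cand
                prev[j] = i
    end = max(range(len(anchors)), key=score.__getitem__)
    best = score[end]
    chain = []
    while end != -1:
        chain.append(anchors[end])
        end = prev[end]
    return best, chain[::-1]


# Two 50 bp exact matches flanking a 1000 bp gap on the target.
anchors = [(0, 0, 50), (60, 1060, 50)]
linear_score, linear_chain = best_chain(anchors, linear_gap)
concave_score, concave_chain = best_chain(anchors, concave_gap)
```

With these illustrative constants, the linear cost makes joining the two anchors across the 1000 bp gap worse than keeping a single anchor, so the chain breaks at the gap, whereas the concave cost keeps both anchors in one chain.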


Bibliographic Details
Main authors: Jingwen Ren, Mark J P Chaisson
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2021
Subjects: Biology (General), QH301-705.5
Online access: https://doaj.org/article/374191bdee9242da9ac6ee49931ee816
id oai:doaj.org-article:374191bdee9242da9ac6ee49931ee816
record_format dspace
spelling oai:doaj.org-article:374191bdee9242da9ac6ee49931ee816; 2021-11-25T05:40:36Z; lra: A long read aligner for sequences and contigs.; 1553-734X; 1553-7358; 10.1371/journal.pcbi.1009078; https://doaj.org/article/374191bdee9242da9ac6ee49931ee816; 2021-06-01T00:00:00Z; https://doi.org/10.1371/journal.pcbi.1009078; https://doaj.org/toc/1553-734X; https://doaj.org/toc/1553-7358; Jingwen Ren; Mark J P Chaisson; Public Library of Science (PLoS); article; Biology (General); QH301-705.5; EN; PLoS Computational Biology, Vol 17, Iss 6, p e1009078 (2021)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
spellingShingle Biology (General)
QH301-705.5
Jingwen Ren
Mark J P Chaisson
lra: A long read aligner for sequences and contigs.
description It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for structural variant (SV) discovery, particularly for variants above 1 kb and when discovering variation from ONT reads, while having runtimes that are comparable (1.05-3.76×) to current methods. When applied to calling variation from de novo assembly contigs, there is a 3.2% increase in Truvari F1 score compared to minimap2+htsbox. lra is available in Bioconda (https://anaconda.org/bioconda/lra) and on GitHub (https://github.com/ChaissonLab/LRA).
format article
author Jingwen Ren
Mark J P Chaisson
author_facet Jingwen Ren
Mark J P Chaisson
author_sort Jingwen Ren
title lra: A long read aligner for sequences and contigs.
title_sort lra: a long read aligner for sequences and contigs.
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/374191bdee9242da9ac6ee49931ee816
work_keys_str_mv AT jingwenren lraalongreadalignerforsequencesandcontigs
AT markjpchaisson lraalongreadalignerforsequencesandcontigs
_version_ 1718414511509078016