Fast statistical alignment.

We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on...

Bibliographic Details
Main authors: Robert K Bradley, Adam Roberts, Michael Smoot, Sudeep Juvekar, Jaeyoung Do, Colin Dewey, Ian Holmes, Lior Pachter
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2009
Subjects: Biology (General)
Online access: https://doaj.org/article/e3b9454ac68745d18b7a4d98c31e59bf
id oai:doaj.org-article:e3b9454ac68745d18b7a4d98c31e59bf
record_format dspace
spelling oai:doaj.org-article:e3b9454ac68745d18b7a4d98c31e59bf
2021-11-25T05:42:22Z
Fast statistical alignment.
ISSN: 1553-734X, 1553-7358
DOI: 10.1371/journal.pcbi.1000392
https://doaj.org/article/e3b9454ac68745d18b7a4d98c31e59bf
2009-05-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/19478997/?tool=EBI
https://doaj.org/toc/1553-734X
https://doaj.org/toc/1553-7358
PLoS Computational Biology, Vol 5, Iss 5, p e1000392 (2009)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
description We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment (FSA) program is based on pair hidden Markov models that approximate an insertion/deletion process on a tree, and it uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments accompanied by estimates of alignment accuracy and uncertainty for every column and character of the alignment. Such estimates were previously available only from alignment programs that use computationally expensive Markov chain Monte Carlo approaches, yet FSA can align thousands of long sequences. Moreover, FSA uses an unsupervised, query-specific learning procedure for parameter estimation, which leads to improved accuracy on benchmark reference alignments compared with existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data compared with other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/.
format article
author Robert K Bradley
Adam Roberts
Michael Smoot
Sudeep Juvekar
Jaeyoung Do
Colin Dewey
Ian Holmes
Lior Pachter
title Fast statistical alignment.
publisher Public Library of Science (PLoS)
publishDate 2009
url https://doaj.org/article/e3b9454ac68745d18b7a4d98c31e59bf