Fast statistical alignment.

We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on...

Bibliographic Details
Main Authors:	Robert K Bradley, Adam Roberts, Michael Smoot, Sudeep Juvekar, Jaeyoung Do, Colin Dewey, Ian Holmes, Lior Pachter
Format:	article
Language:	EN
Published:	Public Library of Science (PLoS), 2009
Subjects:	Biology (General) QH301-705.5
Online Access:	https://doaj.org/article/e3b9454ac68745d18b7a4d98c31e59bf

id	oai:doaj.org-article:e3b9454ac68745d18b7a4d98c31e59bf
record_format	dspace
spelling	oai:doaj.org-article:e3b9454ac68745d18b7a4d98c31e59bf | 2021-11-25T05:42:22Z | Fast statistical alignment. | 1553-734X | 1553-7358 | 10.1371/journal.pcbi.1000392 | https://doaj.org/article/e3b9454ac68745d18b7a4d98c31e59bf | 2009-05-01T00:00:00Z | https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/19478997/?tool=EBI | https://doaj.org/toc/1553-734X | https://doaj.org/toc/1553-7358 | Robert K Bradley | Adam Roberts | Michael Smoot | Sudeep Juvekar | Jaeyoung Do | Colin Dewey | Ian Holmes | Lior Pachter | Public Library of Science (PLoS) | article | Biology (General) | QH301-705.5 | EN | PLoS Computational Biology, Vol 5, Iss 5, p e1000392 (2009)
institution	DOAJ
collection	DOAJ
language	EN
topic	Biology (General) QH301-705.5
description	We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments which are accompanied by estimates of the alignment accuracy and uncertainty for every column and character of the alignment--previously available only with alignment programs which use computationally-expensive Markov Chain Monte Carlo approaches--yet can align thousands of long sequences. Moreover, FSA utilizes an unsupervised query-specific learning procedure for parameter estimation which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/.
format	article
author	Robert K Bradley Adam Roberts Michael Smoot Sudeep Juvekar Jaeyoung Do Colin Dewey Ian Holmes Lior Pachter
author_sort	Robert K Bradley
title	Fast statistical alignment.
publisher	Public Library of Science (PLoS)
publishDate	2009
url	https://doaj.org/article/e3b9454ac68745d18b7a4d98c31e59bf
