MACSE: Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons.

Bibliographic details
Main authors: Vincent Ranwez, Sébastien Harispe, Frédéric Delsuc, Emmanuel J P Douzery
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2011
Subjects: Medicine (R); Science (Q)
Online access: https://doaj.org/article/6c4f88abd0bc466982100122a80e899f
Journal: PLoS ONE, Vol 6, Iss 9, p e22594 (2011)
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0022594
Full text (PDF via PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/21949676/pdf/?tool=EBI
Collection: DOAJ

Abstract

Until now, the most efficient solution for aligning nucleotide sequences containing open reading frames was to use indirect procedures that align the amino acid translations before reporting the inferred gap positions at the codon level. There are two important pitfalls with this approach. Firstly, any premature stop codon prevents the use of such a strategy. Secondly, each sequence is translated in the same reading frame from beginning to end, so the presence of a single additional nucleotide leads to both aberrant translation and aberrant alignment.

We present an algorithm that has the same space and time complexity as the classical Needleman-Wunsch algorithm while accommodating sequencing errors and other biological deviations from the coding frame. The resulting pairwise coding sequence alignment method was extended to a multiple sequence alignment (MSA) algorithm implemented in a program called MACSE (Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons). MACSE is the first automatic solution to align protein-coding gene datasets containing non-functional sequences (pseudogenes) without disrupting the underlying codon structure. It has also proved useful in detecting undocumented frameshifts in public database sequences and in aligning next-generation sequencing reads/contigs against a reference coding sequence.

MACSE is distributed as an open-source executable Java file with freely available source code and can be used via a web interface at: http://mbb.univ-montp2.fr/macse.
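The Python sketch below is a small illustration of the two pitfalls described in the abstract for translate-then-align approaches. It translates a coding sequence in one fixed reading frame using the standard genetic code (NCBI translation table 1), then translates it again after inserting a single extra nucleotide. The sequence, the insertion position and the `translate` helper are hypothetical examples written for this note. The sketch is not MACSE's algorithm, which aligns at the nucleotide level while allowing frameshifts and stop codons.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with the
# first base varying slowest, in base order T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate dna in a single fixed frame (+1).

    Assumes the sequence contains only A, C, G and T; a trailing partial codon is ignored.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical reference open reading frame: ATG GCT GAA AAA CTG GGT TGG TAA
reference = "ATGGCTGAAAAACTGGGTTGGTAA"
# The same sequence with one extra nucleotide (a T) inserted after the second codon,
# as a sequencing error or a frameshift mutation in a pseudogene might produce.
shifted = reference[:6] + "T" + reference[6:]

print(translate(reference))  # MAEKLGW*
print(translate(shifted))    # MA*KTGLV
```

After the insertion, only the first two residues are unchanged. The shift creates a premature stop codon (`TGA`), and every downstream codon is read in the wrong frame. Both problems break an alignment built from the amino acid translations, which is the situation the abstract says MACSE is designed to handle.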