Improving the alignment quality of consistency based aligners with an evaluation function using synonymous protein words.

Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimiz...

Bibliographic Details
Main authors: Hsin-Nan Lin, Cédric Notredame, Jia-Ming Chang, Ting-Yi Sung, Wen-Lian Hsu
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2011
Subjects:
R
Q
Online access: https://doaj.org/article/3e287cb4b2ed49a38fb4e64e13092cce
id oai:doaj.org-article:3e287cb4b2ed49a38fb4e64e13092cce
record_format dspace
spelling oai:doaj.org-article:3e287cb4b2ed49a38fb4e64e13092cce
2021-11-18T07:33:15Z
Improving the alignment quality of consistency based aligners with an evaluation function using synonymous protein words.
ISSN 1932-6203
DOI 10.1371/journal.pone.0027872
https://doaj.org/article/3e287cb4b2ed49a38fb4e64e13092cce
2011-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22163274/pdf/?tool=EBI
https://doaj.org/toc/1932-6203
Hsin-Nan Lin
Cédric Notredame
Jia-Ming Chang
Ting-Yi Sung
Wen-Lian Hsu
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 6, Iss 12, p e27872 (2011)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in the similarity measurement between aligned residues using substitution matrices, Fourier transform, sophisticated profile-profile functions, or, more recently, consistency-based approaches. In this paper, we present a flexible similarity measure for residue pairs to improve the quality of protein sequence alignment. Our approach, called SymAlign, relies on the identification of conserved words found across a sizeable fraction of the considered dataset, and supported by evolutionary analysis. These words are then used to define a position-specific substitution matrix that better reflects the biological significance of local similarity. The experimental results show that the SymAlign scoring scheme can be incorporated within T-Coffee to improve sequence alignment accuracy. We also demonstrate that SymAlign is less sensitive to the presence of structurally non-similar proteins. In the analysis of the relationship between sequence identity and structure similarity, SymAlign can better differentiate structurally similar proteins from non-similar proteins. We show that protein sequence alignments can be significantly improved using a similarity estimation based on weighted n-grams. In our analysis of the alignments thus produced, sequence conservation becomes a better indicator of structural similarity.
SymAlign also provides alignment visualization that can display sub-optimal alignments on dot-matrices. The visualization makes it easy to identify well-supported alternative alignments that may not have been identified by dynamic programming. SymAlign is available at http://bio-cluster.iis.sinica.edu.tw/SymAlign/.
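
The description above mentions conserved words found across a sizeable fraction of the dataset and a position-specific score built from weighted n-grams. The following is only a minimal sketch of that general idea, not SymAlign's actual method: it assumes exact-match n-grams (SymAlign uses synonymous words supported by evolutionary analysis, which is not modeled here), and the function names and the parameters `n` and `min_fraction` are hypothetical.

```python
from collections import Counter


def conserved_words(seqs, n=3, min_fraction=0.5):
    """Return {word: support} for n-grams found in at least
    min_fraction of the sequences (support = fraction of sequences)."""
    counts = Counter()
    for s in seqs:
        # Use a set so each word counts at most once per sequence.
        counts.update({s[i:i + n] for i in range(len(s) - n + 1)})
    threshold = min_fraction * len(seqs)
    return {w: c / len(seqs) for w, c in counts.items() if c >= threshold}


def pair_scores(a, b, words, n=3):
    """Build a len(a) x len(b) score table for residue pairs.

    Assumption for illustration: a pair (i, j) is rewarded by the
    support of every conserved word that covers a[i] and b[j] at the
    same offset within the word.
    """
    scores = [[0.0] * len(b) for _ in a]
    for i in range(len(a) - n + 1):
        word = a[i:i + n]
        if word not in words:
            continue
        for j in range(len(b) - n + 1):
            if b[j:j + n] == word:
                for k in range(n):
                    scores[i + k][j + k] += words[word]
    return scores


seqs = ["MKVLAT", "AKVLGG", "TTKVLA"]
words = conserved_words(seqs, n=3, min_fraction=1.0)
table = pair_scores(seqs[0], seqs[1], words, n=3)
```

A table like `table` could, in principle, stand in for a fixed substitution matrix when scoring one pair of sequences; how SymAlign combines such scores with T-Coffee is described in the article itself, not here.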
format article
author Hsin-Nan Lin
Cédric Notredame
Jia-Ming Chang
Ting-Yi Sung
Wen-Lian Hsu
title Improving the alignment quality of consistency based aligners with an evaluation function using synonymous protein words.
publisher Public Library of Science (PLoS)
publishDate 2011
url https://doaj.org/article/3e287cb4b2ed49a38fb4e64e13092cce