DNA barcode sequence identification incorporating taxonomic hierarchy and within taxon variability.

For DNA barcoding to succeed as a scientific endeavor an accurate and expeditious query sequence identification method is needed. Although a global multiple-sequence alignment can be generated for some barcoding markers (e.g. COI, rbcL), not all barcoding markers are as structurally conserved (e.g....

Bibliographic Details
Main author: Damon P Little
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2011
Subjects:
R
Q
Online access: https://doaj.org/article/e8a0859390ce4d3abc4d264758f8befd
id oai:doaj.org-article:e8a0859390ce4d3abc4d264758f8befd
record_format dspace
spelling oai:doaj.org-article:e8a0859390ce4d3abc4d264758f8befd
2021-11-18T06:47:55Z
1932-6203
10.1371/journal.pone.0020552
2011-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/21857897/?tool=EBI
https://doaj.org/toc/1932-6203
PLoS ONE, Vol 6, Iss 8, p e20552 (2011)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description For DNA barcoding to succeed as a scientific endeavor an accurate and expeditious query sequence identification method is needed. Although a global multiple-sequence alignment can be generated for some barcoding markers (e.g. COI, rbcL), not all barcoding markers are as structurally conserved (e.g. matK). Thus, algorithms that depend on global multiple-sequence alignments are not universally applicable. Some sequence identification methods that use local pairwise alignments (e.g. BLAST) are unable to accurately differentiate between highly similar sequences and are not designed to cope with hierarchic phylogenetic relationships or within taxon variability. Here, I present a novel alignment-free sequence identification algorithm--BRONX--that accounts for observed within taxon variability and hierarchic relationships among taxa. BRONX identifies short variable segments and corresponding invariant flanking regions in reference sequences. These flanking regions are used to score variable regions in the query sequence without the production of a global multiple-sequence alignment. By incorporating observed within taxon variability into the scoring procedure, misidentifications arising from shared alleles/haplotypes are minimized. An explicit treatment of more inclusive terminals allows for separate identifications to be made for each taxonomic level and/or for user-defined terminals. BRONX performs better than all other methods when there is imperfect overlap between query and reference sequences (e.g. mini-barcode queries against a full-length barcode database). BRONX consistently produced better identifications at the genus-level for all query types.
format article
author Damon P Little
title DNA barcode sequence identification incorporating taxonomic hierarchy and within taxon variability.
publisher Public Library of Science (PLoS)
publishDate 2011
url https://doaj.org/article/e8a0859390ce4d3abc4d264758f8befd
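
The description in this record outlines the general idea of scoring a query through invariant flanking regions rather than a global alignment. The Python sketch below is a toy illustration of that idea only. It is not the BRONX implementation, and every name in it (`shared_kmers`, `build_index`, `identify`, the parameters `k` and `max_gap`) is hypothetical. It assumes that flanks are exact k-mers occurring once in every reference, that they appear in the same order in each reference, and that a taxon scores one point whenever the query segment matches any variant observed for that taxon. Taxonomic hierarchy is not modelled here.

```python
from collections import defaultdict


def shared_kmers(seqs, k):
    """Return the k-mers that occur exactly once in every sequence."""
    common = None
    for s in seqs:
        counts = defaultdict(int)
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
        once = {kmer for kmer, c in counts.items() if c == 1}
        common = once if common is None else common & once
    return common or set()


def build_index(refs, k=4, max_gap=6):
    """Map (left flank, right flank) -> {taxon: set of observed variants}.

    refs is a list of (taxon, sequence) pairs. Only short segments that
    actually vary among the references are kept.
    """
    flanks = shared_kmers([s for _, s in refs], k)
    index = defaultdict(lambda: defaultdict(set))
    for taxon, s in refs:
        positions = sorted((s.find(f), f) for f in flanks)
        for (i, left), (j, right) in zip(positions, positions[1:]):
            segment = s[i + k:j]  # empty when the two flanks overlap
            if 0 < len(segment) <= max_gap:
                index[(left, right)][taxon].add(segment)
    return {
        pair: dict(by_taxon)
        for pair, by_taxon in index.items()
        if len(set().union(*by_taxon.values())) > 1
    }


def identify(query, index, k=4):
    """Count, per taxon, the variable segments of the query that match."""
    scores = defaultdict(int)
    for (left, right), by_taxon in index.items():
        i, j = query.find(left), query.find(right)
        if i < 0 or j < i + k:
            continue  # the query does not cover this segment
        segment = query[i + k:j]
        for taxon, variants in by_taxon.items():
            if segment in variants:
                scores[taxon] += 1
    return dict(scores)


# Made-up reference data: taxon A shows two variants, taxon B one.
refs = [
    ("A", "ACGTACGGCATGCA"),
    ("A", "ACGTACGACATGCA"),
    ("B", "ACGTACTTCATGCA"),
]
index = build_index(refs)

# A short query that covers only part of the reference region.
print(identify("GTACGACATG", index))  # {'A': 1}
```

Because both variants seen in taxon A are stored, a query carrying either one is credited to A, which is the sense in which this sketch uses within-taxon variability. A short query still scores as long as it contains both flanks of at least one variable segment.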