Analyzing multi-locus plant barcoding datasets with a composition vector method based on adjustable weighted distance.

Background: The composition vector (CV) method has proven to be a reliable and fast alignment-free method for analyzing large COI barcoding datasets. In this study, we modify this method to analyze multi-gene datasets for plant DNA barcoding. The modified method includes an adjus...

Bibliographic Details
Main authors: Chi Pang Li, Zu Guo Yu, Guo Sheng Han, Ka Hou Chu
Format: article
Language: EN
Published: Public Library of Science (PLoS), 2012
Subjects:
R
Q
Online access: https://doaj.org/article/c60667a4ae5846ec92514361cf93c239
id oai:doaj.org-article:c60667a4ae5846ec92514361cf93c239
record_format dspace
spelling oai:doaj.org-article:c60667a4ae5846ec92514361cf93c239; 2021-11-18T07:10:34Z; ISSN 1932-6203; DOI 10.1371/journal.pone.0042154; https://doaj.org/article/c60667a4ae5846ec92514361cf93c239; 2012-01-01T00:00:00Z; https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22848736/pdf/?tool=EBI; https://doaj.org/toc/1932-6203; PLoS ONE, Vol 7, Iss 7, p e42154 (2012)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Background: The composition vector (CV) method has proven to be a reliable and fast alignment-free method for analyzing large COI barcoding datasets. In this study, we modify this method to analyze multi-gene datasets for plant DNA barcoding. The modified method includes an adjustable-weighted algorithm for the vector distance, with weights set according to the ratio in sequence length of the candidate genes for each pair of taxa.
Methodology/principal findings: Three datasets were tested: a matK+rbcL dataset with 2,083 sequences, a matK+rbcL dataset with 397 sequences, and a matK+rbcL+trnH-psbA dataset with 397 sequences. The success rates of grouping sequences at the genus/species level with the modified CV approach were consistently higher than those obtained with the traditional K2P/NJ method. For the matK+rbcL datasets, the modified CV approach outperformed the K2P/NJ approach by 7.9% in both the 2,083-sequence and 397-sequence datasets; for the matK+rbcL+trnH-psbA dataset, it outperformed the traditional approach by 16.7%.
Conclusions: We conclude that the modified CV approach is an efficient method for analyzing large multi-gene datasets for plant DNA barcoding. Source code, implemented in C++ and supported on MS Windows, is freely available for download at http://math.xtu.edu.cn/myphp/math/research/source/Barcode_source_codes.zip.
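The length-weighted distance described above can be illustrated with a minimal Python sketch. This is not the authors' C++ implementation, and the record does not give their formulas. The sketch assumes plain k-mer frequency vectors (no background subtraction), a distance of (1 - cosine similarity) / 2, and per-gene weights equal to each gene's share of the combined sequence length for the pair of taxa. The function names `kmer_vector`, `cv_distance`, and `weighted_distance` are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_vector(seq, k=3):
    """Relative k-mer frequencies of a sequence (simplified composition vector)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def cv_distance(u, v):
    """Assumed distance: (1 - cosine similarity) / 2 between two k-mer vectors."""
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("sequence is shorter than k")
    dot = sum(x * v.get(kmer, 0.0) for kmer, x in u.items())
    return (1.0 - dot / (norm_u * norm_v)) / 2.0


def weighted_distance(taxon_a, taxon_b, k=3):
    """Combine per-gene distances for one pair of taxa.

    taxon_a, taxon_b: dicts mapping gene name -> sequence.
    Assumption: each gene's weight is its share of the pair's combined length.
    """
    genes = sorted(set(taxon_a) & set(taxon_b))
    if not genes:
        raise ValueError("the two taxa share no genes")
    lengths = {g: len(taxon_a[g]) + len(taxon_b[g]) for g in genes}
    total = sum(lengths.values())
    return sum(
        lengths[g] / total
        * cv_distance(kmer_vector(taxon_a[g], k), kmer_vector(taxon_b[g], k))
        for g in genes
    )


# Toy sequences, invented purely for illustration.
taxon_1 = {"matK": "ATGCGTACGTTAGC", "rbcL": "GGCTTAAGCCTA"}
taxon_2 = {"matK": "ATGCGTACGATAGC", "rbcL": "TTTTTTTTTTTT"}
print(weighted_distance(taxon_1, taxon_2))
```

Because the weights are recomputed for every pair of taxa, a gene that is short or partially sequenced in a given pair contributes proportionally less to that pair's distance.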
format article
author Chi Pang Li
Zu Guo Yu
Guo Sheng Han
Ka Hou Chu
title Analyzing multi-locus plant barcoding datasets with a composition vector method based on adjustable weighted distance.
publisher Public Library of Science (PLoS)
publishDate 2012
url https://doaj.org/article/c60667a4ae5846ec92514361cf93c239