Analyzing multi-locus plant barcoding datasets with a composition vector method based on adjustable weighted distance.

Background: The composition vector (CV) method has been proved to be a reliable and fast alignment-free method to analyze large COI barcoding data. In this study, we modify this method for analyzing multi-gene datasets for plant DNA barcoding. The modified method includes an adjus...

Bibliographic Details
Main Authors:	Chi Pang Li, Zu Guo Yu, Guo Sheng Han, Ka Hou Chu
Format:	Article
Language:	EN
Published:	Public Library of Science (PLoS) 2012
Subjects:	Medicine (R), Science (Q)
Online Access:	https://doaj.org/article/c60667a4ae5846ec92514361cf93c239

id	oai:doaj.org-article:c60667a4ae5846ec92514361cf93c239
record_format	dspace
spelling	oai:doaj.org-article:c60667a4ae5846ec92514361cf93c239; 2021-11-18T07:10:34Z; Analyzing multi-locus plant barcoding datasets with a composition vector method based on adjustable weighted distance.; ISSN 1932-6203; DOI 10.1371/journal.pone.0042154; https://doaj.org/article/c60667a4ae5846ec92514361cf93c239; 2012-01-01T00:00:00Z; https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22848736/pdf/?tool=EBI; https://doaj.org/toc/1932-6203; Chi Pang Li; Zu Guo Yu; Guo Sheng Han; Ka Hou Chu; Public Library of Science (PLoS); article; Medicine (R); Science (Q); EN; PLoS ONE, Vol 7, Iss 7, p e42154 (2012)
institution	DOAJ
collection	DOAJ
language	EN
topic	Medicine (R); Science (Q)
description	Background: The composition vector (CV) method has proven to be a reliable and fast alignment-free method for analyzing large COI barcoding datasets. In this study, we modify this method to analyze multi-gene datasets for plant DNA barcoding. The modified method includes an adjustable-weighted algorithm for the vector distance, based on the ratio of the sequence lengths of the candidate genes for each pair of taxa. Methodology/Principal Findings: Three datasets were tested: a matK+rbcL dataset with 2,083 sequences, a matK+rbcL dataset with 397 sequences, and a matK+rbcL+trnH-psbA dataset with 397 sequences. The success rates of grouping sequences at the genus/species level with the modified CV approach were always higher than those of the traditional K2P/NJ method. For the matK+rbcL datasets, the modified CV approach outperformed the K2P/NJ approach by 7.9% in both the 2,083-sequence and 397-sequence datasets; for the matK+rbcL+trnH-psbA dataset, it outperformed the K2P/NJ approach by 16.7%. Conclusions: We conclude that the modified CV approach is an efficient method for analyzing large multi-gene datasets for plant DNA barcoding. Source code, implemented in C++ and supported on MS Windows, is freely available for download at http://math.xtu.edu.cn/myphp/math/research/source/Barcode_source_codes.zip.
format	article
author	Chi Pang Li; Zu Guo Yu; Guo Sheng Han; Ka Hou Chu
title	Analyzing multi-locus plant barcoding datasets with a composition vector method based on adjustable weighted distance.
publisher	Public Library of Science (PLoS)
publishDate	2012
url	https://doaj.org/article/c60667a4ae5846ec92514361cf93c239
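
The abstract in this record describes a composition vector distance that is weighted by the sequence lengths of the candidate genes for each pair of taxa, but it does not give the formula. The Python sketch below is only an illustration under stated assumptions: it uses the commonly published composition vector formulation (k-mer frequencies corrected by a Markov-model prediction, with a correlation-based distance), and it assumes that each locus is weighted by its share of the combined sequence length for the pair being compared. The function names, the default `k=5`, and the weighting rule are hypothetical and may differ from the authors' C++ implementation.

```python
from collections import Counter
from math import sqrt


def kmer_freqs(seq, k):
    """Relative frequencies of the overlapping k-mers observed in seq."""
    total = len(seq) - k + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


def composition_vector(seq, k=5):
    """Composition vector: relative deviation of each k-mer frequency from
    the frequency predicted from its (k-1)- and (k-2)-mers."""
    if k < 3:
        raise ValueError("k must be at least 3")
    p_k = kmer_freqs(seq, k)
    p_k1 = kmer_freqs(seq, k - 1)
    p_k2 = kmer_freqs(seq, k - 2)
    alphabet = sorted(set(seq))
    vec = {}
    for prefix in p_k1:
        for letter in alphabet:
            kmer = prefix + letter
            suffix = kmer[1:]
            if suffix not in p_k1:
                continue  # predicted frequency is zero; component stays 0
            p0 = p_k1[prefix] * p_k1[suffix] / p_k2[kmer[1:-1]]
            vec[kmer] = (p_k.get(kmer, 0.0) - p0) / p0
    return vec


def cv_distance(seq_a, seq_b, k=5):
    """Distance (1 - C) / 2, where C is the cosine correlation of the vectors."""
    a = composition_vector(seq_a, k)
    b = composition_vector(seq_b, k)
    dot = sum(v * b.get(key, 0.0) for key, v in a.items())
    norm = sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    if norm == 0.0:
        return 0.5  # no usable signal: treat the pair as uncorrelated
    return (1.0 - dot / norm) / 2.0


def weighted_distance(taxon_a, taxon_b, k=5):
    """Combine per-locus distances for two taxa (dicts: locus name -> sequence).

    ASSUMPTION: each locus is weighted by its share of the combined sequence
    length for this pair; the record does not state the authors' exact rule.
    """
    loci = [locus for locus in taxon_a if locus in taxon_b]
    if not loci:
        raise ValueError("the two taxa share no locus")
    lengths = {locus: len(taxon_a[locus]) + len(taxon_b[locus]) for locus in loci}
    total = sum(lengths.values())
    return sum(
        lengths[locus] / total * cv_distance(taxon_a[locus], taxon_b[locus], k)
        for locus in loci
    )


if __name__ == "__main__":
    # Made-up toy sequences, far shorter than real barcode loci.
    taxon_1 = {"matK": "ATGGCGTACGTTAGCATCGATCGGATCCATGCA", "rbcL": "GGCTTACCAAGTCTAGGCTAACGT"}
    taxon_2 = {"matK": "ATGGCGTTCGTTAGCTTCGATCGCATCCATGGA", "rbcL": "GGCATACCTAGTCAAGGCTTACGA"}
    print(weighted_distance(taxon_1, taxon_2, k=3))
```

Because the weights are recomputed for every pair of taxa, a locus that is long in one comparison and short in another contributes differently to each distance. The resulting pairwise distances could then be passed to a tree-building or nearest-neighbour step, which is not shown here.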
