Analyzing multi-locus plant barcoding datasets with a composition vector method based on adjustable weighted distance.
Background: The composition vector (CV) method has been shown to be a reliable and fast alignment-free method for analyzing large COI barcoding data. In this study, we modify this method for analyzing multi-gene datasets for plant DNA barcoding. The modified method includes an adjus...
Main authors: | Chi Pang Li, Zu Guo Yu, Guo Sheng Han, Ka Hou Chu |
---|---|
Format: | article |
Language: | EN |
Published: | Public Library of Science (PLoS), 2012 |
Subjects: | Medicine (R), Science (Q) |
Online access: | https://doaj.org/article/c60667a4ae5846ec92514361cf93c239 |
id |
oai:doaj.org-article:c60667a4ae5846ec92514361cf93c239 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:c60667a4ae5846ec92514361cf93c239; 2021-11-18T07:10:34Z; Analyzing multi-locus plant barcoding datasets with a composition vector method based on adjustable weighted distance.; ISSN 1932-6203; DOI 10.1371/journal.pone.0042154; https://doaj.org/article/c60667a4ae5846ec92514361cf93c239; 2012-01-01T00:00:00Z; https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22848736/pdf/?tool=EBI; https://doaj.org/toc/1932-6203; Chi Pang Li; Zu Guo Yu; Guo Sheng Han; Ka Hou Chu; Public Library of Science (PLoS); article; Medicine (R); Science (Q); EN; PLoS ONE, Vol 7, Iss 7, p e42154 (2012) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Medicine (R); Science (Q) |
spellingShingle |
Medicine (R); Science (Q); Chi Pang Li; Zu Guo Yu; Guo Sheng Han; Ka Hou Chu; Analyzing multi-locus plant barcoding datasets with a composition vector method based on adjustable weighted distance. |
description |
Background: The composition vector (CV) method has been shown to be a reliable and fast alignment-free method for analyzing large COI barcoding data. In this study, we modify the method to analyze multi-gene datasets for plant DNA barcoding. The modified method uses an adjustable-weighted algorithm for the vector distance, with weights set according to the ratio of the sequence lengths of the candidate genes for each pair of taxa. Methodology/Principal Findings: Three datasets were tested: a matK+rbcL dataset with 2,083 sequences, a matK+rbcL dataset with 397 sequences, and a matK+rbcL+trnH-psbA dataset with 397 sequences. The success rates of grouping sequences at the genus/species level with the modified CV approach were always higher than those obtained with the traditional K2P/NJ method. For the matK+rbcL datasets, the modified CV approach outperformed the K2P/NJ approach by 7.9% in both the 2,083-sequence and 397-sequence datasets; for the matK+rbcL+trnH-psbA dataset, it outperformed the K2P/NJ approach by 16.7%. Conclusions: The modified CV approach is an efficient method for analyzing large multi-gene datasets for plant DNA barcoding. Source code, implemented in C++ and supported on MS Windows, is freely available for download at http://math.xtu.edu.cn/myphp/math/research/source/Barcode_source_codes.zip. |
format |
article |
author |
Chi Pang Li; Zu Guo Yu; Guo Sheng Han; Ka Hou Chu |
author_facet |
Chi Pang Li; Zu Guo Yu; Guo Sheng Han; Ka Hou Chu |
author_sort |
Chi Pang Li |
title |
Analyzing multi-locus plant barcoding datasets with a composition vector method based on adjustable weighted distance. |
title_sort |
analyzing multi-locus plant barcoding datasets with a composition vector method based on adjustable weighted distance. |
publisher |
Public Library of Science (PLoS) |
publishDate |
2012 |
url |
https://doaj.org/article/c60667a4ae5846ec92514361cf93c239 |
work_keys_str_mv |
AT chipangli analyzingmultilocusplantbarcodingdatasetswithacompositionvectormethodbasedonadjustableweighteddistance AT zuguoyu analyzingmultilocusplantbarcodingdatasetswithacompositionvectormethodbasedonadjustableweighteddistance AT guoshenghan analyzingmultilocusplantbarcodingdatasetswithacompositionvectormethodbasedonadjustableweighteddistance AT kahouchu analyzingmultilocusplantbarcodingdatasetswithacompositionvectormethodbasedonadjustableweighteddistance |
_version_ |
1718423880330117120 |
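
The abstract describes a per-gene vector distance that is combined across loci with weights tied to sequence length. The record does not give the formula, so the following Python sketch is only an illustration under stated assumptions: it uses raw k-mer frequencies (the published CV method may differ in how it builds the vector), a cosine-based distance, and weights proportional to the combined length of each gene for the pair of taxa. The function names `kmer_vector`, `cosine_distance`, and `weighted_distance`, and the default `k=3`, are hypothetical and not taken from the authors' C++ source.

```python
from collections import Counter
from math import sqrt


def kmer_vector(seq, k=3):
    """Return relative k-mer frequencies of a sequence (assumed simplification)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()} if total else {}


def cosine_distance(u, v):
    """Distance (1 - cos) / 2 between two sparse frequency vectors."""
    dot = sum(u[key] * v.get(key, 0.0) for key in u)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # no usable k-mers: treat as maximally distant
    return (1.0 - dot / (norm_u * norm_v)) / 2.0


def weighted_distance(taxon_a, taxon_b, k=3):
    """Combine per-gene distances for one pair of taxa.

    taxon_a, taxon_b: dicts mapping gene name (e.g. "matK") to sequence.
    Assumption: each gene's weight is its share of the total sequence
    length over the genes both taxa have.
    """
    shared = [g for g in taxon_a if g in taxon_b]
    if not shared:
        raise ValueError("taxa share no genes")
    lengths = {g: len(taxon_a[g]) + len(taxon_b[g]) for g in shared}
    total = sum(lengths.values())
    return sum(
        (lengths[g] / total)
        * cosine_distance(kmer_vector(taxon_a[g], k), kmer_vector(taxon_b[g], k))
        for g in shared
    )
```

Because the weights are recomputed for every pair of taxa, a pair in which one locus is short contributes less from that locus, which is one plausible reading of "adjustable" in the abstract.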