Comparison of metatranscriptomic samples based on k-tuple frequencies.
Main authors: | Ying Wang, Lin Liu, Lina Chen, Ting Chen, Fengzhu Sun |
---|---|
Format: | article |
Language: | EN |
Published: | Public Library of Science (PLoS), 2014 |
Subjects: | Medicine (R); Science (Q) |
Online access: | https://doaj.org/article/2c72aa0ca3f1484987e2677d4f3b5d4d |
id |
oai:doaj.org-article:2c72aa0ca3f1484987e2677d4f3b5d4d |
---|---|
record_format |
dspace |
issn |
1932-6203 |
doi |
10.1371/journal.pone.0084348 |
citation |
PLoS ONE, Vol 9, Iss 1, p e84348 (2014) |
pmc_url |
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/24392128/?tool=EBI |
journal_toc |
https://doaj.org/toc/1932-6203 |
last_modified |
2021-11-18T08:39:06Z |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Medicine R Science Q |
description |
Background
The comparison of samples, or beta diversity, is one of the essential problems in ecological studies. Next-generation sequencing (NGS) technologies make it possible to obtain large amounts of metagenomic and metatranscriptomic short-read sequences across many microbial communities. De novo assembly of these short reads is especially challenging because the number of genomes and their sequences are generally unknown and the coverage of each genome can be very low; in such settings, traditional alignment-based sequence comparison methods cannot be used. Alignment-free approaches based on k-tuple frequencies, on the other hand, have yielded promising results for the comparison of metagenomic samples. However, it is not known whether these approaches can be used to compare metatranscriptomic datasets, or which dissimilarity measures perform best.
Results
We applied several beta diversity measures based on k-tuple frequencies to real metatranscriptomic datasets from the 454 pyrosequencing and Illumina sequencing platforms to evaluate how effectively they cluster metatranscriptomic samples. The measures comprised three d2-type dissimilarity measures, one dissimilarity measure from CVTree, one relative-entropy-based measure (S2), and three classical ℓp-norm distances. The results showed that the d2S measure can achieve superior performance in clustering metatranscriptomic samples into different groups under different sequencing depths for both 454 and Illumina datasets, in recovering environmental gradients affecting microbial samples, and in classifying coexisting metagenomic and metatranscriptomic datasets, and that it is robust to sequencing errors. We also investigated the effects of tuple size and of the order of the background Markov model. A software pipeline implementing all steps of the analysis is available at http://code.google.com/p/d2-tools/.
Conclusions
Measures based on k-tuple sequence signatures can effectively reveal major groups and gradient variation among metatranscriptomic samples from NGS reads. The d2S dissimilarity measure performs well in all application scenarios, and its performance is robust with respect to tuple size and the order of the Markov model. |
format |
article |
author |
Ying Wang Lin Liu Lina Chen Ting Chen Fengzhu Sun |
title |
Comparison of metatranscriptomic samples based on k-tuple frequencies. |
publisher |
Public Library of Science (PLoS) |
publishDate |
2014 |
url |
https://doaj.org/article/2c72aa0ca3f1484987e2677d4f3b5d4d |
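The abstract compares samples through k-tuple counts, a background Markov model of a chosen order, and d2-type dissimilarities. Below is a minimal Python sketch of those quantities under stated assumptions. It is not the d2-tools code. It assumes reads are plain ACGT strings and counts overlapping k-tuples on the given strand only. It fits each sample's order-r background model from that sample's own reads. It writes d2, d2* and d2S in the normalized form d = 0.5 * (1 - normalized score) usually given in the alignment-free literature, so check the formulas against the paper before relying on them. All function names and the toy reads are hypothetical, and CVTree and S2 are not shown.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def count_kmers(reads, k):
    """Count overlapping k-tuples in all reads, skipping tuples that contain non-ACGT symbols."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if all(c in ALPHABET for c in word):
                counts[word] += 1
    return counts


def markov_word_probs(reads, k, r):
    """p_w for every k-tuple w under an order-r Markov model fitted to the reads (0 <= r < k)."""
    trans = count_kmers(reads, r + 1)  # C(ua): length-r context u followed by symbol a
    ctx_total = Counter()
    for word, c in trans.items():
        ctx_total[word[:r]] += c  # C(u.) = sum over a of C(ua)
    start = count_kmers(reads, r) if r > 0 else Counter()
    n_start = sum(start.values())
    probs = {}
    for letters in product(ALPHABET, repeat=k):
        w = "".join(letters)
        # Probability of the first r symbols (1.0 when r = 0, i.e. an i.i.d. model).
        p = 1.0 if r == 0 else (start[w[:r]] / n_start if n_start else 0.0)
        for i in range(r, k):  # multiply by p(w_i | w_{i-r} .. w_{i-1})
            denom = ctx_total[w[i - r:i]]
            p *= trans[w[i - r:i + 1]] / denom if denom else 0.0
        probs[w] = p
    return probs


def centered_counts(counts, probs):
    """X~_w = X_w - n * p_w, with n the number of counted k-tuple positions."""
    n = sum(counts.values())
    return {w: counts.get(w, 0) - n * p for w, p in probs.items()}


def d2(x, y, words):
    """d2 dissimilarity: 0.5 * (1 - cosine similarity of the raw count vectors)."""
    dot = sum(x.get(w, 0) * y.get(w, 0) for w in words)
    nx = math.sqrt(sum(x.get(w, 0) ** 2 for w in words))
    ny = math.sqrt(sum(y.get(w, 0) ** 2 for w in words))
    return 0.5 * (1 - dot / (nx * ny))


def d2_star(xc, yc, x_exp, y_exp):
    """d2*: centered counts standardized by expected counts x_exp[w] = n * p_w."""
    num = sx = sy = 0.0
    for w in xc:
        if x_exp[w] == 0 or y_exp[w] == 0:
            continue
        num += xc[w] * yc[w] / math.sqrt(x_exp[w] * y_exp[w])
        sx += xc[w] ** 2 / x_exp[w]
        sy += yc[w] ** 2 / y_exp[w]
    return 0.5 * (1 - num / math.sqrt(sx * sy))


def d2_s(xc, yc):
    """d2S: each product of centered counts is scaled by sqrt(X~_w^2 + Y~_w^2)."""
    num = sx = sy = 0.0
    for w in xc:
        scale = math.sqrt(xc[w] ** 2 + yc[w] ** 2)
        if scale == 0:
            continue
        num += xc[w] * yc[w] / scale
        sx += xc[w] ** 2 / scale
        sy += yc[w] ** 2 / scale
    return 0.5 * (1 - num / math.sqrt(sx * sy))


def lp_distance(x, y, words, p):
    """Classical l_p distance between relative k-tuple frequencies (p = 1, 2 or math.inf)."""
    tx, ty = sum(x.values()), sum(y.values())
    diffs = [abs(x.get(w, 0) / tx - y.get(w, 0) / ty) for w in words]
    return max(diffs) if p == math.inf else sum(d ** p for d in diffs) ** (1 / p)


if __name__ == "__main__":
    # Hypothetical toy reads standing in for two samples; k = 3, order-1 background model.
    a, b = ["ACGTACGGTCATTGCA", "TTGACCAGTAGGCT"], ["GGCATTACGATCCGTA", "CATGCATGGACTTA"]
    k, r = 3, 1
    words = ["".join(t) for t in product(ALPHABET, repeat=k)]
    xa, xb = count_kmers(a, k), count_kmers(b, k)
    pa, pb = markov_word_probs(a, k, r), markov_word_probs(b, k, r)
    ca, cb = centered_counts(xa, pa), centered_counts(xb, pb)
    ea = {w: sum(xa.values()) * pa[w] for w in words}
    eb = {w: sum(xb.values()) * pb[w] for w in words}
    print(d2(xa, xb, words), d2_star(ca, cb, ea, eb), d2_s(ca, cb), lp_distance(xa, xb, words, 1))
```

Every one of the 4^k tuples is enumerated, so memory grows quickly with k. This dictionary-based sketch is only practical for small tuple sizes. Computing the pairwise values over many samples yields a dissimilarity matrix that can then be passed to a standard hierarchical clustering routine.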