A k-mer based approach for classifying viruses without taxonomy identifies viral associations in human autism and plant microbiomes
Viruses are an underrepresented taxon in the study and identification of microbiome constituents; however, they play an essential role in health, microbiome regulation, and the transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which limits t...
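Main authors: | Benjamin J. Garcia, Ramanuja Simha, Michael Garvin, Anna Furches, Piet Jones, Joao G.F.M. Gazolla, P. Doug Hyatt, Christopher W. Schadt, Dale Pelletier, Daniel Jacobson |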
---|---|
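Format: | article |
Language: | EN |
Published: | Elsevier 2021 |
Subjects: | Metagenomics; Viriome; Metatranscriptomics; Microbiome; Autism spectrum disorder; Populus; Biotechnology |
Online access: | https://doaj.org/article/b599a83ec0684751996935f86258cd20 |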
id |
oai:doaj.org-article:b599a83ec0684751996935f86258cd20 |
---|---|
record_format |
dspace |
spelling |
Identifier: oai:doaj.org-article:b599a83ec0684751996935f86258cd20
Record timestamp: 2021-11-18T04:46:28Z
Title: A k-mer based approach for classifying viruses without taxonomy identifies viral associations in human autism and plant microbiomes
ISSN: 2001-0370
DOI: 10.1016/j.csbj.2021.10.029
DOAJ record: https://doaj.org/article/b599a83ec0684751996935f86258cd20
Publication date: 2021-01-01T00:00:00Z
Full text: http://www.sciencedirect.com/science/article/pii/S2001037021004517
Journal table of contents: https://doaj.org/toc/2001-0370
Authors: Benjamin J. Garcia; Ramanuja Simha; Michael Garvin; Anna Furches; Piet Jones; Joao G.F.M. Gazolla; P. Doug Hyatt; Christopher W. Schadt; Dale Pelletier; Daniel Jacobson
Publisher: Elsevier
Format: article
Keywords: Metagenomics; Viriome; Metatranscriptomics; Microbiome; Autism spectrum disorder; Populus; Biotechnology
Classification: TP248.13-248.65
Language: EN
Source: Computational and Structural Biotechnology Journal, Vol 19, Pp 5911-5919 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Metagenomics Viriome Metatranscriptomics Microbiome Autism spectrum disorder Populus Biotechnology TP248.13-248.65 |
description |
Viruses are an underrepresented taxon in the study and identification of microbiome constituents; however, they play an essential role in health, microbiome regulation, and the transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which limits the ability to identify and quantify viruses in the microbiome. Additionally, the vast diversity of viruses poses a challenge for classification, not only in constructing a viral taxonomy but also in identifying similarities between a virus's genotype and its phenotype. However, the diversity of viral sequences can be leveraged to classify them in metagenomic and metatranscriptomic samples, even if they do not have a taxonomy. To identify and quantify viruses in transcriptomic and genomic samples, we developed a dynamic programming algorithm for creating a classification tree out of 715,672 metagenome viruses. To create the classification tree, we clustered proportional similarity scores generated from the k-mer profiles of each of the metagenome viruses, producing a database of metagenomic viruses. The resulting database is compatible with Kraken2 and can be found here: https://www.osti.gov/biblio/1615774. We then integrated the viral classification database with databases created from NCBI genomes for use with ParaKraken (a parallelized version of Kraken provided in Supplemental Zip 1), a metagenomic/transcriptomic classifier. To illustrate the breadth of our utility for classifying metagenome viruses, we analyzed data from a plant metagenome study identifying genotypic and compartment-specific differences between two Populus genotypes in three different compartments. We also identified a significant increase in the abundance of eight viral sequences in postmortem brains in a human metatranscriptome study comparing Autism Spectrum Disorder patients and controls. We also show the potential accuracy for classifying viruses by utilizing both the JGI and NCBI viral databases to identify the uniqueness of viral sequences. Finally, we validate the accuracy of viral classification with NCBI databases containing viruses with taxonomy by identifying pathogenic viruses in known COVID-19 and cassava brown streak virus infection samples. Our method represents a necessary first step in better understanding the role of viruses in the microbiome by allowing for a more complete identification of sequences without taxonomy. Better classification of viruses will improve the identification of associations between viruses and their hosts, as well as between viruses and other microbiome members. Despite the lack of taxonomy, this database of metagenomic viruses can be used with any tool that utilizes a taxonomy, such as Kraken, for accurate classification of viruses. |
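The k-mer profile comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "proportional similarity" means the sum, over shared k-mers, of the smaller of the two relative frequencies; this record does not define the score. The function names, k = 4, the 0.5 threshold, and the toy sequences are all hypothetical. The single-linkage grouping is a simple stand-in and is not the authors' dynamic programming algorithm.

```python
from collections import Counter
from itertools import combinations


def kmer_profile(seq, k=4):
    """Return relative k-mer frequencies for a nucleotide sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def proportional_similarity(p, q):
    """Sum the smaller relative frequency over shared k-mers (range 0 to 1)."""
    return sum(min(freq, q[kmer]) for kmer, freq in p.items() if kmer in q)


def cluster_by_threshold(profiles, threshold):
    """Group names whose pairwise similarity meets the threshold (single linkage)."""
    parent = {name: name for name in profiles}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if proportional_similarity(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Toy sequences, invented for illustration only (not real viral genomes).
sequences = {
    "virus_a": "ACGTACGTACGTACGT",
    "virus_b": "ACGTACGTACGTACGA",
    "virus_c": "TTTTGGGGCCCCAAAA",
}
profiles = {name: kmer_profile(seq, k=4) for name, seq in sequences.items()}
clusters = cluster_by_threshold(profiles, threshold=0.5)
print(clusters)  # → [['virus_a', 'virus_b'], ['virus_c']]
```

Here `virus_a` and `virus_b` share most of their 4-mers and fall into one group, while `virus_c` shares none and stays on its own. A real workflow would need a larger k, a far more scalable comparison than all pairs, and a hierarchical tree rather than flat groups.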
format |
article |
author |
Benjamin J. Garcia Ramanuja Simha Michael Garvin Anna Furches Piet Jones Joao G.F.M. Gazolla P. Doug Hyatt Christopher W. Schadt Dale Pelletier Daniel Jacobson |
author_sort |
Benjamin J. Garcia |
title |
A k-mer based approach for classifying viruses without taxonomy identifies viral associations in human autism and plant microbiomes |
publisher |
Elsevier |
publishDate |
2021 |
url |
https://doaj.org/article/b599a83ec0684751996935f86258cd20 |