A k-mer based approach for classifying viruses without taxonomy identifies viral associations in human autism and plant microbiomes

Viruses are underrepresented taxa in the study and identification of microbiome constituents; however, they play an essential role in health, microbiome regulation, and transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which limits t...

Bibliographic Details
Main authors: Benjamin J. Garcia, Ramanuja Simha, Michael Garvin, Anna Furches, Piet Jones, Joao G.F.M. Gazolla, P. Doug Hyatt, Christopher W. Schadt, Dale Pelletier, Daniel Jacobson
Format: article
Language: EN
Published: Elsevier 2021
Subjects: Metagenomics, Viriome, Metatranscriptomics, Microbiome, Autism spectrum disorder, Populus, Biotechnology, TP248.13-248.65
Online access: https://doaj.org/article/b599a83ec0684751996935f86258cd20
id oai:doaj.org-article:b599a83ec0684751996935f86258cd20
record_format dspace
spelling oai:doaj.org-article:b599a83ec0684751996935f86258cd20
2021-11-18T04:46:28Z
2001-0370
10.1016/j.csbj.2021.10.029
https://doaj.org/article/b599a83ec0684751996935f86258cd20
2021-01-01T00:00:00Z
http://www.sciencedirect.com/science/article/pii/S2001037021004517
https://doaj.org/toc/2001-0370
Computational and Structural Biotechnology Journal, Vol 19, Iss , Pp 5911-5919 (2021)
institution DOAJ
collection DOAJ
language EN
topic Metagenomics
Viriome
Metatranscriptomics
Microbiome
Autism spectrum disorder
Populus
Biotechnology
TP248.13-248.65
description Viruses are underrepresented taxa in the study and identification of microbiome constituents; however, they play an essential role in health, microbiome regulation, and transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which limits the ability to identify and quantify viruses in the microbiome. Additionally, the vast diversity of viruses makes classification challenging, not only in constructing a viral taxonomy but also in identifying similarities between a virus's genotype and its phenotype. However, the diversity of viral sequences can be leveraged to classify them in metagenomic and metatranscriptomic samples, even when they have no taxonomy. To identify and quantify viruses in transcriptomic and genomic samples, we developed a dynamic programming algorithm that builds a classification tree from 715,672 metagenome viruses. To build the tree, we clustered proportional similarity scores generated from the k-mer profiles of the metagenome viruses, yielding a database of metagenomic viruses. The resulting database is compatible with Kraken2 and is available at https://www.osti.gov/biblio/1615774. We then integrated the viral classification database with databases created from NCBI genomes for use with ParaKraken (a parallelized version of Kraken provided in Supplemental Zip 1), a metagenomic/transcriptomic classifier. To illustrate the breadth of our method's utility for classifying metagenome viruses, we analyzed data from a plant metagenome study, identifying genotype- and compartment-specific differences between two Populus genotypes in three different compartments. We also identified a significant increase in the abundance of eight viral sequences in postmortem brains in a human metatranscriptome study comparing Autism Spectrum Disorder patients and controls. We also show the potential accuracy of viral classification by using both the JGI and NCBI viral databases to assess the uniqueness of viral sequences. Finally, we validate classification accuracy using NCBI databases of viruses with assigned taxonomy, identifying the pathogenic viruses in samples with known COVID-19 and cassava brown streak virus infections. Our method represents a necessary first step toward better understanding the role of viruses in the microbiome, because it allows a more complete identification of sequences without taxonomy. Better classification of viruses will improve the identification of associations between viruses and their hosts, as well as between viruses and other microbiome members. Despite the lack of taxonomy, this database of metagenomic viruses can be used with any tool that relies on a taxonomy, such as Kraken, for accurate classification of viruses.
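The clustering idea in the description (k-mer profiles, proportional similarity scores, grouping of similar sequences) can be illustrated with a minimal Python sketch. This is not the authors' dynamic programming algorithm. It assumes that "proportional similarity" means the sum, over shared k-mers, of the smaller of the two relative frequencies, and it substitutes a simple threshold-based single-linkage grouping for the classification tree. The function names, k = 4, and the 0.5 threshold are hypothetical choices made only for illustration.

```python
from collections import Counter
from itertools import combinations

DNA = set("ACGT")


def kmer_profile(seq, k=4):
    """Count every k-mer in seq that contains only A, C, G, or T."""
    seq = seq.upper()
    profile = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= DNA:
            profile[kmer] += 1
    return profile


def proportional_similarity(p, q):
    """Sum of per-k-mer minima of relative frequencies (assumed definition).

    Returns a value between 0.0 (no shared k-mers) and 1.0 (identical profiles).
    """
    total_p, total_q = sum(p.values()), sum(q.values())
    if total_p == 0 or total_q == 0:
        return 0.0
    shared = p.keys() & q.keys()
    return sum(min(p[m] / total_p, q[m] / total_q) for m in shared)


def cluster_by_similarity(profiles, threshold=0.5):
    """Group sequences whose similarity meets the threshold (single linkage).

    A stand-in for the classification tree: union-find merges any pair of
    sequences that is at least `threshold` similar.
    """
    parent = {name: name for name in profiles}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if proportional_similarity(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Toy sequences, not real viral genomes.
    toy = {
        "seq_a": "ACGTACGTACGT",
        "seq_b": "ACGTACGTACGA",
        "seq_c": "TTTTTTTTTTTT",
    }
    profiles = {name: kmer_profile(s) for name, s in toy.items()}
    print(cluster_by_similarity(profiles))
```

In this toy input, seq_a and seq_b share most of their 4-mers and fall into one group, while seq_c shares none and stays alone. Comparing every pair is quadratic in the number of sequences, so a collection of the size reported in the description would need a more scalable strategy than this sketch.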
format article
author Benjamin J. Garcia
Ramanuja Simha
Michael Garvin
Anna Furches
Piet Jones
Joao G.F.M. Gazolla
P. Doug Hyatt
Christopher W. Schadt
Dale Pelletier
Daniel Jacobson
author_sort Benjamin J. Garcia
title A k-mer based approach for classifying viruses without taxonomy identifies viral associations in human autism and plant microbiomes
publisher Elsevier
publishDate 2021
url https://doaj.org/article/b599a83ec0684751996935f86258cd20
_version_ 1718425061805785088