k-mer Similarity, Networks of Microbial Genomes, and Taxonomic Rank

ABSTRACT Microbial genomes have been shaped by parent-to-offspring (vertical) descent and lateral genetic transfer. These processes can be distinguished by alignment-based inference and comparison of phylogenetic trees for individual gene families, but this approach is not scalable to whole-genome s...

Bibliographic Details
Main authors: Guillaume Bernard, Paul Greenfield, Mark A. Ragan, Cheong Xin Chan
Format: article
Language: EN
Published: American Society for Microbiology 2018
Subjects: core functions; k-mers; networks; phylogenetic analysis; phylogenomics; Microbiology; QR1-502
Online access: https://doaj.org/article/31596908f9ce4e728d8f4a5919457e6f
id oai:doaj.org-article:31596908f9ce4e728d8f4a5919457e6f
record_format dspace
spelling oai:doaj.org-article:31596908f9ce4e728d8f4a5919457e6f
2021-12-02T18:39:46Z
k-mer Similarity, Networks of Microbial Genomes, and Taxonomic Rank
10.1128/mSystems.00257-18
2379-5077
https://doaj.org/article/31596908f9ce4e728d8f4a5919457e6f
2018-10-01T00:00:00Z
https://journals.asm.org/doi/10.1128/mSystems.00257-18
https://doaj.org/toc/2379-5077
Guillaume Bernard
Paul Greenfield
Mark A. Ragan
Cheong Xin Chan
American Society for Microbiology
article
core functions
k-mers
networks
phylogenetic analysis
phylogenomics
Microbiology
QR1-502
EN
mSystems, Vol 3, Iss 6 (2018)
institution DOAJ
collection DOAJ
language EN
topic core functions
k-mers
networks
phylogenetic analysis
phylogenomics
Microbiology
QR1-502
description ABSTRACT Microbial genomes have been shaped by parent-to-offspring (vertical) descent and lateral genetic transfer. These processes can be distinguished by alignment-based inference and comparison of phylogenetic trees for individual gene families, but this approach is not scalable to whole-genome sequences, and a tree-like structure does not adequately capture how these processes impact microbial physiology. Here we adopted alignment-free approaches based on k-mer statistics to infer phylogenomic networks involving 2,783 completely sequenced bacterial and archaeal genomes and compared the contributions of rRNA, protein-coding, and plasmid sequences to these networks. Our results show that the phylogenomic signal arising from ribosomal RNAs is strong and extends broadly across all taxa, whereas that from plasmids is strong but restricted to closely related groups, particularly Proteobacteria. However, the signal from the other chromosomal regions is restricted in breadth. We show that mean k-mer similarity can correlate with taxonomic rank. We also link the implicated k-mers to genome annotation (thus, functions) and define core k-mers (thus, core functions) in specific phyletic groups. Highly conserved functions in most phyla include amino acid metabolism and transport as well as energy production and conversion. Intracellular trafficking and secretion are the most prominent core functions among Spirochaetes, whereas energy production and conversion are not highly conserved among the largely parasitic or commensal Tenericutes. These observations suggest that differential conservation of functions relates to niche specialization and evolutionary diversification of microbes. Our results demonstrate that k-mer approaches can be used to efficiently identify phylogenomic signals and conserved core functions at the multigenome scale. 
IMPORTANCE Genome evolution of microbes involves parent-to-offspring descent, and lateral genetic transfer that convolutes the phylogenomic signal. This study investigated phylogenomic signals among thousands of microbial genomes based on short subsequences without using multiple-sequence alignment. The signal from ribosomal RNAs is strong across all taxa, and the signal of plasmids is strong only in closely related groups, particularly Proteobacteria. However, the signal from other chromosomal regions (∼99% of the genomes) is remarkably restricted in breadth. The similarity of subsequences is found to correlate with taxonomic rank and informs on conserved and differential core functions relative to niche specialization and evolutionary diversification of microbes. These results provide a comprehensive, alignment-free view of microbial genome evolution as a network, beyond a tree-like structure.
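As a rough illustration of the general idea behind alignment-free comparison, the sketch below scores pairs of sequences by the k-mers they share and keeps pairs above a cutoff as network edges. It is a minimal sketch under stated assumptions, not the authors' method: the record does not say which k-mer statistic, value of k, or threshold the study used, so the Jaccard index and the names `kmers`, `jaccard`, and `similarity_network` are all hypothetical choices made for illustration.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two k-mer sets (an assumed, simple similarity score)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_network(genomes, k, threshold):
    """Build network edges between sequences whose similarity meets a cutoff.

    genomes: dict mapping a name to its sequence string.
    Returns a list of (name_1, name_2, similarity) tuples.
    """
    profiles = {name: kmers(seq, k) for name, seq in genomes.items()}
    edges = []
    for x, y in combinations(sorted(profiles), 2):
        score = jaccard(profiles[x], profiles[y])
        if score >= threshold:
            edges.append((x, y, score))
    return edges


if __name__ == "__main__":
    toy = {"a": "ACGTACGT", "b": "ACGTACGA", "c": "TTTTTTTT"}
    for edge in similarity_network(toy, k=3, threshold=0.5):
        print(edge)
```

Raising the threshold keeps only edges between more similar sequences, which is one simple way to see how the reach of a signal narrows as the cutoff becomes stricter.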
format article
author Guillaume Bernard
Paul Greenfield
Mark A. Ragan
Cheong Xin Chan
title k-mer Similarity, Networks of Microbial Genomes, and Taxonomic Rank
publisher American Society for Microbiology
publishDate 2018
url https://doaj.org/article/31596908f9ce4e728d8f4a5919457e6f