Defining the estimated core genome of bacterial populations using a Bayesian decision model.

The bacterial core genome is of intense interest and the volume of whole genome sequence data in the public domain available to investigate it has increased dramatically. The aim of our study was to develop a model to estimate the bacterial core genome from next-generation whole genome sequencing da...

Bibliographic Details
Main authors: Andries J van Tonder, Shilan Mistry, James E Bray, Dorothea M C Hill, Alison J Cody, Chris L Farmer, Keith P Klugman, Anne von Gottberg, Stephen D Bentley, Julian Parkhill, Keith A Jolley, Martin C J Maiden, Angela B Brueggemann
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2014
Subjects: Biology (General)
Online access: https://doaj.org/article/ecab11f7b0a24086a7f07fcad8d07f2e
id oai:doaj.org-article:ecab11f7b0a24086a7f07fcad8d07f2e
record_format dspace
spelling oai:doaj.org-article:ecab11f7b0a24086a7f07fcad8d07f2e
2021-11-25T05:40:49Z
1553-734X
1553-7358
10.1371/journal.pcbi.1003788
2014-08-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/25144616/pdf/?tool=EBI
https://doaj.org/toc/1553-734X
https://doaj.org/toc/1553-7358
PLoS Computational Biology, Vol 10, Iss 8, p e1003788 (2014)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
description The bacterial core genome is of intense interest and the volume of whole genome sequence data in the public domain available to investigate it has increased dramatically. The aim of our study was to develop a model to estimate the bacterial core genome from next-generation whole genome sequencing data and use this model to identify novel genes associated with important biological functions. Five bacterial datasets were analysed, comprising 2096 genomes in total. We developed a Bayesian decision model to estimate the number of core genes, calculated pairwise evolutionary distances (p-distances) based on nucleotide sequence diversity, and plotted the median p-distance for each core gene relative to its genome location. We designed visually-informative genome diagrams to depict areas of interest in genomes. Case studies demonstrated how the model could identify areas for further study, e.g. 25% of the core genes with higher sequence diversity in the Campylobacter jejuni and Neisseria meningitidis genomes encoded hypothetical proteins. The core gene with the highest p-distance value in C. jejuni was annotated in the reference genome as a putative hydrolase, but further work revealed that it shared sequence homology with beta-lactamase/metallo-beta-lactamases (enzymes that provide resistance to a range of broad-spectrum antibiotics) and thioredoxin reductase genes (which reduce oxidative stress and are essential for DNA replication) in other C. jejuni genomes. Our Bayesian model of estimating the core genome is principled, easy to use and can be applied to large genome datasets. This study also highlighted the lack of knowledge currently available for many core genes in bacterial genomes of significant global public health importance.
format article
author Andries J van Tonder
Shilan Mistry
James E Bray
Dorothea M C Hill
Alison J Cody
Chris L Farmer
Keith P Klugman
Anne von Gottberg
Stephen D Bentley
Julian Parkhill
Keith A Jolley
Martin C J Maiden
Angela B Brueggemann
title Defining the estimated core genome of bacterial populations using a Bayesian decision model.
publisher Public Library of Science (PLoS)
publishDate 2014
url https://doaj.org/article/ecab11f7b0a24086a7f07fcad8d07f2e
_version_ 1718414553286443008
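
The description in this record mentions pairwise p-distances based on nucleotide sequence diversity and a median p-distance for each core gene. The sketch below illustrates that idea only. It is not the authors' pipeline: the function names, the toy sequences, and the choice to skip sites with gaps or ambiguous bases are all assumptions made for illustration, and the sequences of each gene are assumed to be already aligned to equal length.

```python
from itertools import combinations
from statistics import median


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = "ACGT"
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: sites with a gap or ambiguous base in either
        # sequence are skipped rather than counted as differences.
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def median_p_distance(alleles: list[str]) -> float:
    """Median of all pairwise p-distances among aligned sequences of one gene."""
    distances = [p_distance(a, b) for a, b in combinations(alleles, 2)]
    if not distances:
        raise ValueError("need at least two sequences")
    return median(distances)


# Hypothetical toy alignment for a single gene (not data from the study).
toy_gene = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
print(median_p_distance(toy_gene))
```

Computing one such median per core gene and ordering the genes by genome position would give the kind of per-gene series that the description says was plotted.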