Machine learning to predict the source of campylobacteriosis using whole genome data.


Bibliographic Details
Main Authors:	Nicolas Arning, Samuel K Sheppard, Sion Bayliss, David A Clifton, Daniel J Wilson
Format:	article
Language:	EN
Published:	Public Library of Science (PLoS) 2021
Subjects:	Genetics QH426-470
Online Access:	https://doaj.org/article/cc79937e580e4d55b76229528d8bdca3

Journal:	PLoS Genetics, Vol 17, Iss 10, p e1009436 (2021)
Publication Date:	2021-10-01
ISSN:	1553-7390, 1553-7404
DOI:	https://doi.org/10.1371/journal.pgen.1009436

Abstract
Campylobacteriosis is among the world's most common foodborne illnesses, caused predominantly by the bacterium Campylobacter jejuni. Effective interventions require determination of the infection source, which is challenging because transmission occurs via multiple sources such as contaminated meat, poultry, and drinking water. Strain variation has allowed source tracking based upon allelic variation in multi-locus sequence typing (MLST) genes, allowing isolates from infected individuals to be attributed to specific animal or environmental reservoirs. However, the accuracy of probabilistic attribution models has been limited by the ability to differentiate isolates based upon just 7 MLST genes. Here, we broaden the input data spectrum to include core genome MLST (cgMLST) and whole genome sequences (WGS), and implement multiple machine learning algorithms, allowing more accurate source attribution. We increase attribution accuracy from 64% using the standard iSource population genetic approach to 71% for MLST, 85% for cgMLST and 78% for kmerized WGS data using the classifier we named aiSource. To gain insight beyond the source model prediction, we use Bayesian inference to analyse the relative affinity of C. jejuni strains to infect humans and identify potential differences in source-human transmission ability among clonally related isolates in the most common disease-causing lineage (ST-21 clonal complex).

By providing generalizable, computationally efficient methods based upon machine learning and population genetics, we offer a scalable approach to global disease surveillance that can continuously incorporate novel samples for source attribution and identify fine-scale variation in transmission potential.
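The abstract lists kmerized WGS data as one input to source attribution. The sketch below illustrates, under stated assumptions, how a genome might be reduced to a set of canonical k-mers and a human isolate assigned to a reservoir by majority vote among its most similar labelled reference genomes, with accuracy computed as the fraction of correctly attributed isolates. This is not aiSource: the record does not specify the authors' algorithms or parameters, so the nearest-neighbour vote on Jaccard similarity, the default k = 21, and every function name here are hypothetical choices made for illustration.

```python
from collections import Counter

# Translation table used to build reverse complements (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement,
    so both DNA strands map to the same feature."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_set(sequence: str, k: int = 21) -> frozenset:
    """Hypothetical kmerization: the set of canonical k-mers in a genome.
    Windows containing any non-ACGT base (e.g. N) are skipped; k=21 is an assumption."""
    seq = sequence.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        window = seq[i : i + k]
        if set(window) <= _VALID:
            kmers.add(canonical(window))
    return frozenset(kmers)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity |A & B| / |A | B| between two k-mer sets (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def attribute_source(query: frozenset, references: list, top_n: int = 3) -> str:
    """Hypothetical nearest-neighbour attribution: rank labelled reference genomes
    (pairs of k-mer set and source label) by Jaccard similarity to the query and
    return the majority source among the top_n; ties go to the label seen first."""
    if not references:
        raise ValueError("need at least one labelled reference genome")
    ranked = sorted(references, key=lambda ref: jaccard(query, ref[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:top_n])
    return votes.most_common(1)[0][0]


def accuracy(predicted: list, truth: list) -> float:
    """Fraction of isolates whose predicted source matches the known source."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("predicted and truth must be non-empty and the same length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)
```

In use, one would build `refs = [(kmer_set(seq), source) for seq, source in labelled_genomes]` from isolates of known origin and call `attribute_source(kmer_set(human_genome), refs)`. A nearest-neighbour vote is chosen here only because it is short and transparent; because it has no training step, new reference isolates can simply be appended to `refs`.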