Application of the random forest algorithm to Streptococcus pyogenes response regulator allele variation: from machine learning to evolutionary models

Abstract Group A Streptococcus (GAS) is a globally significant bacterial pathogen. The GAS genotyping gold standard characterises the nucleotide variation of emm, which encodes a surface-exposed protein that is recombinogenic and under immune-based selection pressure. Within a supervised learning me...

Bibliographic Details
Main authors: Sean J. Buckley, Robert J. Harvey, Zack Shan
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/a233ddd7968e4ca58c712652c6293b0a
id oai:doaj.org-article:a233ddd7968e4ca58c712652c6293b0a
record_format dspace
spelling oai:doaj.org-article:a233ddd7968e4ca58c712652c6293b0a
2021-12-02T17:39:53Z
Application of the random forest algorithm to Streptococcus pyogenes response regulator allele variation: from machine learning to evolutionary models
10.1038/s41598-021-91941-6
2045-2322
https://doaj.org/article/a233ddd7968e4ca58c712652c6293b0a
2021-06-01T00:00:00Z
https://doi.org/10.1038/s41598-021-91941-6
https://doaj.org/toc/2045-2322
Sean J. Buckley
Robert J. Harvey
Zack Shan
Nature Portfolio
article
Medicine
R
Science
Q
EN
Scientific Reports, Vol 11, Iss 1, Pp 1-14 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract Group A Streptococcus (GAS) is a globally significant bacterial pathogen. The GAS genotyping gold standard characterises the nucleotide variation of emm, which encodes a surface-exposed protein that is recombinogenic and under immune-based selection pressure. Within a supervised learning methodology, we tested three random forest (RF) algorithms (Guided, Ordinary, and Regularized) and 53 GAS response regulator (RR) allele types to infer six genomic traits (emm-type, emm-subtype, tissue and country of sample, clinical outcomes, and isolate invasiveness). The Guided, Ordinary, and Regularized RF classifiers inferred the emm-type with accuracies of 96.7%, 95.7%, and 95.2%, using ten, three, and four RR alleles in the feature set, respectively. Notably, we inferred the emm-type with 93.7% accuracy using only mga2 and lrp. We demonstrated a utility for inferring emm-subtype (89.9%), country (88.6%), invasiveness (84.7%), but not clinical (56.9%), or tissue (56.4%), which is consistent with the complexity of GAS pathophysiology. We identified a novel cell wall-spanning domain (SF5), and proposed evolutionary pathways depicting the ‘contrariwise’ and ‘likewise’ chimeric deletion-fusion of emm and enn. We identified an intermediate strain, which provides evidence of the time-dependent excision of mga regulon genes. Overall, our workflow advances the understanding of the GAS mga regulon and its plasticity.
format article
author Sean J. Buckley
Robert J. Harvey
Zack Shan
author_sort Sean J. Buckley
title Application of the random forest algorithm to Streptococcus pyogenes response regulator allele variation: from machine learning to evolutionary models
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/a233ddd7968e4ca58c712652c6293b0a
_version_ 1718379780042129408