Unsupervised Learning Approach for Comparing Multiple Transposon Insertion Sequencing Studies

ABSTRACT Transposon insertion sequencing (TIS) is a widely used technique for conducting genome-scale forward genetic screens in bacteria. However, few methods enable comparison of TIS data across multiple replicates of a screen or across independent screens, including screens performed in different...


Bibliographic Details
Main authors: Troy P. Hubbard, Jonathan D. D’Gama, Gabriel Billings, Brigid M. Davis, Matthew K. Waldor
Format: article
Language: EN
Published: American Society for Microbiology 2019
Subjects:
PCA
Online access: https://doaj.org/article/0d1c22ae2db24a2e9bee5b0bd77896b4
id oai:doaj.org-article:0d1c22ae2db24a2e9bee5b0bd77896b4
record_format dspace
spelling oai:doaj.org-article:0d1c22ae2db24a2e9bee5b0bd77896b4
2021-11-15T15:22:05Z
Unsupervised Learning Approach for Comparing Multiple Transposon Insertion Sequencing Studies
10.1128/mSphere.00031-19
2379-5042
https://doaj.org/article/0d1c22ae2db24a2e9bee5b0bd77896b4
2019-02-01T00:00:00Z
https://journals.asm.org/doi/10.1128/mSphere.00031-19
https://doaj.org/toc/2379-5042
mSphere, Vol 4, Iss 1 (2019)
institution DOAJ
collection DOAJ
language EN
topic PCA
host-pathogen interactions
in vivo screen
principal-component analysis
Tn-seq
Vibrio cholerae
Microbiology
QR1-502
description ABSTRACT Transposon insertion sequencing (TIS) is a widely used technique for conducting genome-scale forward genetic screens in bacteria. However, few methods enable comparison of TIS data across multiple replicates of a screen or across independent screens, including screens performed in different organisms. Here, we introduce a post hoc analytic framework, comparative TIS (CompTIS), which utilizes unsupervised learning to enable meta-analysis of multiple TIS data sets. CompTIS first implements screen-level principal-component analysis (PCA) and clustering to identify variation between the TIS screens. This initial screen-level analysis facilitates the selection of related screens for additional analyses, reveals the relatedness of complex environments based on growth phenotypes measured by TIS, and provides a useful quality control step. Subsequently, PCA is performed on genes to identify loci whose corresponding mutants lead to concordant/discordant phenotypes across all or in a subset of screens. We used CompTIS to analyze published intestinal colonization TIS data sets from two vibrio species. Gene-level analyses identified both pan-vibrio genes required for intestinal colonization and conserved genes that displayed species-specific requirements. CompTIS is applicable to virtually any combination of TIS screens and can be implemented without regard to either the number of screens or the methods used for upstream data analysis.
IMPORTANCE Forward genetic screens are powerful tools for functional genomics. The comparison of similar forward genetic screens performed in different organisms enables the identification of genes with similar or different phenotypes across organisms. Transposon insertion sequencing is a widely used method for conducting genome-scale forward genetic screens in bacteria, yet few bioinformatic approaches have been developed to compare the results of screen replicates and different screens conducted across species or strains. Here, we used principal-component analysis (PCA) and hierarchical clustering, two unsupervised learning approaches, to analyze the relatedness of multiple in vivo screens of pathogenic vibrios. This analytic framework reveals both shared pan-vibrio requirements for intestinal colonization and strain-specific dependencies. Our findings suggest that PCA-based analytics will be a straightforward, widely applicable approach for comparing diverse transposon insertion sequencing screens.
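The two PCA steps named in the abstract (screen-level, then gene-level) can be illustrated with a small sketch. This is not the authors' CompTIS code: the log2 fold-change matrix, the screen labels A1/A2/B1/B2, and the helper names are all hypothetical, the clustering step is omitted, and PCA is done with a plain power iteration so the example needs only the Python standard library. In practice a numerical library would normally be used.

```python
import math

# Hypothetical log2 fold changes: rows = 6 genes, columns = screens A1, A2, B1, B2
# (two replicate screens in each of two imaginary species). Not data from the paper.
LOG2FC = [
    [-5.0, -4.8, -5.2, -4.9],  # gene 0: required in every screen
    [0.1, -0.2, 0.0, 0.2],     # gene 1: neutral
    [-4.0, -4.2, 0.1, -0.1],   # gene 2: required only in species A screens
    [-0.1, 0.2, 0.1, 0.0],     # gene 3: neutral
    [0.0, 0.1, -3.9, -4.1],    # gene 4: required only in species B screens
    [0.2, 0.0, -0.2, 0.1],     # gene 5: neutral
]

def center_columns(rows):
    """Subtract each column's mean so every feature has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of the columns of already-centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def principal_axes(rows, k, iterations=500):
    """Top-k principal axes via power iteration with deflation."""
    centered = center_columns(rows)
    cov = covariance(centered)
    p = len(cov)
    axes = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(p)]  # fixed, deterministic start vector
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
        axes.append(v)
        # Deflate: remove the component just found before searching for the next.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    return centered, axes

def project(centered, axis):
    """Score of each observation (row) along one principal axis."""
    return [sum(x * a for x, a in zip(row, axis)) for row in centered]

# Screen-level PCA: each screen is an observation, each gene a feature.
screens_as_rows = [list(col) for col in zip(*LOG2FC)]
screen_centered, screen_axes = principal_axes(screens_as_rows, k=1)
screen_scores = project(screen_centered, screen_axes[0])

# Gene-level PCA: each gene is an observation, each screen a feature.
gene_centered, gene_axes = principal_axes(LOG2FC, k=2)
gene_pc1 = project(gene_centered, gene_axes[0])
gene_pc2 = project(gene_centered, gene_axes[1])

print("screen PC1 scores:", [round(s, 2) for s in screen_scores])
print("gene PC1 loadings:", [round(a, 2) for a in gene_axes[0]])
print("gene PC2 scores:  ", [round(s, 2) for s in gene_pc2])
```

With this toy matrix, the screen-level scores place the two A screens on one side of PC1 and the two B screens on the other. In the gene-level analysis, PC1 weights all screens in the same direction (a concordant axis, dominated by gene 0), while PC2 contrasts the A and B screens, so the two species-specific genes receive scores of opposite sign. The sign of any principal axis is arbitrary, so only relative signs are meaningful.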
format article
author Troy P. Hubbard
Jonathan D. D’Gama
Gabriel Billings
Brigid M. Davis
Matthew K. Waldor
title Unsupervised Learning Approach for Comparing Multiple Transposon Insertion Sequencing Studies
publisher American Society for Microbiology
publishDate 2019
url https://doaj.org/article/0d1c22ae2db24a2e9bee5b0bd77896b4