GAUGE-Annotated Microbial Transcriptomic Data Facilitate Parallel Mining and High-Throughput Reanalysis To Form Data-Driven Hypotheses

Bibliographic Details
Main authors: Zhongyou Li, Katja Koeppen, Victoria I. Holden, Samuel L. Neff, Liviu Cengher, Elora G. Demers, Dallas L. Mould, Bruce A. Stanton, Thomas H. Hampton
Format: article
Language: EN
Published: American Society for Microbiology 2021
Online access: https://doaj.org/article/09e70e6b96ff4bbf910d56d8c91f3121
Journal: mSystems, Vol 6, Iss 2 (2021)
Publication date: 2021-04-01
DOI: 10.1128/mSystems.01305-20
ISSN: 2379-5077
Full text: https://journals.asm.org/doi/10.1128/mSystems.01305-20
Keywords: Pseudomonas aeruginosa, biofilms, bioinformatics, gene expression, genomics, Microbiology
Classification: QR1-502

ABSTRACT The NCBI Gene Expression Omnibus (GEO) provides tools to query and download transcriptomic data. However, less than 4% of microbial experiments include the sample group annotations required to assess differential gene expression for high-throughput reanalysis, and data deposited after 2014 universally lack these annotations. Our algorithm GAUGE (general annotation using text/data group ensembles) automatically annotates GEO microbial data sets, including microarray and RNA sequencing studies, increasing the percentage of data sets amenable to analysis from 4% to 33%. Eighty-nine percent of GAUGE-annotated studies matched group assignments generated by human curators. To demonstrate how GAUGE annotation can lead to scientific insight, we created GAPE (GAUGE-annotated Pseudomonas aeruginosa and Escherichia coli transcriptomic compendia for reanalysis), a Shiny Web interface to analyze 73 GAUGE-annotated P. aeruginosa studies, three times as many as were previously available. GAPE analysis revealed that PA3923, a gene of unknown function, was differentially expressed in more than 50% of studies and significantly coregulated with genes involved in biofilm formation. Follow-up wet-bench experiments demonstrate that PA3923 mutants are indeed defective in biofilm formation, consistent with predictions facilitated by GAUGE and GAPE. We anticipate that GAUGE and GAPE, which we have made freely available, will make publicly available microbial transcriptomic data easier to reuse and lead to new data-driven hypotheses.

IMPORTANCE GEO archives transcriptomic data from over 5,800 microbial experiments and allows researchers to answer questions not directly addressed in published papers. However, less than 4% of the microbial data sets include the sample group annotations required for high-throughput reanalysis. This limitation keeps a considerable amount of microbial transcriptomic data from being easily reused. Here, we demonstrate that the GAUGE algorithm could make 33% of microbial data accessible to parallel mining and reanalysis. GAUGE annotations increase statistical power and thereby make consistent patterns of differential gene expression easier to identify. In addition, we developed GAPE (GAUGE-annotated Pseudomonas aeruginosa and Escherichia coli transcriptomic compendia for reanalysis), a Shiny Web interface that performs parallel analyses on P. aeruginosa and E. coli compendia. Source code for GAUGE and GAPE is freely available and can be repurposed to create compendia for other bacterial species.

Author Video: An author video summary of this article is available.
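The parallel-mining idea in the abstract, asking in what fraction of annotated studies a gene is differentially expressed, can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the GAPE implementation: it assumes each study has already been reduced to a set of differentially expressed genes (for example, from group comparisons defined by sample annotations), and the study IDs, gene names, and the helper functions `de_frequency` and `frequently_de` are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical input: each study ID maps to the set of genes that a per-study
# differential expression test flagged. IDs and gene names are placeholders,
# not results from GAPE or from the article.
de_genes_by_study: dict[str, set[str]] = {
    "study_1": {"geneA", "geneB", "geneC"},
    "study_2": {"geneA", "geneD"},
    "study_3": {"geneB"},
    "study_4": {"geneA", "geneC", "geneD"},
}


def de_frequency(de_by_study: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of studies in which each gene was flagged as differentially expressed."""
    n_studies = len(de_by_study)
    # Each study contributes at most one count per gene because values are sets.
    counts = Counter(gene for genes in de_by_study.values() for gene in genes)
    return {gene: count / n_studies for gene, count in counts.items()}


def frequently_de(
    de_by_study: dict[str, set[str]], threshold: float = 0.5
) -> list[tuple[str, float]]:
    """Genes DE in strictly more than `threshold` of studies, most frequent first."""
    freq = de_frequency(de_by_study)
    hits = [(gene, f) for gene, f in freq.items() if f > threshold]
    # Sort by descending frequency, then alphabetically for stable output.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for gene, frac in frequently_de(de_genes_by_study):
        print(f"{gene}: DE in {frac:.0%} of studies")
```

With the placeholder data, only geneA (flagged in 3 of 4 studies) exceeds the strict 50% cutoff. The sketch deliberately leaves out the per-study statistics; the choice of differential expression method and significance cutoff is an assumption that a real reanalysis would have to make explicitly.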