GAUGE-Annotated Microbial Transcriptomic Data Facilitate Parallel Mining and High-Throughput Reanalysis To Form Data-Driven Hypotheses
ABSTRACT The NCBI Gene Expression Omnibus (GEO) provides tools to query and download transcriptomic data. However, less than 4% of microbial experiments include the sample group annotations required to assess differential gene expression for high-throughput reanalysis, and data deposited after 2014...
Saved in:
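Main Authors: | Zhongyou Li, Katja Koeppen, Victoria I. Holden, Samuel L. Neff, Liviu Cengher, Elora G. Demers, Dallas L. Mould, Bruce A. Stanton, Thomas H. Hampton |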
---|---|
Format: | article |
Language: | EN |
Published: | American Society for Microbiology, 2021 |
Subjects: | Pseudomonas aeruginosa, biofilms, bioinformatics, gene expression, genomics, Microbiology, QR1-502 |
Online Access: | https://doaj.org/article/09e70e6b96ff4bbf910d56d8c91f3121 |
id |
oai:doaj.org-article:09e70e6b96ff4bbf910d56d8c91f3121 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:09e70e6b96ff4bbf910d56d8c91f3121 2021-12-02T18:21:19Z GAUGE-Annotated Microbial Transcriptomic Data Facilitate Parallel Mining and High-Throughput Reanalysis To Form Data-Driven Hypotheses 10.1128/mSystems.01305-20 2379-5077 https://doaj.org/article/09e70e6b96ff4bbf910d56d8c91f3121 2021-04-01T00:00:00Z https://journals.asm.org/doi/10.1128/mSystems.01305-20 https://doaj.org/toc/2379-5077 ABSTRACT The NCBI Gene Expression Omnibus (GEO) provides tools to query and download transcriptomic data. However, less than 4% of microbial experiments include the sample group annotations required to assess differential gene expression for high-throughput reanalysis, and data deposited after 2014 universally lack these annotations. Our algorithm GAUGE (general annotation using text/data group ensembles) automatically annotates GEO microbial data sets, including microarray and RNA sequencing studies, increasing the percentage of data sets amenable to analysis from 4% to 33%. Eighty-nine percent of GAUGE-annotated studies matched group assignments generated by human curators. To demonstrate how GAUGE annotation can lead to scientific insight, we created GAPE (GAUGE-annotated Pseudomonas aeruginosa and Escherichia coli transcriptomic compendia for reanalysis), a Shiny Web interface to analyze 73 GAUGE-annotated P. aeruginosa studies, three times more than previously available. GAPE analysis revealed that PA3923, a gene of unknown function, was frequently differentially expressed in more than 50% of studies and significantly coregulated with genes involved in biofilm formation. Follow-up wet-bench experiments demonstrate that PA3923 mutants are indeed defective in biofilm formation, consistent with predictions facilitated by GAUGE and GAPE. We anticipate that GAUGE and GAPE, which we have made freely available, will make publicly available microbial transcriptomic data easier to reuse and lead to new data-driven hypotheses.
IMPORTANCE GEO archives transcriptomic data from over 5,800 microbial experiments and allows researchers to answer questions not directly addressed in published papers. However, less than 4% of the microbial data sets include the sample group annotations required for high-throughput reanalysis. This limitation blocks a considerable amount of microbial transcriptomic data from being reused easily. Here, we demonstrate that the GAUGE algorithm could make 33% of microbial data accessible to parallel mining and reanalysis. GAUGE annotations increase statistical power and, thereby, make consistent patterns of differential gene expression easier to identify. In addition, we developed GAPE (GAUGE-annotated Pseudomonas aeruginosa and Escherichia coli transcriptomic compendia for reanalysis), a Shiny Web interface that performs parallel analyses on P. aeruginosa and E. coli compendia. Source code for GAUGE and GAPE is freely available and can be repurposed to create compendia for other bacterial species. Author Video: An author video summary of this article is available. Zhongyou Li Katja Koeppen Victoria I. Holden Samuel L. Neff Liviu Cengher Elora G. Demers Dallas L. Mould Bruce A. Stanton Thomas H. Hampton American Society for Microbiology article Pseudomonas aeruginosa biofilms bioinformatics gene expression genomics Microbiology QR1-502 EN mSystems, Vol 6, Iss 2 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Pseudomonas aeruginosa biofilms bioinformatics gene expression genomics Microbiology QR1-502 |
spellingShingle |
Pseudomonas aeruginosa biofilms bioinformatics gene expression genomics Microbiology QR1-502 Zhongyou Li Katja Koeppen Victoria I. Holden Samuel L. Neff Liviu Cengher Elora G. Demers Dallas L. Mould Bruce A. Stanton Thomas H. Hampton GAUGE-Annotated Microbial Transcriptomic Data Facilitate Parallel Mining and High-Throughput Reanalysis To Form Data-Driven Hypotheses |
description |
ABSTRACT The NCBI Gene Expression Omnibus (GEO) provides tools to query and download transcriptomic data. However, less than 4% of microbial experiments include the sample group annotations required to assess differential gene expression for high-throughput reanalysis, and data deposited after 2014 universally lack these annotations. Our algorithm GAUGE (general annotation using text/data group ensembles) automatically annotates GEO microbial data sets, including microarray and RNA sequencing studies, increasing the percentage of data sets amenable to analysis from 4% to 33%. Eighty-nine percent of GAUGE-annotated studies matched group assignments generated by human curators. To demonstrate how GAUGE annotation can lead to scientific insight, we created GAPE (GAUGE-annotated Pseudomonas aeruginosa and Escherichia coli transcriptomic compendia for reanalysis), a Shiny Web interface to analyze 73 GAUGE-annotated P. aeruginosa studies, three times more than previously available. GAPE analysis revealed that PA3923, a gene of unknown function, was frequently differentially expressed in more than 50% of studies and significantly coregulated with genes involved in biofilm formation. Follow-up wet-bench experiments demonstrate that PA3923 mutants are indeed defective in biofilm formation, consistent with predictions facilitated by GAUGE and GAPE. We anticipate that GAUGE and GAPE, which we have made freely available, will make publicly available microbial transcriptomic data easier to reuse and lead to new data-driven hypotheses. IMPORTANCE GEO archives transcriptomic data from over 5,800 microbial experiments and allows researchers to answer questions not directly addressed in published papers. However, less than 4% of the microbial data sets include the sample group annotations required for high-throughput reanalysis. This limitation blocks a considerable amount of microbial transcriptomic data from being reused easily. 
Here, we demonstrate that the GAUGE algorithm could make 33% of microbial data accessible to parallel mining and reanalysis. GAUGE annotations increase statistical power and, thereby, make consistent patterns of differential gene expression easier to identify. In addition, we developed GAPE (GAUGE-annotated Pseudomonas aeruginosa and Escherichia coli transcriptomic compendia for reanalysis), a Shiny Web interface that performs parallel analyses on P. aeruginosa and E. coli compendia. Source code for GAUGE and GAPE is freely available and can be repurposed to create compendia for other bacterial species. Author Video: An author video summary of this article is available. |
format |
article |
author |
Zhongyou Li Katja Koeppen Victoria I. Holden Samuel L. Neff Liviu Cengher Elora G. Demers Dallas L. Mould Bruce A. Stanton Thomas H. Hampton |
author_facet |
Zhongyou Li Katja Koeppen Victoria I. Holden Samuel L. Neff Liviu Cengher Elora G. Demers Dallas L. Mould Bruce A. Stanton Thomas H. Hampton |
author_sort |
Zhongyou Li |
title |
GAUGE-Annotated Microbial Transcriptomic Data Facilitate Parallel Mining and High-Throughput Reanalysis To Form Data-Driven Hypotheses |
title_short |
GAUGE-Annotated Microbial Transcriptomic Data Facilitate Parallel Mining and High-Throughput Reanalysis To Form Data-Driven Hypotheses |
title_full |
GAUGE-Annotated Microbial Transcriptomic Data Facilitate Parallel Mining and High-Throughput Reanalysis To Form Data-Driven Hypotheses |
title_fullStr |
GAUGE-Annotated Microbial Transcriptomic Data Facilitate Parallel Mining and High-Throughput Reanalysis To Form Data-Driven Hypotheses |
title_full_unstemmed |
GAUGE-Annotated Microbial Transcriptomic Data Facilitate Parallel Mining and High-Throughput Reanalysis To Form Data-Driven Hypotheses |
title_sort |
gauge-annotated microbial transcriptomic data facilitate parallel mining and high-throughput reanalysis to form data-driven hypotheses |
publisher |
American Society for Microbiology |
publishDate |
2021 |
url |
https://doaj.org/article/09e70e6b96ff4bbf910d56d8c91f3121 |
work_keys_str_mv |
AT zhongyouli gaugeannotatedmicrobialtranscriptomicdatafacilitateparallelminingandhighthroughputreanalysistoformdatadrivenhypotheses AT katjakoeppen gaugeannotatedmicrobialtranscriptomicdatafacilitateparallelminingandhighthroughputreanalysistoformdatadrivenhypotheses AT victoriaiholden gaugeannotatedmicrobialtranscriptomicdatafacilitateparallelminingandhighthroughputreanalysistoformdatadrivenhypotheses AT samuellneff gaugeannotatedmicrobialtranscriptomicdatafacilitateparallelminingandhighthroughputreanalysistoformdatadrivenhypotheses AT liviucengher gaugeannotatedmicrobialtranscriptomicdatafacilitateparallelminingandhighthroughputreanalysistoformdatadrivenhypotheses AT eloragdemers gaugeannotatedmicrobialtranscriptomicdatafacilitateparallelminingandhighthroughputreanalysistoformdatadrivenhypotheses AT dallaslmould gaugeannotatedmicrobialtranscriptomicdatafacilitateparallelminingandhighthroughputreanalysistoformdatadrivenhypotheses AT bruceastanton gaugeannotatedmicrobialtranscriptomicdatafacilitateparallelminingandhighthroughputreanalysistoformdatadrivenhypotheses AT thomashhampton gaugeannotatedmicrobialtranscriptomicdatafacilitateparallelminingandhighthroughputreanalysistoformdatadrivenhypotheses |
_version_ |
1718378136973869056 |