Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples

Unsupervised learning techniques, such as clustering and embedding, have become increasingly popular for grouping biomedical samples based on high-dimensional biomedical data. Extracting the clinical data or sample metadata shared among biomedical samples of a given biological condition remains a maj...

Bibliographic Details
Main Authors: Thanh M. Nguyen, Samuel Bharti, Zongliang Yue, Christopher D. Willey, Jake Y. Chen
Format: article
Language: EN
Published: Frontiers Media S.A. 2021
Subjects: sample enrichment analysis; clinotype; SEAS; glioblastoma multiforme; patient-derived xenograft; Information technology; T58.5-58.64
Online Access: https://doaj.org/article/f51a8838d56f47ff861e6f335e441693
id oai:doaj.org-article:f51a8838d56f47ff861e6f335e441693
record_format dspace
spelling oai:doaj.org-article:f51a8838d56f47ff861e6f335e441693
2021-11-12T12:02:45Z
Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples
2624-909X
10.3389/fdata.2021.725276
https://doaj.org/article/f51a8838d56f47ff861e6f335e441693
2021-09-01T00:00:00Z
https://www.frontiersin.org/articles/10.3389/fdata.2021.725276/full
https://doaj.org/toc/2624-909X
Thanh M. Nguyen
Samuel Bharti
Zongliang Yue
Christopher D. Willey
Jake Y. Chen
Frontiers Media S.A.
article
sample enrichment analysis
clinotype
SEAS
glioblastoma multiforme
patient-derived xenograft
Information technology
T58.5-58.64
EN
Frontiers in Big Data, Vol 4 (2021)
institution DOAJ
collection DOAJ
language EN
topic sample enrichment analysis
clinotype
SEAS
glioblastoma multiforme
patient-derived xenograft
Information technology
T58.5-58.64
spellingShingle sample enrichment analysis
clinotype
SEAS
glioblastoma multiforme
patient-derived xenograft
Information technology
T58.5-58.64
Thanh M. Nguyen
Samuel Bharti
Zongliang Yue
Christopher D. Willey
Jake Y. Chen
Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples
description Unsupervised learning techniques, such as clustering and embedding, have become increasingly popular for grouping biomedical samples based on high-dimensional biomedical data. Extracting the clinical data or sample metadata shared among biomedical samples of a given biological condition remains a major challenge. Here, we describe an analytical method called Statistical Enrichment Analysis of Samples (SEAS) for interpreting clustered or embedded sample data from omics studies. The method derives its power from focusing on sample sets, i.e., groups of biological samples constructed for various purposes, e.g., manual curation of samples sharing specific characteristics or automated clusters generated by embedding sample omics profiles from multi-dimensional omics space. The samples in a sample set share common clinical measurements, which we refer to as “clinotypes,” such as age group, gender, treatment status, or survival days. We demonstrate how SEAS yields insights into biological data sets using glioblastoma (GBM) samples. Notably, when analyzing the combined data from The Cancer Genome Atlas (TCGA) and patient-derived xenografts (PDX), SEAS allows approximating the different clinical outcomes of radiotherapy-treated PDX samples, which has not been solved by other tools. The result suggests that SEAS may support clinical decision-making. The SEAS tool is freely and publicly available as a software package at https://aimed-lab.shinyapps.io/SEAS/.
format article
author Thanh M. Nguyen
Samuel Bharti
Zongliang Yue
Christopher D. Willey
Jake Y. Chen
author_facet Thanh M. Nguyen
Samuel Bharti
Zongliang Yue
Christopher D. Willey
Jake Y. Chen
author_sort Thanh M. Nguyen
title Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples
title_short Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples
title_full Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples
title_fullStr Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples
title_full_unstemmed Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples
title_sort statistical enrichment analysis of samples: a general-purpose tool to annotate metadata neighborhoods of biological samples
publisher Frontiers Media S.A.
publishDate 2021
url https://doaj.org/article/f51a8838d56f47ff861e6f335e441693
work_keys_str_mv AT thanhmnguyen statisticalenrichmentanalysisofsamplesageneralpurposetooltoannotatemetadataneighborhoodsofbiologicalsamples
AT samuelbharti statisticalenrichmentanalysisofsamplesageneralpurposetooltoannotatemetadataneighborhoodsofbiologicalsamples
AT zongliangyue statisticalenrichmentanalysisofsamplesageneralpurposetooltoannotatemetadataneighborhoodsofbiologicalsamples
AT christopherdwilley statisticalenrichmentanalysisofsamplesageneralpurposetooltoannotatemetadataneighborhoodsofbiologicalsamples
AT jakeychen statisticalenrichmentanalysisofsamplesageneralpurposetooltoannotatemetadataneighborhoodsofbiologicalsamples
_version_ 1718430592604831744
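
The description field in this record presents SEAS as an enrichment analysis over sample sets and their clinotypes, but it does not say which statistical test the tool applies. As a rough illustration only, and not the authors' implementation, the sketch below assumes a one-sided hypergeometric test, a common choice for asking whether a categorical annotation is over-represented in a subset. The function name, parameter names, and toy counts are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, set_size, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    population: total number of samples in the study
    annotated:  samples in the study that carry the clinotype value
    set_size:   number of samples in the sample set (e.g. one cluster)
    overlap:    samples in the set that carry the clinotype value

    Assumption: this is a generic enrichment test, not the SEAS algorithm.
    """
    if not (0 <= annotated <= population and 0 <= set_size <= population):
        raise ValueError("annotated and set_size must lie in [0, population]")

    max_overlap = min(annotated, set_size)
    if overlap > max_overlap:
        return 0.0  # more hits than are possible

    total = comb(population, set_size)
    # Sum the probability mass for every overlap at least as large as observed.
    # math.comb returns 0 when k > n, so impossible terms vanish on their own.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, set_size - k)
        for k in range(max(overlap, 0), max_overlap + 1)
    )
    return tail / total


# Hypothetical toy numbers: 20 samples, 8 of them "treated";
# a cluster of 5 samples contains 4 treated ones.
p_value = hypergeom_enrichment_p(population=20, annotated=8, set_size=5, overlap=4)
print(f"enrichment p-value: {p_value:.4f}")
```

A small p-value here would suggest that the clinotype value is over-represented in the sample set relative to the whole study. In practice one would test many clinotypes per sample set and correct for multiple testing, which this sketch leaves out.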