Interpretable Log Contrasts for the Classification of Health Biomarkers: a New Approach to Balance Selection
ABSTRACT Since the turn of the century, technological advances have made it possible to obtain the molecular profile of any tissue in a cost-effective manner. Among these advances are sophisticated high-throughput assays that measure the relative abundances of microorganisms, RNA molecules, and meta...
Main authors: | Thomas P. Quinn, Ionas Erb |
---|---|
Format: | article |
Language: | EN |
Published: | American Society for Microbiology, 2020 |
Subjects: | balances; classification; coda; compositional data; log contrast; log ratio; Microbiology; QR1-502 |
Online access: | https://doaj.org/article/bb217ab159a94181903f0f494542302f |
id |
oai:doaj.org-article:bb217ab159a94181903f0f494542302f |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:bb217ab159a94181903f0f494542302f; 2021-12-02T19:46:19Z; Interpretable Log Contrasts for the Classification of Health Biomarkers: a New Approach to Balance Selection; 10.1128/mSystems.00230-19; 2379-5077; https://doaj.org/article/bb217ab159a94181903f0f494542302f; 2020-04-01T00:00:00Z; https://journals.asm.org/doi/10.1128/mSystems.00230-19; https://doaj.org/toc/2379-5077; Thomas P. Quinn; Ionas Erb; American Society for Microbiology; article; balances; classification; coda; compositional data; log contrast; log ratio; Microbiology; QR1-502; EN; mSystems, Vol 5, Iss 2 (2020) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
balances; classification; coda; compositional data; log contrast; log ratio; Microbiology; QR1-502 |
spellingShingle |
balances; classification; coda; compositional data; log contrast; log ratio; Microbiology; QR1-502; Thomas P. Quinn; Ionas Erb; Interpretable Log Contrasts for the Classification of Health Biomarkers: a New Approach to Balance Selection |
description |
ABSTRACT: Since the turn of the century, technological advances have made it possible to obtain the molecular profile of any tissue in a cost-effective manner. Among these advances are sophisticated high-throughput assays that measure the relative abundances of microorganisms, RNA molecules, and metabolites. While these data are most often collected to gain new insights into biological systems, they can also be used as biomarkers to create clinically useful diagnostic classifiers. How best to classify high-dimensional -omics data remains an area of active research. However, few explicitly model the relative nature of these data and instead rely on cumbersome normalizations. This report (i) emphasizes the relative nature of health biomarkers, (ii) discusses the literature surrounding the classification of relative data, and (iii) benchmarks how different transformations perform for regularized logistic regression across multiple biomarker types. We show how an interpretable set of log contrasts, called balances, can prepare data for classification. We propose a simple procedure, called discriminative balance analysis, to select groups of 2 and 3 bacteria that can together discriminate between experimental conditions. Discriminative balance analysis is a fast, accurate, and interpretable alternative to data normalization.
IMPORTANCE: High-throughput sequencing provides an easy and cost-effective way to measure the relative abundance of bacteria in any environmental or biological sample. When these samples come from humans, the microbiome signatures can act as biomarkers for disease prediction. However, because bacterial abundance is measured as a composition, the data have unique properties that make conventional analyses inappropriate. To overcome this, analysts often use cumbersome normalizations. This article proposes an alternative method that identifies pairs and trios of bacteria whose stoichiometric presence can differentiate between diseased and nondiseased samples. By using interpretable log contrasts called balances, we developed an entirely normalization-free classification procedure that reduces the feature space and improves the interpretability, without sacrificing classifier performance. |
format |
article |
author |
Thomas P. Quinn; Ionas Erb |
author_facet |
Thomas P. Quinn; Ionas Erb |
author_sort |
Thomas P. Quinn |
title |
Interpretable Log Contrasts for the Classification of Health Biomarkers: a New Approach to Balance Selection |
title_short |
Interpretable Log Contrasts for the Classification of Health Biomarkers: a New Approach to Balance Selection |
title_full |
Interpretable Log Contrasts for the Classification of Health Biomarkers: a New Approach to Balance Selection |
title_fullStr |
Interpretable Log Contrasts for the Classification of Health Biomarkers: a New Approach to Balance Selection |
title_full_unstemmed |
Interpretable Log Contrasts for the Classification of Health Biomarkers: a New Approach to Balance Selection |
title_sort |
interpretable log contrasts for the classification of health biomarkers: a new approach to balance selection |
publisher |
American Society for Microbiology |
publishDate |
2020 |
url |
https://doaj.org/article/bb217ab159a94181903f0f494542302f |
work_keys_str_mv |
AT thomaspquinn interpretablelogcontrastsfortheclassificationofhealthbiomarkersanewapproachtobalanceselection AT ionaserb interpretablelogcontrastsfortheclassificationofhealthbiomarkersanewapproachtobalanceselection |
_version_ |
1718376047864446976 |
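
Illustrative note: the abstract describes balances as log contrasts over pairs and trios of bacteria that need no prior normalization. The sketch below is not the authors' discriminative balance analysis procedure. It only computes a balance using the standard compositional-data definition (the scaled log ratio of the geometric means of two groups of parts) and suggests why the value is the same for raw counts and for proportions. The function name `balance`, the taxon names, and the counts are hypothetical, and all abundances are assumed to be strictly positive.

```python
import math

def balance(sample, numerator, denominator):
    """Balance (isometric log contrast) between two groups of parts.

    Assumes every abundance in `sample` is strictly positive.
    """
    r, s = len(numerator), len(denominator)
    # Mean log abundance within each group (log of the geometric mean)
    log_gm_num = sum(math.log(sample[k]) for k in numerator) / r
    log_gm_den = sum(math.log(sample[k]) for k in denominator) / s
    # Normalizing coefficient from the usual balance definition
    return math.sqrt(r * s / (r + s)) * (log_gm_num - log_gm_den)

# Hypothetical counts for three taxa in one sample (illustrative values only)
counts = {"taxon_a": 120, "taxon_b": 30, "taxon_c": 60}
total = sum(counts.values())
proportions = {k: v / total for k, v in counts.items()}

# A 2-part balance (pair) and a 3-part balance (trio),
# each computed on raw counts and on proportions
pair_counts = balance(counts, ["taxon_a"], ["taxon_b"])
pair_props = balance(proportions, ["taxon_a"], ["taxon_b"])
trio_counts = balance(counts, ["taxon_a"], ["taxon_b", "taxon_c"])
trio_props = balance(proportions, ["taxon_a"], ["taxon_b", "taxon_c"])

print(pair_counts, pair_props)
print(trio_counts, trio_props)
```

Because the sample total cancels inside the log ratio, the counts and the proportions give the same balance up to floating-point error, which is one way to read the abstract's "normalization-free" claim.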