The Generalized Matrix Decomposition Biplot and Its Application to Microbiome Data

ABSTRACT Exploratory analysis of human microbiome data is often based on dimension-reduced graphical displays derived from similarities based on non-Euclidean distances, such as UniFrac or Bray-Curtis. However, a display of this type, often referred to as the principal-coordinate analysis (PCoA) plo...

Bibliographic Details
Main authors: Yue Wang, Timothy W. Randolph, Ali Shojaie, Jing Ma
Format: article
Language: EN
Published: American Society for Microbiology 2019
Subjects: data visualization; clustering; dimension reduction; structured data; non-Euclidean distances; Microbiology; QR1-502
Online access: https://doaj.org/article/3489565ccd9f4c14919143cfbc2abe81
id oai:doaj.org-article:3489565ccd9f4c14919143cfbc2abe81
record_format dspace
spelling oai:doaj.org-article:3489565ccd9f4c14919143cfbc2abe81
2021-12-02T18:15:44Z
The Generalized Matrix Decomposition Biplot and Its Application to Microbiome Data
10.1128/mSystems.00504-19
2379-5077
https://doaj.org/article/3489565ccd9f4c14919143cfbc2abe81
2019-12-01T00:00:00Z
https://journals.asm.org/doi/10.1128/mSystems.00504-19
https://doaj.org/toc/2379-5077
ABSTRACT Exploratory analysis of human microbiome data is often based on dimension-reduced graphical displays derived from similarities based on non-Euclidean distances, such as UniFrac or Bray-Curtis. However, a display of this type, often referred to as the principal-coordinate analysis (PCoA) plot, does not reveal which taxa are related to the observed clustering because the configuration of samples is not based on a coordinate system in which both the samples and variables can be represented. The reason is that the PCoA plot is based on the eigen-decomposition of a similarity matrix and not the singular value decomposition (SVD) of the sample-by-abundance matrix. We propose a novel biplot that is based on an extension of the SVD, called the generalized matrix decomposition biplot (GMD-biplot), which involves an arbitrary matrix of similarities and the original matrix of variable measures, such as taxon abundances. As in a traditional biplot, points represent the samples, and arrows represent the variables. The proposed GMD-biplot is illustrated by analyzing multiple real and simulated data sets which demonstrate that the GMD-biplot provides improved clustering capability and a more meaningful relationship between the arrows and points. IMPORTANCE Biplots that simultaneously display the sample clustering and the important taxa have gained popularity in the exploratory analysis of human microbiome data. Traditional biplots, assuming Euclidean distances between samples, are not appropriate for microbiome data, when non-Euclidean distances are used to characterize dissimilarities among microbial communities. Thus, incorporating information from non-Euclidean distances into a biplot becomes useful for graphical displays of microbiome data. The proposed GMD-biplot accounts for any arbitrary non-Euclidean distances and provides a robust and computationally efficient approach for graphical visualization of microbiome data. In addition, the proposed GMD-biplot displays both the samples and taxa with respect to the same coordinate system, which further allows the configuration of future samples.
Yue Wang
Timothy W. Randolph
Ali Shojaie
Jing Ma
American Society for Microbiology
article
data visualization
clustering
dimension reduction
structured data
non-Euclidean distances
Microbiology
QR1-502
EN
mSystems, Vol 4, Iss 6 (2019)
institution DOAJ
collection DOAJ
language EN
topic data visualization
clustering
dimension reduction
structured data
non-Euclidean distances
Microbiology
QR1-502
description ABSTRACT Exploratory analysis of human microbiome data is often based on dimension-reduced graphical displays derived from similarities based on non-Euclidean distances, such as UniFrac or Bray-Curtis. However, a display of this type, often referred to as the principal-coordinate analysis (PCoA) plot, does not reveal which taxa are related to the observed clustering because the configuration of samples is not based on a coordinate system in which both the samples and variables can be represented. The reason is that the PCoA plot is based on the eigen-decomposition of a similarity matrix and not the singular value decomposition (SVD) of the sample-by-abundance matrix. We propose a novel biplot that is based on an extension of the SVD, called the generalized matrix decomposition biplot (GMD-biplot), which involves an arbitrary matrix of similarities and the original matrix of variable measures, such as taxon abundances. As in a traditional biplot, points represent the samples, and arrows represent the variables. The proposed GMD-biplot is illustrated by analyzing multiple real and simulated data sets which demonstrate that the GMD-biplot provides improved clustering capability and a more meaningful relationship between the arrows and points. IMPORTANCE Biplots that simultaneously display the sample clustering and the important taxa have gained popularity in the exploratory analysis of human microbiome data. Traditional biplots, assuming Euclidean distances between samples, are not appropriate for microbiome data, when non-Euclidean distances are used to characterize dissimilarities among microbial communities. Thus, incorporating information from non-Euclidean distances into a biplot becomes useful for graphical displays of microbiome data. The proposed GMD-biplot accounts for any arbitrary non-Euclidean distances and provides a robust and computationally efficient approach for graphical visualization of microbiome data. 
In addition, the proposed GMD-biplot displays both the samples and taxa with respect to the same coordinate system, which further allows the configuration of future samples.
format article
author Yue Wang
Timothy W. Randolph
Ali Shojaie
Jing Ma
title The Generalized Matrix Decomposition Biplot and Its Application to Microbiome Data
publisher American Society for Microbiology
publishDate 2019
url https://doaj.org/article/3489565ccd9f4c14919143cfbc2abe81