Band-based similarity indices for gene expression classification and clustering

Bibliographic details
Main author: Aurora Torrente
Format: article
Language: English
Published: Nature Portfolio, 2021
Subjects: Medicine (R); Science (Q)
Online access: https://doaj.org/article/c5acaabaf6e3406d8823a78ff9f40487
DOI: https://doi.org/10.1038/s41598-021-00678-9
ISSN: 2045-2322
Source: Scientific Reports, Vol 11, Iss 1, Pp 1-18 (2021)

Abstract
The concept of depth induces an ordering from the centre outwards in multivariate data. Most depth definitions are infeasible for dimensions larger than three or four, but the Modified Band Depth (MBD) is a notable exception that has proven to be a valuable tool in the analysis of high-dimensional gene expression data. This depth definition relates the centrality of each individual to its (partial) inclusion in all possible bands formed by elements of the data set. We assess (dis)similarity between pairs of observations by accounting for such bands and constructing binary matrices associated with each pair. From these, contingency tables are calculated and used to derive standard similarity indices. Our approach is computationally efficient and can be applied to bands formed by any number of observations from the data set. We compared the performance of several band-based similarity indices with that of other classical distances in standard classification and clustering tasks on a variety of simulated and real data sets. The method is not restricted to the indices evaluated, however: it extends straightforwardly to other similarity coefficients. Our experiments show the benefits of the technique, with some of the selected indices outperforming, among others, the Euclidean distance.
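
The MBD mentioned in the abstract can be made concrete with a short sketch. It assumes the usual definition for bands formed by J = 2 observations: for every pair of observations in the sample (including pairs that contain the observation itself), take the proportion of coordinates at which the observation lies between the pair's minimum and maximum, then average over all pairs. The second function uses a rank-based shortcut known from the functional-depth literature, valid only when a coordinate has no ties: an observation of rank r among n values lies inside (r - 1)(n - r) pairs that do not contain it, plus the n - 1 pairs that do. This illustrates why MBD scales to many genes; it is not presented as the paper's implementation, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def mbd_bruteforce(data):
    """Modified Band Depth with bands formed by pairs of observations (J = 2).

    data: list of n observations, each a list of d coordinate values.
    """
    n, d = len(data), len(data[0])
    depths = []
    for x in data:
        total = 0.0
        for a, b in combinations(data, 2):
            # Proportion of coordinates where x lies inside the band [min, max] of (a, b).
            inside = sum(min(a[t], b[t]) <= x[t] <= max(a[t], b[t]) for t in range(d))
            total += inside / d
        depths.append(total / comb(n, 2))
    return depths


def mbd_ranks(data):
    """Same quantity via per-coordinate ranks; assumes no ties within a coordinate."""
    n, d = len(data), len(data[0])
    counts = [0] * n
    for t in range(d):
        order = sorted(range(n), key=lambda i: data[i][t])
        for r, i in enumerate(order, start=1):
            # (r - 1) values below times (n - r) values above, plus the n - 1 pairs containing i.
            counts[i] += (r - 1) * (n - r) + (n - 1)
    return [c / (d * comb(n, 2)) for c in counts]
```

The brute-force loop costs O(n^3 d) for n observations and d coordinates (genes), whereas the rank version needs only one sort per coordinate, O(d n log n), which is what makes depth computations of this kind practical for high-dimensional expression data.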
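
The pairwise construction described in the abstract can be sketched as follows, under explicit assumptions because the abstract does not spell it out. Each observation gets a binary matrix with one row per band (every subset of J observations from a reference set) and one column per coordinate, with entry 1 when the observation lies between the band's minimum and maximum at that coordinate. Comparing the flattened matrices of two observations gives the usual 2x2 contingency counts a (both 1), b (first only), c (second only) and d (both 0), from which standard indices follow, such as Jaccard, a/(a + b + c), and simple matching, (a + d)/(a + b + c + d). The helper names, the flattening, and the choice of reference set are hypothetical; the paper's exact definitions may differ.

```python
from itertools import combinations


def band_membership(x, reference, J=2):
    """Flattened binary matrix for x: one entry per (band, coordinate) pair.

    Hypothetical construction: each band is formed by J observations from
    `reference`; the entry is 1 when x[t] lies between the band's minimum
    and maximum at coordinate t.
    """
    bits = []
    for band in combinations(reference, J):
        for t in range(len(x)):
            values = [obs[t] for obs in band]
            bits.append(1 if min(values) <= x[t] <= max(values) else 0)
    return bits


def contingency(u, v):
    """2x2 counts: a = both 1, b = only u is 1, c = only v is 1, d = both 0."""
    a = sum(1 for p, q in zip(u, v) if p and q)
    b = sum(1 for p, q in zip(u, v) if p and not q)
    c = sum(1 for p, q in zip(u, v) if q and not p)
    return a, b, c, len(u) - a - b - c


def jaccard(a, b, c, d):
    # Ignores joint absences (d); two all-zero vectors are treated as identical.
    return a / (a + b + c) if a + b + c else 1.0


def simple_matching(a, b, c, d):
    # Counts joint presences and joint absences as agreement.
    return (a + d) / (a + b + c + d)


def band_similarity(x, y, reference, index=jaccard, J=2):
    counts = contingency(band_membership(x, reference, J), band_membership(y, reference, J))
    return index(*counts)


if __name__ == "__main__":
    ref = [[0, 0], [1, 1], [2, 2]]
    x, y, z = [0.5, 0.5], [1.5, 1.5], [3, 3]
    print(band_similarity(x, y, ref))                   # a=2, b=2, c=2, d=0 -> 1/3
    print(band_similarity(x, z, ref))                   # z is outside every band -> 0.0
    print(band_similarity(x, z, ref, simple_matching))  # joint absences count -> 1/3
```

For clustering or nearest-neighbour classification, a dissimilarity such as 1 - band_similarity(x, y, reference) can be passed to any method that accepts a precomputed distance matrix. The last two lines of the usage example show why the choice of index matters: Jaccard ignores shared absences from bands, while simple matching rewards them. Direct enumeration grows as C(m, J) bands for a reference set of size m, so this sketch is only meant for small illustrations and does not reproduce the efficiency the abstract reports.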