Discovering subgroups of patients from DNA copy number data using NMF on compacted matrices.

In the study of complex genetic diseases, identifying subgroups of patients who share similar genetic characteristics is a challenging task that can, for example, improve treatment decisions. One type of genetic lesion frequently investigated in such disorders is a change in the DNA cop...

Bibliographic Details
Main authors: Cassio P de Campos, Paola M V Rancoita, Ivo Kwee, Emanuele Zucca, Marco Zaffalon, Francesco Bertoni
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2013
Subjects:
R
Q
Online access: https://doaj.org/article/3b3b6f275b3c4c6bbcf1e0907df737e6
id oai:doaj.org-article:3b3b6f275b3c4c6bbcf1e0907df737e6
record_format dspace
spelling oai:doaj.org-article:3b3b6f275b3c4c6bbcf1e0907df737e6
2021-11-18T08:45:33Z
Discovering subgroups of patients from DNA copy number data using NMF on compacted matrices.
1932-6203
10.1371/journal.pone.0079720
https://doaj.org/article/3b3b6f275b3c4c6bbcf1e0907df737e6
2013-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/24278162/?tool=EBI
https://doaj.org/toc/1932-6203
PLoS ONE, Vol 8, Iss 11, p e79720 (2013)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description In the study of complex genetic diseases, identifying subgroups of patients who share similar genetic characteristics is a challenging task that can, for example, improve treatment decisions. One type of genetic lesion frequently investigated in such disorders is a change in DNA copy number (CN) at specific genomic traits. Non-negative Matrix Factorization (NMF) is a standard technique for reducing the dimensionality of a data set and clustering data samples while keeping the most relevant information in meaningful components. Thus, it can be used to discover subgroups of patients from CN profiles. It is, however, computationally impractical for very high-dimensional data, such as CN microarray data. Deciding the most suitable number of subgroups is also a challenging problem. The aim of this work is to derive a procedure to compact high-dimensional data in order to improve NMF applicability without compromising the quality of the clustering. This is particularly important for analyzing high-resolution microarray data. Many commonly used quality measures, as well as our own measures, are employed to decide the number of subgroups and to assess the quality of the results. Our measures are based on the idea of identifying robust subgroups, inspired by biological/clinical relevance, instead of simply aiming at well-separated clusters. We evaluate our procedure using four real, independent data sets. In these data sets, our method was able to find accurate subgroups with individual molecular and clinical features and outperformed standard NMF in terms of accuracy in the factorization fitness function. Hence, it can be useful for the discovery of subgroups of patients with similar CN profiles in the study of heterogeneous diseases.
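The idea of running NMF on a compacted matrix and then reading patient subgroups off the factors can be sketched as follows. This is a minimal illustration and not the authors' procedure: it assumes a non-negative matrix `V` with genomic probes as rows and patients as columns, it compacts the data simply by collapsing identical rows (a stand-in for whatever compaction the article actually uses), and it fits the factorization with the classic multiplicative updates for the squared-error objective. The function names `compact_rows`, `nmf`, and `cluster_patients`, and the toy matrix, are hypothetical.

```python
import numpy as np

def compact_rows(V):
    """Collapse identical rows of V; return the unique rows and how often each occurs."""
    unique_rows, counts = np.unique(V, axis=0, return_counts=True)
    return unique_rows, counts

def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Multiplicative updates that reduce ||V - W @ H||_F^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def cluster_patients(V, k, **kwargs):
    """Factorize the compacted matrix and assign each patient to its dominant component."""
    rows, counts = compact_rows(V)
    # Scaling each unique row by sqrt(count) keeps the squared error of the
    # compacted problem equal to that of the full matrix with repeated rows.
    scaled = rows * np.sqrt(counts)[:, None]
    W, H = nmf(scaled, k, **kwargs)
    labels = H.argmax(axis=0)  # one label per patient (column)
    return labels, H

# Toy data (invented for illustration): 6 probes x 6 patients, two obvious groups.
V = np.array([
    [2, 2, 2, 0, 0, 0],
    [2, 2, 2, 0, 0, 0],  # duplicate probe profile
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 3, 3, 3],
    [0, 0, 0, 3, 3, 3],  # duplicate probe profile
    [0, 0, 0, 1, 1, 1],
], dtype=float)

rows, counts = compact_rows(V)
labels, H = cluster_patients(V, k=2)
```

Here the six probe rows compact to four unique rows, so the factorization works on a smaller matrix while the row weights preserve the original objective. Choosing the number of subgroups `k`, which the abstract identifies as a separate problem, is not addressed by this sketch.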
format article
author Cassio P de Campos
Paola M V Rancoita
Ivo Kwee
Emanuele Zucca
Marco Zaffalon
Francesco Bertoni
title Discovering subgroups of patients from DNA copy number data using NMF on compacted matrices.
publisher Public Library of Science (PLoS)
publishDate 2013
url https://doaj.org/article/3b3b6f275b3c4c6bbcf1e0907df737e6