SlimPLS: a method for feature selection in gene expression-based disease classification.

A major challenge in biomedical studies in recent years has been the classification of gene expression profiles into categories, such as cases and controls. This is done by first training a classifier on a labeled training set containing samples from the two populations, and then using...

Bibliographic Details
Main Authors:	Michael Gutkin, Ron Shamir, Gideon Dror
Format:	article
Language:	EN
Published:	Public Library of Science (PLoS) 2009
Subjects:	Medicine R Science Q
Online Access:	https://doaj.org/article/f60a986aeaf14bd5a72e9be2b371e2a2

id	oai:doaj.org-article:f60a986aeaf14bd5a72e9be2b371e2a2
record_format	dspace
spelling	oai:doaj.org-article:f60a986aeaf14bd5a72e9be2b371e2a2 2021-11-25T06:21:20Z SlimPLS: a method for feature selection in gene expression-based disease classification. 1932-6203 10.1371/journal.pone.0006416 https://doaj.org/article/f60a986aeaf14bd5a72e9be2b371e2a2 2009-07-01T00:00:00Z https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/19649265/pdf/?tool=EBI https://doaj.org/toc/1932-6203 A major challenge in biomedical studies in recent years has been the classification of gene expression profiles into categories, such as cases and controls. This is done by first training a classifier on a labeled training set containing samples from the two populations, and then using that classifier to predict the labels of new samples. Such predictions have recently been shown to improve the diagnosis and treatment selection practices for several diseases. This procedure is complicated, however, by the high dimensionality of the data. While microarrays can measure the levels of thousands of genes per sample, case-control microarray studies usually involve no more than several dozen samples. Standard classifiers do not work well in these situations where the number of features (gene expression levels measured in these microarrays) far exceeds the number of samples. Selecting only the features that are most relevant for discriminating between the two categories can help construct better classifiers, in terms of both accuracy and efficiency. In this work we developed a novel method for multivariate feature selection based on the Partial Least Squares algorithm. We compared the method's variants with common feature selection techniques across a large number of real case-control datasets, using several classifiers. We demonstrate the advantages of the method and the preferable combinations of classifier and feature selection technique. Michael Gutkin Ron Shamir Gideon Dror Public Library of Science (PLoS) article Medicine R Science Q EN PLoS ONE, Vol 4, Iss 7, p e6416 (2009)
institution	DOAJ
collection	DOAJ
language	EN
topic	Medicine R Science Q
spellingShingle	Medicine R Science Q Michael Gutkin Ron Shamir Gideon Dror SlimPLS: a method for feature selection in gene expression-based disease classification.
description	A major challenge in biomedical studies in recent years has been the classification of gene expression profiles into categories, such as cases and controls. This is done by first training a classifier on a labeled training set containing samples from the two populations, and then using that classifier to predict the labels of new samples. Such predictions have recently been shown to improve the diagnosis and treatment selection practices for several diseases. This procedure is complicated, however, by the high dimensionality of the data. While microarrays can measure the levels of thousands of genes per sample, case-control microarray studies usually involve no more than several dozen samples. Standard classifiers do not work well in these situations where the number of features (gene expression levels measured in these microarrays) far exceeds the number of samples. Selecting only the features that are most relevant for discriminating between the two categories can help construct better classifiers, in terms of both accuracy and efficiency. In this work we developed a novel method for multivariate feature selection based on the Partial Least Squares algorithm. We compared the method's variants with common feature selection techniques across a large number of real case-control datasets, using several classifiers. We demonstrate the advantages of the method and the preferable combinations of classifier and feature selection technique.
format	article
author	Michael Gutkin Ron Shamir Gideon Dror
author_facet	Michael Gutkin Ron Shamir Gideon Dror
author_sort	Michael Gutkin
title	SlimPLS: a method for feature selection in gene expression-based disease classification.
title_short	SlimPLS: a method for feature selection in gene expression-based disease classification.
title_full	SlimPLS: a method for feature selection in gene expression-based disease classification.
title_fullStr	SlimPLS: a method for feature selection in gene expression-based disease classification.
title_full_unstemmed	SlimPLS: a method for feature selection in gene expression-based disease classification.
title_sort	slimpls: a method for feature selection in gene expression-based disease classification.
publisher	Public Library of Science (PLoS)
publishDate	2009
url	https://doaj.org/article/f60a986aeaf14bd5a72e9be2b371e2a2
work_keys_str_mv	AT michaelgutkin slimplsamethodforfeatureselectioningeneexpressionbaseddiseaseclassification AT ronshamir slimplsamethodforfeatureselectioningeneexpressionbaseddiseaseclassification AT gideondror slimplsamethodforfeatureselectioningeneexpressionbaseddiseaseclassification
_version_	1718413839537537024
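
Illustrative note: the description in this record outlines a pipeline in which features are selected with Partial Least Squares (PLS) before a classifier is trained on data with far more features than samples. The sketch below is not the SlimPLS algorithm from the article, whose details this record does not give. It is a generic, simplified stand-in that ranks features by the magnitude of their weight in the first PLS component and then applies a nearest-centroid classifier. All function names, the synthetic data, and the parameter values (60 samples, 2000 features, 5 informative features, a mean shift of 4.0) are assumptions chosen only for illustration.

```python
import numpy as np


def pls_first_component_weights(X, y):
    """Unit-norm weight vector of the first PLS component (single response).

    For one response variable, the first PLS weight vector is proportional
    to the covariance between each centered feature and the centered labels.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w


def select_top_features(X, y, k):
    """Indices of the k features with the largest absolute PLS weight."""
    w = pls_first_component_weights(X, y)
    return np.argsort(-np.abs(w))[:k]


def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test sample to the class (0 or 1) with the closer centroid."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(X_test - c0, axis=1)
    d1 = np.linalg.norm(X_test - c1, axis=1)
    return (d1 < d0).astype(int)


# Hypothetical synthetic "expression" data: many features, few samples.
rng = np.random.default_rng(0)
n_samples, n_features, n_informative = 60, 2000, 5
y = np.repeat([0, 1], n_samples // 2)           # controls = 0, cases = 1
X = rng.normal(size=(n_samples, n_features))
X[y == 1, :n_informative] += 4.0                # only the first 5 features differ

# Hold out every other sample; select features on the training half only,
# so the held-out samples do not influence the selection.
train = np.arange(0, n_samples, 2)
test = np.arange(1, n_samples, 2)
selected = select_top_features(X[train], y[train].astype(float), k=n_informative)

pred = nearest_centroid_predict(X[train][:, selected], y[train], X[test][:, selected])
accuracy = float((pred == y[test]).mean())
print("selected features:", sorted(selected.tolist()))
print("held-out accuracy:", accuracy)
```

Selecting features on the training split alone is a deliberate choice in this sketch: ranking features on all samples and then evaluating on some of them would leak label information into the test estimate.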

Similar Items