A novel dimension reduction algorithm based on weighted kernel principal analysis for gene expression data.

Gene expression data has the characteristics of high dimensionality and a small sample size and contains a large number of redundant genes unrelated to a disease. The direct application of machine learning to classify this type of data will not only incur a great time cost but will also sometimes fa...

Bibliographic Details
Main authors: Wen Bo Liu, Sheng Nan Liang, Xi Wen Qin
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2021
Subjects:
R
Q
Online access: https://doaj.org/article/fe5964de855540498082f32ea101742b
id oai:doaj.org-article:fe5964de855540498082f32ea101742b
record_format dspace
spelling oai:doaj.org-article:fe5964de855540498082f32ea101742b
2021-12-02T20:13:42Z
A novel dimension reduction algorithm based on weighted kernel principal analysis for gene expression data.
1932-6203
10.1371/journal.pone.0258326
https://doaj.org/article/fe5964de855540498082f32ea101742b
2021-01-01T00:00:00Z
https://doi.org/10.1371/journal.pone.0258326
https://doaj.org/toc/1932-6203
Wen Bo Liu
Sheng Nan Liang
Xi Wen Qin
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 16, Iss 10, p e0258326 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
spellingShingle Medicine
R
Science
Q
Wen Bo Liu
Sheng Nan Liang
Xi Wen Qin
A novel dimension reduction algorithm based on weighted kernel principal analysis for gene expression data.
description Gene expression data is characterized by high dimensionality and a small sample size, and it contains a large number of redundant genes unrelated to a disease. Applying machine learning directly to classify this type of data not only incurs a great time cost but also sometimes fails to improve classification performance. To counter this problem, this paper proposes a dimension-reduction algorithm based on weighted kernel principal component analysis (WKPCA), which constructs kernel function weights according to kernel matrix eigenvalues and combines multiple kernel functions to reduce the feature dimensions. To further improve the dimension-reduction efficiency of WKPCA, t-class kernel functions are constructed, and corresponding theoretical proofs are given. Moreover, the cumulative optimal performance rate is constructed to measure the overall performance of WKPCA combined with machine learning algorithms. Naive Bayes, K-nearest neighbour, random forest, iterative random forest and support vector machine are used as classifiers to analyse 6 real gene expression datasets. Compared with the all-variable model, linear principal component dimension reduction and single kernel function dimension reduction, the results show that the classification performance of the 5 machine learning methods mentioned above can be improved effectively by WKPCA dimension reduction.
format article
author Wen Bo Liu
Sheng Nan Liang
Xi Wen Qin
author_facet Wen Bo Liu
Sheng Nan Liang
Xi Wen Qin
author_sort Wen Bo Liu
title A novel dimension reduction algorithm based on weighted kernel principal analysis for gene expression data.
title_short A novel dimension reduction algorithm based on weighted kernel principal analysis for gene expression data.
title_full A novel dimension reduction algorithm based on weighted kernel principal analysis for gene expression data.
title_fullStr A novel dimension reduction algorithm based on weighted kernel principal analysis for gene expression data.
title_full_unstemmed A novel dimension reduction algorithm based on weighted kernel principal analysis for gene expression data.
title_sort novel dimension reduction algorithm based on weighted kernel principal analysis for gene expression data.
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/fe5964de855540498082f32ea101742b
work_keys_str_mv AT wenboliu anoveldimensionreductionalgorithmbasedonweightedkernelprincipalanalysisforgeneexpressiondata
AT shengnanliang anoveldimensionreductionalgorithmbasedonweightedkernelprincipalanalysisforgeneexpressiondata
AT xiwenqin anoveldimensionreductionalgorithmbasedonweightedkernelprincipalanalysisforgeneexpressiondata
AT wenboliu noveldimensionreductionalgorithmbasedonweightedkernelprincipalanalysisforgeneexpressiondata
AT shengnanliang noveldimensionreductionalgorithmbasedonweightedkernelprincipalanalysisforgeneexpressiondata
AT xiwenqin noveldimensionreductionalgorithmbasedonweightedkernelprincipalanalysisforgeneexpressiondata
_version_ 1718374803523502080
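
Illustrative note on the description field above. The record outlines WKPCA only at a high level: kernel weights are derived from kernel matrix eigenvalues, and several kernels are combined before dimension reduction. The following is a minimal Python sketch of that general idea, not the authors' implementation. The weighting rule (each kernel's share of eigenvalue mass in its leading components), the trace normalization, the choice of linear, RBF and polynomial kernels, and all function names are assumptions made for illustration; the record does not define them, and the paper's t-class kernels are not reproduced here.

```python
import numpy as np

def center_kernel(K):
    """Double-center a kernel matrix (centering in feature space)."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    return K - one @ K - K @ one + one @ K @ one

def linear_kernel(X):
    return X @ X.T

def rbf_kernel(X, gamma):
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def poly_kernel(X, degree=2, coef0=1.0):
    return (X @ X.T + coef0) ** degree

def weighted_kpca(X, kernels, n_components=2):
    """Hypothetical weighted multi-kernel PCA (assumed weighting rule)."""
    mats = []
    for kernel in kernels:
        K = center_kernel(kernel(X))
        mats.append(K / np.trace(K))  # assumption: unit trace for comparability
    # Assumed weight: fraction of eigenvalue mass in the top components.
    scores = []
    for K in mats:
        ev = np.clip(np.linalg.eigvalsh(K)[::-1], 0.0, None)
        scores.append(ev[:n_components].sum() / ev.sum())
    weights = np.array(scores) / np.sum(scores)
    # Combine the kernels and run ordinary kernel PCA on the mixture.
    K_mix = sum(w * K for w, K in zip(weights, mats))
    vals, vecs = np.linalg.eigh(K_mix)
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = np.clip(vals[order], 0.0, None), vecs[:, order]
    Z = vecs * np.sqrt(vals)  # projections of the training samples
    return Z, weights

# Synthetic stand-in for expression data: 20 samples, 50 "genes".
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
kernels = [
    linear_kernel,
    lambda A: rbf_kernel(A, gamma=1.0 / A.shape[1]),
    poly_kernel,
]
Z, weights = weighted_kpca(X, kernels, n_components=2)
```

Here `Z` holds the reduced features that would be passed to a classifier, and `weights` holds one non-negative weight per kernel, summing to one. The data are random and serve only to show the shapes involved.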