A Novel Single-Cell RNA Sequencing Data Feature Extraction Method Based on Gene Function Analysis and Its Applications in Glioma Study

Critical in revealing cell heterogeneity and identifying new cell subtypes, cell clustering based on single-cell RNA sequencing (scRNA-seq) is challenging. Due to the high noise, sparsity, and poor annotation of scRNA-seq data, existing state-of-the-art cell clustering methods usually ignore gene fu...

Bibliographic Details
Main authors: Jujuan Zhuang, Changjing Ren, Dan Ren, Yu’ang Li, Danyang Liu, Lingyu Cui, Geng Tian, Jiasheng Yang, Jingbo Liu
Format: article
Language: EN
Published: Frontiers Media S.A. 2021
Subjects:
Online access: https://doaj.org/article/4aff72a7f2aa4bb78055c14efbf49912
id oai:doaj.org-article:4aff72a7f2aa4bb78055c14efbf49912
record_format dspace
spelling oai:doaj.org-article:4aff72a7f2aa4bb78055c14efbf49912
2021-12-01T19:26:32Z
A Novel Single-Cell RNA Sequencing Data Feature Extraction Method Based on Gene Function Analysis and Its Applications in Glioma Study
2234-943X
10.3389/fonc.2021.797057
https://doaj.org/article/4aff72a7f2aa4bb78055c14efbf49912
2021-11-01T00:00:00Z
https://www.frontiersin.org/articles/10.3389/fonc.2021.797057/full
https://doaj.org/toc/2234-943X
Frontiers in Oncology, Vol 11 (2021)
institution DOAJ
collection DOAJ
language EN
topic single-cell RNA sequencing
GO enrichment analysis
KPCA
semantic similarity analysis
Gene Ontology
Neoplasms. Tumors. Oncology. Including cancer and carcinogens
RC254-282
description Cell clustering based on single-cell RNA sequencing (scRNA-seq) is critical for revealing cell heterogeneity and identifying new cell subtypes, but it remains challenging. Due to the high noise, sparsity, and poor annotation of scRNA-seq data, existing state-of-the-art cell clustering methods usually ignore gene functions and gene interactions. In this study, we propose a feature extraction method, named FEGFS, that analyzes scRNA-seq data by taking advantage of known gene functions. Specifically, we first derive functional gene sets based on Gene Ontology (GO) terms and reduce their redundancy by semantic similarity analysis and gene repetitive rate reduction. Then, we apply kernel principal component analysis (KPCA) to select features on each non-redundant functional gene set and combine the features selected for each set for subsequent clustering analysis. To test the performance of FEGFS, we apply agglomerative hierarchical clustering based on FEGFS and compare it with seven state-of-the-art clustering methods on six real scRNA-seq datasets. For small datasets like Pollen and Goolam, FEGFS outperforms all methods on all four evaluation metrics: adjusted Rand index (ARI), normalized mutual information (NMI), homogeneity score (HOM), and completeness score (COM). For example, the ARIs of FEGFS are 0.955 and 0.910 on Pollen and Goolam, respectively, while those of the second-best method are 0.938 and 0.910. For large datasets, FEGFS also outperforms most methods. For example, the ARI of FEGFS is 0.781 on both Klein and Zeisel, which is higher than that of all other methods but slightly lower than that of SC3 (0.798 and 0.807, respectively). Moreover, we demonstrate that CMF-Impute is powerful in reconstructing cell-to-cell and gene-to-gene correlation and in inferring cell lineage trajectories. As for application, taking glioma as an example, we demonstrate that our clustering method can identify important cell clusters related to glioma and infer key marker genes related to these cell clusters.
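The pipeline outlined in the description (kernel PCA on each functional gene set, concatenation of the resulting features, agglomerative hierarchical clustering, and scoring with ARI, NMI, HOM, and COM) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the GO-based derivation and redundancy reduction of gene sets is not shown, and the toy expression matrix, the three overlapping gene sets, the RBF kernel, and `n_components=2` are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import KernelPCA
from sklearn.metrics import (
    adjusted_rand_score,
    completeness_score,
    homogeneity_score,
    normalized_mutual_info_score,
)


def extract_features(expr, gene_sets, n_components=2, kernel="rbf"):
    """Run kernel PCA on each gene set's columns and concatenate the results.

    expr: (cells x genes) array; gene_sets: dict of name -> column indices.
    Kernel and component count are assumptions, not values from the paper.
    """
    blocks = []
    for gene_idx in gene_sets.values():
        sub = expr[:, gene_idx]
        k = min(n_components, sub.shape[1])
        blocks.append(KernelPCA(n_components=k, kernel=kernel).fit_transform(sub))
    return np.hstack(blocks)


def cluster_and_score(features, true_labels, n_clusters):
    """Agglomerative clustering followed by the four metrics named in the abstract."""
    pred = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(features)
    scores = {
        "ARI": adjusted_rand_score(true_labels, pred),
        "NMI": normalized_mutual_info_score(true_labels, pred),
        "HOM": homogeneity_score(true_labels, pred),
        "COM": completeness_score(true_labels, pred),
    }
    return pred, scores


# Toy data: 3 synthetic cell types, 30 cells each, 40 genes (all hypothetical).
rng = np.random.default_rng(0)
centers = rng.normal(0.0, 5.0, size=(3, 40))
expr = np.vstack([c + rng.normal(0.0, 1.0, size=(30, 40)) for c in centers])
true_labels = np.repeat([0, 1, 2], 30)

# Hypothetical stand-ins for non-redundant functional gene sets.
gene_sets = {
    "set_a": np.arange(0, 15),
    "set_b": np.arange(10, 30),
    "set_c": np.arange(25, 40),
}

features = extract_features(expr, gene_sets)
pred, scores = cluster_and_score(features, true_labels, n_clusters=3)
print(features.shape, scores)
```

Each gene set contributes its own block of columns, so the combined feature matrix grows with the number of gene sets rather than with the number of genes.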
format article
author Jujuan Zhuang
Changjing Ren
Dan Ren
Yu’ang Li
Danyang Liu
Lingyu Cui
Geng Tian
Jiasheng Yang
Jingbo Liu
author_sort Jujuan Zhuang
title A Novel Single-Cell RNA Sequencing Data Feature Extraction Method Based on Gene Function Analysis and Its Applications in Glioma Study
publisher Frontiers Media S.A.
publishDate 2021
url https://doaj.org/article/4aff72a7f2aa4bb78055c14efbf49912