scLM: Automatic Detection of Consensus Gene Clusters Across Multiple Single-cell Datasets

In gene expression profiling studies, including single-cell RNA sequencing (scRNA-seq) analyses, the identification and characterization of co-expressed genes provide critical information on cell identity and function. Gene co-expression clustering in scRNA-seq data presents certain challenges. We...

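Bibliographic Details
Main authors: Qianqian Song, Jing Su, Lance D. Miller, Wei Zhang
Format: article
Language: EN
Published: Elsevier 2021
Subjects: Single-cell RNA sequencing; Consensus clustering; Latent space; Markov Chain Monte Carlo; Maximum likelihood approach; Biology (General); QH301-705.5
Online access: https://doaj.org/article/e9b4bee5e52b4358ac8765ba2fdd26c3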
id oai:doaj.org-article:e9b4bee5e52b4358ac8765ba2fdd26c3
record_format dspace
spelling oai:doaj.org-article:e9b4bee5e52b4358ac8765ba2fdd26c3
2021-11-16T04:09:19Z
scLM: Automatic Detection of Consensus Gene Clusters Across Multiple Single-cell Datasets
1672-0229
10.1016/j.gpb.2020.09.002
https://doaj.org/article/e9b4bee5e52b4358ac8765ba2fdd26c3
2021-04-01T00:00:00Z
http://www.sciencedirect.com/science/article/pii/S167202292030142X
https://doaj.org/toc/1672-0229
Genomics, Proteomics & Bioinformatics, Vol 19, Iss 2, Pp 330-341 (2021)
institution DOAJ
collection DOAJ
language EN
topic Single-cell RNA sequencing
Consensus clustering
Latent space
Markov Chain Monte Carlo
Maximum likelihood approach
Biology (General)
QH301-705.5
description In gene expression profiling studies, including single-cell RNA sequencing (scRNA-seq) analyses, the identification and characterization of co-expressed genes provide critical information on cell identity and function. Gene co-expression clustering in scRNA-seq data presents certain challenges. We show that commonly used methods for single-cell data are not capable of identifying co-expressed genes accurately, and produce results that substantially limit biological expectations of co-expressed genes. Herein, we present single-cell Latent-variable Model (scLM), a gene co-clustering algorithm tailored to single-cell data that performs well at detecting gene clusters with significant biologic context. Importantly, scLM can simultaneously cluster multiple single-cell datasets, i.e., consensus clustering, enabling users to leverage single-cell data from multiple sources for novel comparative analysis. scLM takes raw count data as input and preserves biological variation without being influenced by batch effects from multiple datasets. Results from both simulation data and experimental data demonstrate that scLM outperforms the existing methods with considerably improved accuracy. To illustrate the biological insights of scLM, we apply it to our in-house and public experimental scRNA-seq datasets. scLM identifies novel functional gene modules and refines cell states, which facilitates mechanism discovery and understanding of complex biosystems such as cancers. A user-friendly R package with all the key features of the scLM method is available at https://github.com/QSong-github/scLM.
format article
author Qianqian Song
Jing Su
Lance D. Miller
Wei Zhang
title scLM: Automatic Detection of Consensus Gene Clusters Across Multiple Single-cell Datasets
publisher Elsevier
publishDate 2021
url https://doaj.org/article/e9b4bee5e52b4358ac8765ba2fdd26c3