Synthetic single cell RNA sequencing data from small pilot studies using deep generative models

Bibliographic Details
Main authors: Martin Treppner, Adrián Salas-Bastos, Moritz Hess, Stefan Lenz, Tanja Vogel, Harald Binder
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/959146bec92e4abe9a6e3406415aeb4b
id oai:doaj.org-article:959146bec92e4abe9a6e3406415aeb4b
record_format dspace
spelling oai:doaj.org-article:959146bec92e4abe9a6e3406415aeb4b
2021-12-02T13:41:34Z
DOI 10.1038/s41598-021-88875-4
ISSN 2045-2322
https://doaj.org/article/959146bec92e4abe9a6e3406415aeb4b
2021-04-01T00:00:00Z
https://doi.org/10.1038/s41598-021-88875-4
https://doaj.org/toc/2045-2322
Scientific Reports, Vol 11, Iss 1, Pp 1-11 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract: Deep generative models, such as variational autoencoders (VAEs) or deep Boltzmann machines (DBMs), can generate an arbitrary number of synthetic observations after being trained on an initial set of samples. This has mainly been investigated for imaging data but could also be useful for single-cell transcriptomics (scRNA-seq). A small pilot study could be used to plan a full-scale experiment by testing the planned analysis strategies on synthetic data of different sample sizes. It is unclear whether synthetic observations generated from a small scRNA-seq dataset reflect the properties relevant for subsequent data analysis steps. We specifically investigated two deep generative modeling approaches, VAEs and DBMs. First, we considered single-cell variational inference (scVI) in two variants, generating samples either from the posterior distribution (the standard approach) or from the prior distribution. Second, we propose single-cell deep Boltzmann machines (scDBMs). When comparing clustering results on synthetic data with the ground-truth clustering, we find that the $scVI_{posterior}$ variant resulted in high variability, most likely because it amplifies artifacts of small datasets. All approaches showed mixed results for cell types of different abundance, overrepresenting highly abundant cell types and missing less abundant ones. With increasing pilot dataset sizes, the proportions of cells in each cluster became more similar to those of the ground-truth data. We also showed that all approaches learn the univariate distribution of most genes, but problems occurred with bimodality. Across all analyses, comparing 10x Genomics and Smart-seq2 technologies, we could show that for 10x datasets, which have higher sparsity, it is more challenging to make inferences from small to larger datasets. Overall, the results show that generative deep learning approaches might be valuable for supporting the design of scRNA-seq experiments.
format article
author Martin Treppner
Adrián Salas-Bastos
Moritz Hess
Stefan Lenz
Tanja Vogel
Harald Binder
title Synthetic single cell RNA sequencing data from small pilot studies using deep generative models
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/959146bec92e4abe9a6e3406415aeb4b