SiGMoiD: A super-statistical generative model for binary data.

In modern computational biology, there is great interest in building probabilistic models to describe collections of a large number of co-varying binary variables. However, current approaches to build generative models rely on modelers' identification of constraints and are computationally expe...

Bibliographic Details
Main authors: Xiaochuan Zhao, Germán Plata, Purushottam D Dixit
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2021
Subjects: Biology (General)
Online access: https://doaj.org/article/d7f5ee881f6c4f3ba30cfda4a58586f1
id oai:doaj.org-article:d7f5ee881f6c4f3ba30cfda4a58586f1
record_format dspace
spelling oai:doaj.org-article:d7f5ee881f6c4f3ba30cfda4a58586f1
2021-12-02T19:58:07Z
SiGMoiD: A super-statistical generative model for binary data.
1553-734X
1553-7358
10.1371/journal.pcbi.1009275
https://doaj.org/article/d7f5ee881f6c4f3ba30cfda4a58586f1
2021-08-01T00:00:00Z
https://doi.org/10.1371/journal.pcbi.1009275
https://doaj.org/toc/1553-734X
https://doaj.org/toc/1553-7358
Xiaochuan Zhao
Germán Plata
Purushottam D Dixit
Public Library of Science (PLoS)
article
Biology (General)
QH301-705.5
EN
PLoS Computational Biology, Vol 17, Iss 8, p e1009275 (2021)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
spellingShingle Biology (General)
QH301-705.5
Xiaochuan Zhao
Germán Plata
Purushottam D Dixit
SiGMoiD: A super-statistical generative model for binary data.
description In modern computational biology, there is great interest in building probabilistic models to describe collections of a large number of co-varying binary variables. However, current approaches to building generative models rely on the modeler's identification of constraints and are computationally expensive to infer when the number of variables is large (N~100). Here, we address both of these issues with the Super-statistical Generative Model for binary Data (SiGMoiD). SiGMoiD is a maximum entropy-based framework in which we imagine the data as arising from a super-statistical system; individual binary variables in a given sample are coupled to the same 'bath', whose intensive variables vary from sample to sample. Importantly, unlike standard maximum entropy approaches, where the modeler specifies the constraints, the SiGMoiD algorithm infers them directly from the data. Due to this optimal choice of constraints, SiGMoiD allows us to model collections of a very large number (N>1000) of binary variables. Finally, SiGMoiD offers a reduced-dimensional description of the data, allowing us to identify clusters of similar data points as well as of binary variables. We illustrate the versatility of SiGMoiD using multiple datasets spanning several time- and length-scales.
format article
author Xiaochuan Zhao
Germán Plata
Purushottam D Dixit
author_facet Xiaochuan Zhao
Germán Plata
Purushottam D Dixit
author_sort Xiaochuan Zhao
title SiGMoiD: A super-statistical generative model for binary data.
title_short SiGMoiD: A super-statistical generative model for binary data.
title_full SiGMoiD: A super-statistical generative model for binary data.
title_fullStr SiGMoiD: A super-statistical generative model for binary data.
title_full_unstemmed SiGMoiD: A super-statistical generative model for binary data.
title_sort sigmoid: a super-statistical generative model for binary data.
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/d7f5ee881f6c4f3ba30cfda4a58586f1
work_keys_str_mv AT xiaochuanzhao sigmoidasuperstatisticalgenerativemodelforbinarydata
AT germanplata sigmoidasuperstatisticalgenerativemodelforbinarydata
AT purushottamddixit sigmoidasuperstatisticalgenerativemodelforbinarydata
_version_ 1718375808745078784