CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models.

We consider the following general family of algorithmic problems that arises in transcriptomics, metabolomics and other fields: given a weighted graph G and a subset of its nodes S, find subsets of S that show significant connectedness within G. A specific solution to this problem may be defined by...

Descripción completa

Guardado en:
Detalles Bibliográficos
Autores principales: Lillian R Thistlethwaite, Varduhi Petrosyan, Xiqi Li, Marcus J Miller, Sarah H Elsea, Aleksandar Milosavljevic
Formato: article
Lenguaje:EN
Publicado: Public Library of Science (PLoS) 2021
Materias:
Acceso en línea:https://doaj.org/article/d8935c88ca1248f9bd4404aed5f9f659
Etiquetas: Agregar Etiqueta
Sin Etiquetas, Sea el primero en etiquetar este registro!
id oai:doaj.org-article:d8935c88ca1248f9bd4404aed5f9f659
record_format dspace
spelling oai:doaj.org-article:d8935c88ca1248f9bd4404aed5f9f6592021-12-02T19:57:26ZCTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models.1553-734X1553-735810.1371/journal.pcbi.1008550https://doaj.org/article/d8935c88ca1248f9bd4404aed5f9f6592021-01-01T00:00:00Zhttps://doi.org/10.1371/journal.pcbi.1008550https://doaj.org/toc/1553-734Xhttps://doaj.org/toc/1553-7358We consider the following general family of algorithmic problems that arises in transcriptomics, metabolomics and other fields: given a weighted graph G and a subset of its nodes S, find subsets of S that show significant connectedness within G. A specific solution to this problem may be defined by devising a scoring function, the Maximum Clique problem being a classic example, where S includes all nodes in G and where the score is defined by the size of the largest subset of S fully connected within G. Major practical obstacles for the plethora of algorithms addressing this type of problem include computational efficiency and, particularly for more complex scores which take edge weights into account, the computational cost of permutation testing, a statistical procedure required to obtain a bound on the p-value for a connectedness score. To address these problems, we developed CTD, "Connect the Dots", a fast algorithm based on data compression that detects highly connected subsets within S. CTD provides information-theoretic upper bounds on p-values when S contains a small fraction of nodes in G without requiring computationally costly permutation testing. We apply the CTD algorithm to interpret multi-metabolite perturbations due to inborn errors of metabolism and multi-transcript perturbations associated with breast cancer in the context of disease-specific Gaussian Markov Random Field networks learned directly from respective molecular profiling data.Lillian R ThistlethwaiteVarduhi PetrosyanXiqi LiMarcus J MillerSarah H ElseaAleksandar MilosavljevicPublic Library of Science (PLoS)articleBiology (General)QH301-705.5ENPLoS Computational Biology, Vol 17, Iss 1, p e1008550 (2021)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
spellingShingle Biology (General)
QH301-705.5
Lillian R Thistlethwaite
Varduhi Petrosyan
Xiqi Li
Marcus J Miller
Sarah H Elsea
Aleksandar Milosavljevic
CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models.
description We consider the following general family of algorithmic problems that arises in transcriptomics, metabolomics and other fields: given a weighted graph G and a subset of its nodes S, find subsets of S that show significant connectedness within G. A specific solution to this problem may be defined by devising a scoring function, the Maximum Clique problem being a classic example, where S includes all nodes in G and where the score is defined by the size of the largest subset of S fully connected within G. Major practical obstacles for the plethora of algorithms addressing this type of problem include computational efficiency and, particularly for more complex scores which take edge weights into account, the computational cost of permutation testing, a statistical procedure required to obtain a bound on the p-value for a connectedness score. To address these problems, we developed CTD, "Connect the Dots", a fast algorithm based on data compression that detects highly connected subsets within S. CTD provides information-theoretic upper bounds on p-values when S contains a small fraction of nodes in G without requiring computationally costly permutation testing. We apply the CTD algorithm to interpret multi-metabolite perturbations due to inborn errors of metabolism and multi-transcript perturbations associated with breast cancer in the context of disease-specific Gaussian Markov Random Field networks learned directly from respective molecular profiling data.
format article
author Lillian R Thistlethwaite
Varduhi Petrosyan
Xiqi Li
Marcus J Miller
Sarah H Elsea
Aleksandar Milosavljevic
author_facet Lillian R Thistlethwaite
Varduhi Petrosyan
Xiqi Li
Marcus J Miller
Sarah H Elsea
Aleksandar Milosavljevic
author_sort Lillian R Thistlethwaite
title CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models.
title_short CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models.
title_full CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models.
title_fullStr CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models.
title_full_unstemmed CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models.
title_sort ctd: an information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models.
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/d8935c88ca1248f9bd4404aed5f9f659
work_keys_str_mv AT lillianrthistlethwaite ctdaninformationtheoreticalgorithmtointerpretsetsofmetabolomicandtranscriptomicperturbationsinthecontextofgraphicalmodels
AT varduhipetrosyan ctdaninformationtheoreticalgorithmtointerpretsetsofmetabolomicandtranscriptomicperturbationsinthecontextofgraphicalmodels
AT xiqili ctdaninformationtheoreticalgorithmtointerpretsetsofmetabolomicandtranscriptomicperturbationsinthecontextofgraphicalmodels
AT marcusjmiller ctdaninformationtheoreticalgorithmtointerpretsetsofmetabolomicandtranscriptomicperturbationsinthecontextofgraphicalmodels
AT sarahhelsea ctdaninformationtheoreticalgorithmtointerpretsetsofmetabolomicandtranscriptomicperturbationsinthecontextofgraphicalmodels
AT aleksandarmilosavljevic ctdaninformationtheoreticalgorithmtointerpretsetsofmetabolomicandtranscriptomicperturbationsinthecontextofgraphicalmodels
_version_ 1718375871541149696