Diffusion enables integration of heterogeneous data and user-driven learning in a desktop knowledge-base.

Integrating reference datasets (e.g. from high-throughput experiments) with unstructured and manually-assembled information (e.g. notes or comments from individual researchers) has the potential to tailor bioinformatic analyses to specific needs and to lead to new insights. However, developing bespo...

Bibliographic Details
Main Authors:	Tomasz Konopka, Sandra Ng, Damian Smedley
Format:	article
Language:	EN
Published:	Public Library of Science (PLoS) 2021
Subjects:	Biology (General) QH301-705.5
Online Access:	https://doaj.org/article/bb4fd93ff8bb47818fb09c818b31bc10

id	oai:doaj.org-article:bb4fd93ff8bb47818fb09c818b31bc10
record_format	dspace
spelling	oai:doaj.org-article:bb4fd93ff8bb47818fb09c818b31bc10 2021-12-02T19:58:06Z Diffusion enables integration of heterogeneous data and user-driven learning in a desktop knowledge-base. 1553-734X 1553-7358 10.1371/journal.pcbi.1009283 https://doaj.org/article/bb4fd93ff8bb47818fb09c818b31bc10 2021-08-01T00:00:00Z https://doi.org/10.1371/journal.pcbi.1009283 https://doaj.org/toc/1553-734X https://doaj.org/toc/1553-7358 Integrating reference datasets (e.g. from high-throughput experiments) with unstructured and manually-assembled information (e.g. notes or comments from individual researchers) has the potential to tailor bioinformatic analyses to specific needs and to lead to new insights. However, developing bespoke analysis pipelines from scratch is time-consuming, and general tools for exploring such heterogeneous data are not available. We argue that by treating all data as text, a knowledge-base can accommodate a range of bioinformatic data types and applications. We show that a database coupled to nearest-neighbor algorithms can address common tasks such as gene-set analysis as well as specific tasks such as ontology translation. We further show that a mathematical transformation motivated by diffusion can be effective for exploration across heterogeneous datasets. Diffusion enables the knowledge-base to begin with a sparse query, impute more features, and find matches that would otherwise remain hidden. This can be used, for example, to map multi-modal queries consisting of gene symbols and phenotypes to descriptions of diseases. Diffusion also enables user-driven learning: when the knowledge-base cannot provide satisfactory search results in the first instance, users can improve the results in real-time by adding domain-specific knowledge. User-driven learning has implications for data management, integration, and curation. Tomasz Konopka Sandra Ng Damian Smedley Public Library of Science (PLoS) article Biology (General) QH301-705.5 EN PLoS Computational Biology, Vol 17, Iss 8, p e1009283 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	Biology (General) QH301-705.5
spellingShingle	Biology (General) QH301-705.5 Tomasz Konopka Sandra Ng Damian Smedley Diffusion enables integration of heterogeneous data and user-driven learning in a desktop knowledge-base.
description	Integrating reference datasets (e.g. from high-throughput experiments) with unstructured and manually-assembled information (e.g. notes or comments from individual researchers) has the potential to tailor bioinformatic analyses to specific needs and to lead to new insights. However, developing bespoke analysis pipelines from scratch is time-consuming, and general tools for exploring such heterogeneous data are not available. We argue that by treating all data as text, a knowledge-base can accommodate a range of bioinformatic data types and applications. We show that a database coupled to nearest-neighbor algorithms can address common tasks such as gene-set analysis as well as specific tasks such as ontology translation. We further show that a mathematical transformation motivated by diffusion can be effective for exploration across heterogeneous datasets. Diffusion enables the knowledge-base to begin with a sparse query, impute more features, and find matches that would otherwise remain hidden. This can be used, for example, to map multi-modal queries consisting of gene symbols and phenotypes to descriptions of diseases. Diffusion also enables user-driven learning: when the knowledge-base cannot provide satisfactory search results in the first instance, users can improve the results in real-time by adding domain-specific knowledge. User-driven learning has implications for data management, integration, and curation.
format	article
author	Tomasz Konopka Sandra Ng Damian Smedley
author_facet	Tomasz Konopka Sandra Ng Damian Smedley
author_sort	Tomasz Konopka
title	Diffusion enables integration of heterogeneous data and user-driven learning in a desktop knowledge-base.
title_short	Diffusion enables integration of heterogeneous data and user-driven learning in a desktop knowledge-base.
title_full	Diffusion enables integration of heterogeneous data and user-driven learning in a desktop knowledge-base.
title_fullStr	Diffusion enables integration of heterogeneous data and user-driven learning in a desktop knowledge-base.
title_full_unstemmed	Diffusion enables integration of heterogeneous data and user-driven learning in a desktop knowledge-base.
title_sort	diffusion enables integration of heterogeneous data and user-driven learning in a desktop knowledge-base.
publisher	Public Library of Science (PLoS)
publishDate	2021
url	https://doaj.org/article/bb4fd93ff8bb47818fb09c818b31bc10
work_keys_str_mv	AT tomaszkonopka diffusionenablesintegrationofheterogeneousdataanduserdrivenlearninginadesktopknowledgebase AT sandrang diffusionenablesintegrationofheterogeneousdataanduserdrivenlearninginadesktopknowledgebase AT damiansmedley diffusionenablesintegrationofheterogeneousdataanduserdrivenlearninginadesktopknowledgebase
_version_	1718375799474618368

Similar Items