FONDUE: A Framework for Node Disambiguation and Deduplication Using Network Embeddings

Data often have a relational nature that is most easily expressed in a network form, with its main components consisting of nodes that represent real objects and links that signify the relations between these objects. Modeling networks is useful for many purposes, but the efficacy of downstream task...

Bibliographic Details
Main Authors: Ahmad Mel, Bo Kang, Jefrey Lijffijt, Tijl De Bie
Format: article
Language: EN
Published: MDPI AG 2021
Subjects:
T
Online Access: https://doaj.org/article/e1c90d2f4559483083ef1e1c4fca1140
id oai:doaj.org-article:e1c90d2f4559483083ef1e1c4fca1140
record_format dspace
spelling oai:doaj.org-article:e1c90d2f4559483083ef1e1c4fca1140
2021-11-11T14:59:47Z
10.3390/app11219884
2076-3417
2021-10-01T00:00:00Z
https://www.mdpi.com/2076-3417/11/21/9884
https://doaj.org/toc/2076-3417
Applied Sciences, Vol 11, Iss 9884, p 9884 (2021)
institution DOAJ
collection DOAJ
language EN
topic node disambiguation
node deduplication
node linking
entity linking
network embeddings
representation learning
Technology
T
Engineering (General). Civil engineering (General)
TA1-2040
Biology (General)
QH301-705.5
Physics
QC1-999
Chemistry
QD1-999
description Data often have a relational nature that is most easily expressed in network form, whose main components are nodes that represent real objects and links that signify the relations between these objects. Modeling networks is useful for many purposes, but the efficacy of downstream tasks is often hampered by data quality issues related to their construction. In many constructed networks, ambiguity may arise when a node corresponds to multiple concepts. Similarly, a single entity can be mistakenly represented by several different nodes. In this paper, we formalize both the node disambiguation (NDA) and node deduplication (NDD) tasks to resolve these data quality issues. We then introduce FONDUE, a framework for utilizing network embedding methods for data-driven disambiguation and deduplication of nodes. Given an undirected and unweighted network, FONDUE-NDA identifies nodes that appear to correspond to multiple entities for subsequent splitting and suggests how to split them (node disambiguation), whereas FONDUE-NDD identifies nodes that appear to correspond to the same entity for merging (node deduplication), using only the network topology. From controlled experiments on benchmark networks, we find that FONDUE-NDA is substantially and consistently more accurate, at lower computational cost, in identifying ambiguous nodes, and that FONDUE-NDD is a competitive alternative for node deduplication, when compared to state-of-the-art alternatives.
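The two tasks named in the description can be pictured as edits on an undirected, unweighted graph. The sketch below is illustrative only: it shows what splitting and merging a node mean, not how FONDUE decides which nodes to split or merge (that scoring relies on network embeddings and is not reproduced here). The function names, the adjacency-dictionary representation, and the example node labels are all assumptions made for this illustration, and self-loops are assumed absent.

```python
# Illustrative sketch, not the FONDUE method: the NDA and NDD *tasks* as graph
# edits on an undirected, unweighted network stored as {node: set(neighbors)}.
# All names below are hypothetical; self-loops are assumed not to occur.

def add_edge(graph, u, v):
    """Add an undirected edge between u and v, creating nodes as needed."""
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

def split_node(graph, node, parts):
    """Node disambiguation: replace `node` with one new node per entry in
    `parts`, which maps each new node name to the neighbors it inherits."""
    for nb in graph.pop(node):
        graph[nb].discard(node)
    for new_name, neighbors in parts.items():
        graph.setdefault(new_name, set())
        for nb in neighbors:
            add_edge(graph, new_name, nb)

def merge_nodes(graph, keep, drop):
    """Node deduplication: fold `drop` into `keep`, uniting their neighborhoods."""
    for nb in graph.pop(drop):
        graph[nb].discard(drop)
        if nb != keep:  # avoid creating a self-loop on `keep`
            add_edge(graph, keep, nb)

# One ambiguous author node linked to two unrelated papers.
g = {}
add_edge(g, "J. Smith", "paper1")
add_edge(g, "J. Smith", "paper2")

# Disambiguation: split the node into two entities, one per paper.
split_node(g, "J. Smith", {"J. Smith (a)": {"paper1"}, "J. Smith (b)": {"paper2"}})

# Deduplication: merging the two nodes again restores a single entity.
merged = {k: set(v) for k, v in g.items()}
merge_nodes(merged, "J. Smith (a)", "J. Smith (b)")
```

After the split, each paper is attached to its own author node; after the merge, the surviving node holds the union of both neighborhoods.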
format article
author Ahmad Mel
Bo Kang
Jefrey Lijffijt
Tijl De Bie
title FONDUE: A Framework for Node Disambiguation and Deduplication Using Network Embeddings
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/e1c90d2f4559483083ef1e1c4fca1140
_version_ 1718437936708452352