Wikipedia information flow analysis reveals the scale-free architecture of the semantic space.

In this paper we extract the topology of the semantic space in its encyclopedic acception, measuring the semantic flow between the different entries of the largest modern encyclopedia, Wikipedia, and thus creating a directed complex network of semantic flows. Notably at the percolation threshold the...

Bibliographic Details
Main authors: Adolfo Paolo Masucci, Alkiviadis Kalampokis, Victor Martínez Eguíluz, Emilio Hernández-García
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2011
Subjects:
R
Q
Online access: https://doaj.org/article/984340c671c34581beac705f09f2d2d6
id oai:doaj.org-article:984340c671c34581beac705f09f2d2d6
record_format dspace
spelling oai:doaj.org-article:984340c671c34581beac705f09f2d2d6 2021-11-18T06:58:00Z Wikipedia information flow analysis reveals the scale-free architecture of the semantic space. 1932-6203 10.1371/journal.pone.0017333 https://doaj.org/article/984340c671c34581beac705f09f2d2d6 2011-02-01T00:00:00Z https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/21407801/?tool=EBI https://doaj.org/toc/1932-6203 In this paper we extract the topology of the semantic space in its encyclopedic acception, measuring the semantic flow between the different entries of the largest modern encyclopedia, Wikipedia, and thus creating a directed complex network of semantic flows. Notably at the percolation threshold the semantic space is characterised by scale-free behaviour at different levels of complexity and this relates the semantic space to a wide range of biological, social and linguistics phenomena. In particular we find that the cluster size distribution, representing the size of different semantic areas, is scale-free. Moreover the topology of the resulting semantic space is scale-free in the connectivity distribution and displays small-world properties. However its statistical properties do not allow a classical interpretation via a generative model based on a simple multiplicative process. After giving a detailed description and interpretation of the topological properties of the semantic space, we introduce a stochastic model of content-based network, based on a copy and mutation algorithm and on the Heaps' law, that is able to capture the main statistical properties of the analysed semantic space, including the Zipf's law for the word frequency distribution. Adolfo Paolo Masucci Alkiviadis Kalampokis Victor Martínez Eguíluz Emilio Hernández-García Public Library of Science (PLoS) article Medicine R Science Q EN PLoS ONE, Vol 6, Iss 2, p e17333 (2011)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
spellingShingle Medicine
R
Science
Q
Adolfo Paolo Masucci
Alkiviadis Kalampokis
Victor Martínez Eguíluz
Emilio Hernández-García
Wikipedia information flow analysis reveals the scale-free architecture of the semantic space.
description In this paper we extract the topology of the semantic space in its encyclopedic acception, measuring the semantic flow between the different entries of the largest modern encyclopedia, Wikipedia, and thus creating a directed complex network of semantic flows. Notably at the percolation threshold the semantic space is characterised by scale-free behaviour at different levels of complexity and this relates the semantic space to a wide range of biological, social and linguistics phenomena. In particular we find that the cluster size distribution, representing the size of different semantic areas, is scale-free. Moreover the topology of the resulting semantic space is scale-free in the connectivity distribution and displays small-world properties. However its statistical properties do not allow a classical interpretation via a generative model based on a simple multiplicative process. After giving a detailed description and interpretation of the topological properties of the semantic space, we introduce a stochastic model of content-based network, based on a copy and mutation algorithm and on the Heaps' law, that is able to capture the main statistical properties of the analysed semantic space, including the Zipf's law for the word frequency distribution.
format article
author Adolfo Paolo Masucci
Alkiviadis Kalampokis
Victor Martínez Eguíluz
Emilio Hernández-García
author_facet Adolfo Paolo Masucci
Alkiviadis Kalampokis
Victor Martínez Eguíluz
Emilio Hernández-García
author_sort Adolfo Paolo Masucci
title Wikipedia information flow analysis reveals the scale-free architecture of the semantic space.
title_short Wikipedia information flow analysis reveals the scale-free architecture of the semantic space.
title_full Wikipedia information flow analysis reveals the scale-free architecture of the semantic space.
title_fullStr Wikipedia information flow analysis reveals the scale-free architecture of the semantic space.
title_full_unstemmed Wikipedia information flow analysis reveals the scale-free architecture of the semantic space.
title_sort wikipedia information flow analysis reveals the scale-free architecture of the semantic space.
publisher Public Library of Science (PLoS)
publishDate 2011
url https://doaj.org/article/984340c671c34581beac705f09f2d2d6
work_keys_str_mv AT adolfopaolomasucci wikipediainformationflowanalysisrevealsthescalefreearchitectureofthesemanticspace
AT alkiviadiskalampokis wikipediainformationflowanalysisrevealsthescalefreearchitectureofthesemanticspace
AT victormartinezeguiluz wikipediainformationflowanalysisrevealsthescalefreearchitectureofthesemanticspace
AT emiliohernandezgarcia wikipediainformationflowanalysisrevealsthescalefreearchitectureofthesemanticspace
_version_ 1718424120389009408