Entities as Topic Labels: Combining Entity Linking and Labeled LDA to Improve Topic Interpretability and Evaluability

Bibliographic Details
Main authors:	Anne Lauscher, Pablo Ruiz Fabo, Federico Nanni, Simone Paolo Ponzetto
Format:	article
Language:	EN
Published:	Accademia University Press, 2016
Subjects:	Social Sciences (H); Computational linguistics. Natural language processing (P98-98.5)
Online access:	https://doaj.org/article/462f09ce7b074635867fb54bdb8646c0

id	oai:doaj.org-article:462f09ce7b074635867fb54bdb8646c0
record_format	dspace
Source:	IJCoL (Italian Journal of Computational Linguistics), Vol 2, Iss 2, Pp 67-87 (2016)
ISSN:	2499-4553
DOI:	10.4000/ijcol.392
Date:	2016-12-01
Full text:	http://journals.openedition.org/ijcol/392
Journal contents:	https://doaj.org/toc/2499-4553
institution	DOAJ
collection	DOAJ
description	Digital humanities scholars strongly need a corpus exploration method that provides topics easier to interpret than standard LDA topic models. To move towards this goal, here we propose a combination of two techniques, called Entity Linking and Labeled LDA. Our method identifies in an ontology a series of descriptive labels for each document in a corpus. Then it generates a specific topic for each label. Having a direct relation between topics and labels makes interpretation easier; using an ontology as background knowledge limits label ambiguity. As our topics are described with a limited number of clear-cut labels, they promote interpretability and support the quantitative evaluation of the obtained results. We illustrate the potential of the approach by applying it to three datasets, namely the transcription of speeches from the European Parliament fifth mandate, the Enron Corpus and the Hillary Clinton Email Dataset. While some of these resources have already been adopted by the natural language processing community, they still hold a large potential for humanities scholars, part of which could be exploited in studies that will adopt the fine-grained exploration method presented in this paper.
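
The abstract describes a two-step pipeline: entity linking assigns each document a set of ontology-based labels, and Labeled LDA then learns exactly one topic per label. The sketch below illustrates only the second step with a small collapsed Gibbs sampler, assuming the labels are already available. The entity-linking step is not shown; the function names, the toy documents, the label names, and the symmetric priors `alpha` and `beta` are hypothetical choices for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict


def labeled_lda(docs, doc_labels, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampler for Labeled LDA: one topic per label.

    docs: list of token lists; doc_labels: list of label sets, one per doc.
    A token may only be assigned to a topic whose label its document carries.
    Returns n_kw, a mapping label -> word -> count.
    """
    if any(not labels for labels in doc_labels):
        raise ValueError("every document needs at least one label")
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})     # vocabulary size
    n_kw = defaultdict(lambda: defaultdict(int))  # label -> word -> count
    n_k = defaultdict(int)                        # label -> tokens assigned
    n_dk = [defaultdict(int) for _ in docs]       # per doc: label -> count
    z = []                                        # current label of each token

    def update(d, k, w, delta):
        n_kw[k][w] += delta
        n_k[k] += delta
        n_dk[d][k] += delta

    # Initialise each token with a random label drawn from its own document's set.
    for d, doc in enumerate(docs):
        labels = sorted(doc_labels[d])
        z.append([])
        for w in doc:
            k = rng.choice(labels)
            z[d].append(k)
            update(d, k, w, +1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            labels = sorted(doc_labels[d])
            for i, w in enumerate(doc):
                update(d, z[d][i], w, -1)  # take the token out of the counts
                # p(z = lab) is proportional to
                # (n_d,lab + alpha) * (n_lab,w + beta) / (n_lab + V * beta),
                # evaluated only over the labels of document d.
                weights = [
                    (n_dk[d][lab] + alpha) * (n_kw[lab][w] + beta) / (n_k[lab] + V * beta)
                    for lab in labels
                ]
                k = rng.choices(labels, weights=weights)[0]
                z[d][i] = k
                update(d, k, w, +1)
    return n_kw


def top_words(n_kw, label, n=3):
    """Most frequent words assigned to a label's topic (ties broken alphabetically)."""
    ranked = sorted(n_kw[label].items(), key=lambda item: (-item[1], item[0]))
    return [w for w, c in ranked[:n] if c > 0]


# Hypothetical toy corpus; the labels stand in for linked entities.
docs = [["fish", "quota", "fish"], ["euro", "bank", "euro"], ["fish", "euro"]]
doc_labels = [{"Fishery"}, {"Euro"}, {"Fishery", "Euro"}]
counts = labeled_lda(docs, doc_labels)
print(top_words(counts, "Fishery"), top_words(counts, "Euro"))
```

Because every topic is tied to a single label, each topic can be read off directly by its label, which is the interpretability property the abstract emphasizes. A document with only one label contributes all of its tokens to that label's topic.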

Similar items