BiTTM: A Core Biterms-Based Topic Model for Targeted Analysis

While most of the existing topic models perform a full analysis on a set of documents to discover all topics, it is noticed recently that in many situations users are interested in fine-grained topics related to some specific aspects only. As a result, targeted analysis...

Bibliographic Details
Main Authors: Jiamiao Wang, Ling Chen, Lei Li, Xindong Wu
Format: article
Language: EN
Published: MDPI AG 2021
Subjects:
AI
T
Online Access: https://doaj.org/article/ae353802e15141c89424cc248405e4ae
id oai:doaj.org-article:ae353802e15141c89424cc248405e4ae
record_format dspace
spelling oai:doaj.org-article:ae353802e15141c89424cc248405e4ae
2021-11-11T15:13:29Z
10.3390/app112110162
2076-3417
2021-10-01T00:00:00Z
https://www.mdpi.com/2076-3417/11/21/10162
https://doaj.org/toc/2076-3417
Applied Sciences, Vol 11, Iss 10162, p 10162 (2021)
institution DOAJ
collection DOAJ
language EN
topic AI
text analysis
topic model
biterm
content analysis
targeted modeling
Technology
T
Engineering (General). Civil engineering (General)
TA1-2040
Biology (General)
QH301-705.5
Physics
QC1-999
Chemistry
QD1-999
description While most of the existing topic models perform a full analysis on a set of documents to discover all topics, it is noticed recently that in many situations users are interested in fine-grained topics related to some specific aspects only. As a result, targeted analysis (or focused analysis) has been proposed to address this problem. Given a corpus of documents from a broad area, targeted analysis discovers only topics related with user-interested aspects that are expressed by a set of user-provided query keywords. Existing approaches for targeted analysis suffer from problems such as topic loss and topic suppression because of their inherent assumptions and strategies. Moreover, existing approaches are not designed to address computation efficiency, while targeted analysis is supposed to provide responses to user queries as soon as possible. In this paper, we propose a core BiTerms-based Topic Model (BiTTM). By modelling topics from core biterms that are potentially relevant to the target query, on one hand, BiTTM captures the context information across documents to alleviate the problem of topic loss or suppression; on the other hand, our proposed model enables the efficient modelling of topics related to specific aspects. Our experiments on nine real-world datasets demonstrate BiTTM outperforms existing approaches in terms of both effectiveness and efficiency.
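The description above refers to biterms without defining them. As a rough illustration only: in biterm-based topic models a biterm is usually an unordered pair of words that co-occur in the same short text. The sketch below extracts such pairs and keeps those that touch a query keyword. The function names `extract_biterms` and `core_biterms`, the use of distinct words per document, and the "at least one word is a query keyword" rule are all assumptions made for illustration; the record does not state how BiTTM actually selects its core biterms.

```python
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered pair of distinct words in one tokenized document.

    Simplification (assumption): repeated words are collapsed, and the whole
    document is treated as a single co-occurrence context.
    """
    unique = sorted(set(tokens))
    return list(combinations(unique, 2))


def core_biterms(docs, query_keywords):
    """Keep biterms in which at least one word is a query keyword.

    This relevance rule is a hypothetical stand-in, not the paper's definition.
    """
    query = set(query_keywords)
    kept = []
    for tokens in docs:
        for w1, w2 in extract_biterms(tokens):
            if w1 in query or w2 in query:
                kept.append((w1, w2))
    return kept


if __name__ == "__main__":
    docs = [["battery", "life", "short"], ["screen", "bright"]]
    print(core_biterms(docs, ["battery"]))
```

Filtering before any topic inference is what would make a targeted model cheaper than a full analysis in a sketch like this: only the pairs that mention the query are passed on, so the second document contributes nothing for the query `battery`.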
format article
author Jiamiao Wang
Ling Chen
Lei Li
Xindong Wu
title BiTTM: A Core Biterms-Based Topic Model for Targeted Analysis
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/ae353802e15141c89424cc248405e4ae
_version_ 1718436404962263040