Measuring novelty in science with word embedding.

Novelty is a core value in science, and a reliable measurement of novelty is crucial. This study proposes a new approach to measuring the novelty of scientific articles based on both citation data and text data. The proposed approach considers an article to be novel if it cites a combination of sema...

Bibliographic Details
Main Authors: Sotaro Shibayama, Deyun Yin, Kuniko Matsumoto
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2021
Subjects:
R
Q
Online Access: https://doaj.org/article/ea3c2b53cfd14a0190842d043d8f2ef2
id oai:doaj.org-article:ea3c2b53cfd14a0190842d043d8f2ef2
record_format dspace
spelling oai:doaj.org-article:ea3c2b53cfd14a0190842d043d8f2ef2
2021-12-02T20:09:40Z
Measuring novelty in science with word embedding.
1932-6203
10.1371/journal.pone.0254034
https://doaj.org/article/ea3c2b53cfd14a0190842d043d8f2ef2
2021-01-01T00:00:00Z
https://doi.org/10.1371/journal.pone.0254034
https://doaj.org/toc/1932-6203
Sotaro Shibayama
Deyun Yin
Kuniko Matsumoto
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 16, Iss 7, p e0254034 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Novelty is a core value in science, and a reliable measurement of novelty is crucial. This study proposes a new approach to measuring the novelty of scientific articles based on both citation data and text data. The proposed approach considers an article to be novel if it cites a combination of semantically distant references. To this end, we first assign a word embedding (a vector representation of each vocabulary item) to each cited reference on the basis of text information included in the reference. With these vectors, the distance between every pair of references is computed. Finally, the novelty of a focal document is evaluated by summarizing the distances between all references. The approach draws on limited text information (the titles of references) and a publicly shared library for word embeddings, which minimizes the requirements for data access and computational cost. We share the code, with which one can compute the novelty score of a document of interest using only the focal document's reference list. We validate the proposed measure through three exercises. First, we confirm that word embeddings can be used to quantify semantic distances between documents by comparing them with an established bibliometric distance measure. Second, we confirm the criterion-related validity of the proposed novelty measure with self-reported novelty scores collected from a questionnaire survey. Finally, as novelty is known to be correlated with future citation impact, we confirm that the proposed measure can predict future citations.
format article
author Sotaro Shibayama
Deyun Yin
Kuniko Matsumoto
title Measuring novelty in science with word embedding.
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/ea3c2b53cfd14a0190842d043d8f2ef2
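
The three steps named in the description (one vector per cited reference built from its title, a distance for every pair of references, and a summary of those distances) can be illustrated with a minimal Python sketch. This is not the authors' shared code. The toy word vectors, the averaging of word vectors into a title vector, the use of cosine distance, and the choice of the maximum as the summary are all assumptions made for illustration; the record states only that distances are "summarized". The names `TOY_VECTORS`, `title_vector`, `cosine_distance`, and `novelty_score` are hypothetical.

```python
import math
from itertools import combinations

# Toy 3-dimensional word vectors, invented for illustration only.
# A real application would load pretrained embeddings instead.
TOY_VECTORS = {
    "protein": (0.9, 0.1, 0.0),
    "folding": (0.8, 0.2, 0.1),
    "gene": (0.7, 0.3, 0.0),
    "expression": (0.6, 0.3, 0.1),
    "market": (0.0, 0.2, 0.9),
    "pricing": (0.1, 0.1, 0.8),
}


def title_vector(title, vectors=TOY_VECTORS):
    """Average the vectors of the known words in a title (assumed aggregation).

    Returns None when no word of the title is in the vocabulary.
    """
    known = [vectors[w] for w in title.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return tuple(sum(v[i] for v in known) / len(known) for i in range(dim))


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def novelty_score(reference_titles, summarize=max):
    """Summarize pairwise distances between the titles of cited references.

    `summarize` is any function mapping a list of distances to one number;
    the default (max) is an assumption, not the published definition.
    Returns None when fewer than two references can be embedded.
    """
    vecs = [title_vector(t) for t in reference_titles]
    vecs = [v for v in vecs if v is not None]
    distances = [cosine_distance(u, v) for u, v in combinations(vecs, 2)]
    if not distances:
        return None
    return summarize(distances)


if __name__ == "__main__":
    homogeneous = ["Protein folding", "Gene expression"]
    mixed = ["Protein folding", "Gene expression", "Market pricing"]
    print("homogeneous reference list:", novelty_score(homogeneous))
    print("mixed reference list:", novelty_score(mixed))
```

Under these assumptions, a reference list that mixes distant topics scores higher than one drawn from a single topic. Passing a different `summarize` function (for example a mean or a percentile) changes how strongly a single distant pair of references drives the score.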