GNER: A Generative Model for Geological Named Entity Recognition Without Labeled Data Using Deep Learning

Abstract A variety of detailed data about geological topics and geoscience knowledge are buried in the geoscience literature and rarely used. Named entity recognition (NER) provides both opportunities and challenges to leverage this wealth of data in the geoscience literature for data analysis and f...

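Bibliographic Details
Main authors: Qinjun Qiu, Zhong Xie, Liang Wu, Liufeng Tao
Format: article
Language: EN
Published: American Geophysical Union (AGU) 2019
Subjects: natural language processing; named entity recognition; geoscience domain; unsupervised learning; Astronomy; QB1-991; Geology; QE1-996.5
Online access: https://doaj.org/article/8b19d8b977604258a40f352ff20ac254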
id oai:doaj.org-article:8b19d8b977604258a40f352ff20ac254
record_format dspace
spelling oai:doaj.org-article:8b19d8b977604258a40f352ff20ac254
2021-11-30T22:55:32Z
GNER: A Generative Model for Geological Named Entity Recognition Without Labeled Data Using Deep Learning
2333-5084
10.1029/2019EA000610
https://doaj.org/article/8b19d8b977604258a40f352ff20ac254
2019-06-01T00:00:00Z
https://doi.org/10.1029/2019EA000610
https://doaj.org/toc/2333-5084
Abstract A variety of detailed data about geological topics and geoscience knowledge are buried in the geoscience literature and rarely used. Named entity recognition (NER) provides both opportunities and challenges to leverage this wealth of data in the geoscience literature for data analysis and further information extraction. Existing NER models and techniques are mainly based on rule‐based and supervised approaches, and developing such systems requires a costly manual effort. In this paper, we first design a generic stepwise framework for domain‐specific NER. Following this framework, domain‐specific entities and domain‐general words are collected and selected as seed terms. Normalization and grouping processes are then applied to these seed terms for further analysis. A random extraction algorithm based on a unigram language model is used to generate a large‐scale training data set consisting of probabilistically labeled pseudosentences. Each generated sentence is then used as input to the self‐training and learning algorithm. Experimental results on two constructed data sets demonstrate that the proposed model effectively recognizes and identifies geological named entities.
Qinjun Qiu
Zhong Xie
Liang Wu
Liufeng Tao
American Geophysical Union (AGU)
article
natural language processing
named entity recognition
geoscience domain
unsupervised learning
Astronomy
QB1-991
Geology
QE1-996.5
EN
Earth and Space Science, Vol 6, Iss 6, Pp 931-946 (2019)
institution DOAJ
collection DOAJ
language EN
topic natural language processing
named entity recognition
geoscience domain
unsupervised learning
Astronomy
QB1-991
Geology
QE1-996.5
description Abstract A variety of detailed data about geological topics and geoscience knowledge are buried in the geoscience literature and rarely used. Named entity recognition (NER) provides both opportunities and challenges to leverage this wealth of data in the geoscience literature for data analysis and further information extraction. Existing NER models and techniques are mainly based on rule‐based and supervised approaches, and developing such systems requires a costly manual effort. In this paper, we first design a generic stepwise framework for domain‐specific NER. Following this framework, domain‐specific entities and domain‐general words are collected and selected as seed terms. Normalization and grouping processes are then applied to these seed terms for further analysis. A random extraction algorithm based on a unigram language model is used to generate a large‐scale training data set consisting of probabilistically labeled pseudosentences. Each generated sentence is then used as input to the self‐training and learning algorithm. Experimental results on two constructed data sets demonstrate that the proposed model effectively recognizes and identifies geological named entities.
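The abstract's data-generation step (sampling labeled pseudosentences from seed terms with a unigram language model) can be illustrated with a minimal sketch. This is not the authors' implementation: the seed terms, the entity types `ROCK` and `MINERAL`, the `entity_rate` parameter, and all function names below are hypothetical, and the labels here are assigned deterministically at sampling time rather than probabilistically as in the paper.

```python
import random
from collections import Counter

# Hypothetical seed terms; the record does not list the authors' actual seeds.
ENTITY_SEEDS = {
    "ROCK": ["granite", "basalt", "granite", "shale"],
    "MINERAL": ["quartz", "feldspar", "quartz"],
}
GENERAL_SEEDS = ["the", "of", "contains", "near", "the", "of", "the"]


def unigram_model(terms):
    """Return (vocabulary, probabilities) from relative term frequencies."""
    counts = Counter(terms)
    total = sum(counts.values())
    vocab = sorted(counts)
    return vocab, [counts[t] / total for t in vocab]


def generate_pseudosentence(rng, length=6, entity_rate=0.4):
    """Sample tokens independently (unigram assumption) and label each one."""
    general_vocab, general_probs = unigram_model(GENERAL_SEEDS)
    entity_types = sorted(ENTITY_SEEDS)
    sentence = []
    for _ in range(length):
        if rng.random() < entity_rate:
            # Pick an entity type, then a term from that type's unigram model.
            label = rng.choice(entity_types)
            vocab, probs = unigram_model(ENTITY_SEEDS[label])
        else:
            # Domain-general word, labeled "O" (outside any entity).
            label = "O"
            vocab, probs = general_vocab, general_probs
        token = rng.choices(vocab, weights=probs, k=1)[0]
        sentence.append((token, label))
    return sentence


def generate_corpus(n_sentences, seed=0):
    """Build a list of pseudosentences; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    return [generate_pseudosentence(rng) for _ in range(n_sentences)]


if __name__ == "__main__":
    for sent in generate_corpus(3):
        print(" ".join(f"{tok}/{lab}" for tok, lab in sent))
```

Under these assumptions, each generated token carries its label for free, which is what lets a downstream tagger be trained without manually annotated text.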
format article
author Qinjun Qiu
Zhong Xie
Liang Wu
Liufeng Tao
title GNER: A Generative Model for Geological Named Entity Recognition Without Labeled Data Using Deep Learning
publisher American Geophysical Union (AGU)
publishDate 2019
url https://doaj.org/article/8b19d8b977604258a40f352ff20ac254