COVID-19 information retrieval with deep-learning based semantic search, question answering, and abstractive summarization

Abstract The COVID-19 global pandemic has resulted in international efforts to understand, track, and mitigate the disease, yielding a significant corpus of COVID-19 and SARS-CoV-2-related publications across scientific disciplines. Throughout 2020, over 400,000 coronavirus-related publications have...

Bibliographic Details
Main authors:	Andre Esteva, Anuprit Kale, Romain Paulus, Kazuma Hashimoto, Wenpeng Yin, Dragomir Radev, Richard Socher
Format:	article
Language:	EN
Published:	Nature Portfolio 2021
Subjects:	Computer applications to medicine. Medical informatics R858-859.7
Online access:	https://doaj.org/article/b821725462d6408ea729067c04e6893f

id	oai:doaj.org-article:b821725462d6408ea729067c04e6893f
record_format	dspace
spelling	oai:doaj.org-article:b821725462d6408ea729067c04e6893f; 2021-12-02T14:27:46Z; COVID-19 information retrieval with deep-learning based semantic search, question answering, and abstractive summarization; 10.1038/s41746-021-00437-0; 2398-6352; https://doaj.org/article/b821725462d6408ea729067c04e6893f; 2021-04-01T00:00:00Z; https://doi.org/10.1038/s41746-021-00437-0; https://doaj.org/toc/2398-6352; Andre Esteva; Anuprit Kale; Romain Paulus; Kazuma Hashimoto; Wenpeng Yin; Dragomir Radev; Richard Socher; Nature Portfolio; article; Computer applications to medicine. Medical informatics; R858-859.7; EN; npj Digital Medicine, Vol 4, Iss 1, Pp 1-9 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	Computer applications to medicine. Medical informatics R858-859.7
description	Abstract The COVID-19 global pandemic has resulted in international efforts to understand, track, and mitigate the disease, yielding a significant corpus of COVID-19 and SARS-CoV-2-related publications across scientific disciplines. Throughout 2020, over 400,000 coronavirus-related publications have been collected through the COVID-19 Open Research Dataset. Here, we present CO-Search, a semantic, multi-stage, search engine designed to handle complex queries over the COVID-19 literature, potentially aiding overburdened health workers in finding scientific answers and avoiding misinformation during a time of crisis. CO-Search is built from two sequential parts: a hybrid semantic-keyword retriever, which takes an input query and returns a sorted list of the 1000 most relevant documents, and a re-ranker, which further orders them by relevance. The retriever is composed of a deep learning model (Siamese-BERT) that encodes query-level meaning, along with two keyword-based models (BM25, TF-IDF) that emphasize the most important words of a query. The re-ranker assigns a relevance score to each document, computed from the outputs of (1) a question–answering module which gauges how much each document answers the query, and (2) an abstractive summarization module which determines how well a query matches a generated summary of the document. To account for the relatively limited dataset, we develop a text augmentation technique which splits the documents into pairs of paragraphs and the citations contained in them, creating millions of (citation title, paragraph) tuples for training the retriever. We evaluate our system ( http://einstein.ai/covid ) on the data of the TREC-COVID information retrieval challenge, obtaining strong performance across multiple key information retrieval metrics.
format	article
author	Andre Esteva Anuprit Kale Romain Paulus Kazuma Hashimoto Wenpeng Yin Dragomir Radev Richard Socher
author_sort	Andre Esteva
title	COVID-19 information retrieval with deep-learning based semantic search, question answering, and abstractive summarization
publisher	Nature Portfolio
publishDate	2021
url	https://doaj.org/article/b821725462d6408ea729067c04e6893f
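
The abstract describes a text augmentation step that pairs each paragraph with the titles of the works it cites, producing (citation title, paragraph) tuples for training the retriever. The following is a minimal sketch of that pairing only, not the authors' implementation. The input layout is an assumption made for illustration: the `text` and `cited_ids` fields, the `bibliography` map from reference id to title, and the function name `citation_paragraph_pairs` are all hypothetical and do not come from the record.

```python
from typing import Dict, Iterator, List, Tuple


def citation_paragraph_pairs(
    paragraphs: List[dict], bibliography: Dict[str, str]
) -> Iterator[Tuple[str, str]]:
    """Yield one (citation title, paragraph) tuple per distinct work cited in a paragraph.

    Assumed (hypothetical) layout: each paragraph is a dict with a "text"
    string and a "cited_ids" list; `bibliography` maps ids to cited titles.
    """
    for para in paragraphs:
        text = para["text"].strip()
        if not text:
            continue  # nothing to pair with
        seen = set()
        for ref_id in para.get("cited_ids", []):
            title = bibliography.get(ref_id, "").strip()
            # Skip unresolved references and repeat citations of the same work.
            if title and title not in seen:
                seen.add(title)
                yield (title, text)


# Toy data, invented for the example.
bibliography = {"b0": "Title A", "b1": "Title B"}
paragraphs = [
    {"text": "Para one.", "cited_ids": ["b0", "b1", "b0"]},
    {"text": "Para two.", "cited_ids": []},
    {"text": "Para three.", "cited_ids": ["b1", "b9"]},
]
pairs = list(citation_paragraph_pairs(paragraphs, bibliography))
print(pairs)
```

On the toy data, the first paragraph yields two tuples (the repeated id is dropped), the second yields none, and the third yields one because the id `b9` is not in the bibliography. Each tuple could then serve as a (query-like text, relevant passage) training pair, under the same assumptions.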

Similar Items