Linguistic measures of chemical diversity and the “keywords” of molecular collections

Abstract Computerized linguistic analyses have proven of immense value in comparing and searching through large text collections (“corpora”), including those deposited on the Internet – indeed, it would nowadays be hard to imagine browsing the Web without, for instance, search algorithms extracting...


Bibliographic Details
Main Authors:	Michał Woźniak, Agnieszka Wołos, Urszula Modrzyk, Rafał L. Górski, Jan Winkowski, Michał Bajczyk, Sara Szymkuć, Bartosz A. Grzybowski, Maciej Eder
Format:	article
Language:	EN
Published:	Nature Portfolio 2018
Subjects:	Medicine R Science Q
Online Access:	https://doaj.org/article/cb3bb3ade27f4b13a2b20861629d946e

id	oai:doaj.org-article:cb3bb3ade27f4b13a2b20861629d946e
record_format	dspace
spelling	oai:doaj.org-article:cb3bb3ade27f4b13a2b20861629d946e; 2021-12-02T11:40:17Z; Linguistic measures of chemical diversity and the “keywords” of molecular collections; 10.1038/s41598-018-25440-6; 2045-2322; https://doaj.org/article/cb3bb3ade27f4b13a2b20861629d946e; 2018-05-01T00:00:00Z; https://doi.org/10.1038/s41598-018-25440-6; https://doaj.org/toc/2045-2322; Abstract Computerized linguistic analyses have proven of immense value in comparing and searching through large text collections (“corpora”), including those deposited on the Internet – indeed, it would nowadays be hard to imagine browsing the Web without, for instance, search algorithms extracting most appropriate keywords from documents. This paper describes how such corpus-linguistic concepts can be extended to chemistry based on characteristic “chemical words” that span more than traditional functional groups and, instead, look at common structural fragments molecules share. Using these words, it is possible to quantify the diversity of chemical collections/databases in new ways and to define molecular “keywords” by which such collections are best characterized and annotated.; Michał Woźniak; Agnieszka Wołos; Urszula Modrzyk; Rafał L. Górski; Jan Winkowski; Michał Bajczyk; Sara Szymkuć; Bartosz A. Grzybowski; Maciej Eder; Nature Portfolio; article; Medicine; R; Science; Q; EN; Scientific Reports, Vol 8, Iss 1, Pp 1-10 (2018)
institution	DOAJ
collection	DOAJ
language	EN
topic	Medicine R Science Q
description	Abstract Computerized linguistic analyses have proven of immense value in comparing and searching through large text collections (“corpora”), including those deposited on the Internet – indeed, it would nowadays be hard to imagine browsing the Web without, for instance, search algorithms extracting most appropriate keywords from documents. This paper describes how such corpus-linguistic concepts can be extended to chemistry based on characteristic “chemical words” that span more than traditional functional groups and, instead, look at common structural fragments molecules share. Using these words, it is possible to quantify the diversity of chemical collections/databases in new ways and to define molecular “keywords” by which such collections are best characterized and annotated.
format	article
author	Michał Woźniak Agnieszka Wołos Urszula Modrzyk Rafał L. Górski Jan Winkowski Michał Bajczyk Sara Szymkuć Bartosz A. Grzybowski Maciej Eder
author_facet	Michał Woźniak Agnieszka Wołos Urszula Modrzyk Rafał L. Górski Jan Winkowski Michał Bajczyk Sara Szymkuć Bartosz A. Grzybowski Maciej Eder
author_sort	Michał Woźniak
title	Linguistic measures of chemical diversity and the “keywords” of molecular collections
publisher	Nature Portfolio
publishDate	2018
url	https://doaj.org/article/cb3bb3ade27f4b13a2b20861629d946e


Similar Items