Evaluation of JATSdecoder as an automated text extraction tool for statistical results in scientific reports

Abstract The extraction of statistical results in scientific reports is beneficial for checking studies on plausibility and reliability. The R package JATSdecoder supports the application of text mining approaches to scientific reports. Its function get.stats() extracts all reported statistical resu...


Bibliographic Details
Main author: Ingmar Böschen
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/7062ca948bec442187aae6d5c68aa527
id oai:doaj.org-article:7062ca948bec442187aae6d5c68aa527
record_format dspace
spelling oai:doaj.org-article:7062ca948bec442187aae6d5c68aa527 | 2021-12-02T18:51:14Z | Evaluation of JATSdecoder as an automated text extraction tool for statistical results in scientific reports | 10.1038/s41598-021-98782-3 | 2045-2322 | https://doaj.org/article/7062ca948bec442187aae6d5c68aa527 | 2021-09-01T00:00:00Z | https://doi.org/10.1038/s41598-021-98782-3 | https://doaj.org/toc/2045-2322 | Ingmar Böschen | Nature Portfolio | article | Medicine | R | Science | Q | EN | Scientific Reports, Vol 11, Iss 1, Pp 1-12 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract The extraction of statistical results in scientific reports is beneficial for checking studies on plausibility and reliability. The R package JATSdecoder supports the application of text mining approaches to scientific reports. Its function get.stats() extracts all reported statistical results from text and recomputes p values for most standard test results. The output can be reduced to results with checkable or computable p values only. In this article, get.stats()’s ability to extract, recompute and check statistical results is compared to that of statcheck, which is an already established tool. A manually coded data set, containing the number of statistically significant results in 49 articles, serves as an initial indicator for get.stats()’s and statcheck’s differing detection rates for statistical results. A further 13,531 PDF files from 10 major psychological journals, 18,744 XML documents from Frontiers in Psychology and 23,730 articles related to psychological research and published by PLoS One are scanned for statistical results with both algorithms. get.stats() almost replicates the manually extracted number of significant results in 49 PDF articles. get.stats() outperforms the statcheck functions in identifying statistical results in every included journal and input format. Furthermore, the raw results extracted by get.stats() increase statcheck’s detection rate. JATSdecoder’s function get.stats() is a highly general and reliable tool to extract statistical results from text. It copes with a wide range of textual representations of standard statistical results and recomputes p values for two- and one-sided tests. It facilitates manual and automated checks on consistency and completeness of the reported results within a manuscript.
format article
author Ingmar Böschen
title Evaluation of JATSdecoder as an automated text extraction tool for statistical results in scientific reports
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/7062ca948bec442187aae6d5c68aa527
work_keys_str_mv AT ingmarboschen evaluationofjatsdecoderasanautomatedtextextractiontoolforstatisticalresultsinscientificreports
_version_ 1718377452753911808
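
The abstract describes extracting reported statistical results from text and recomputing their p values for two- and one-sided tests. The idea can be illustrated with a minimal Python sketch. It is not JATSdecoder's or statcheck's implementation (both are R packages), and it covers only z statistics. The pattern Z_REPORT, the helper names and the rounding tolerance of 0.005 are assumptions made for this illustration.

```python
import math
import re

# Hypothetical pattern (an assumption, not taken from JATSdecoder): matches
# reports such as "z = 2.05, p = .04" or "Z=-1.96, p < .05".
# It does not handle Unicode minus signs or other test statistics.
Z_REPORT = re.compile(
    r"\bz\s*=\s*(?P<z>-?\d*\.?\d+)\s*,\s*p\s*(?P<op>[<>=])\s*(?P<p>0?\.\d+)",
    re.IGNORECASE,
)


def two_sided_p(z):
    """Two-sided p value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def check_z_reports(text, tolerance=0.005):
    """Extract z-test reports from text and compare reported with recomputed p.

    The tolerance of 0.005 assumes p values rounded to two decimals.
    Consistency is judged against the two-sided p value only; the
    one-sided value is returned for reference.
    """
    results = []
    for match in Z_REPORT.finditer(text):
        z = float(match.group("z"))
        reported = float(match.group("p"))
        op = match.group("op")
        p_two = two_sided_p(z)
        if op == "=":
            consistent = abs(p_two - reported) <= tolerance
        elif op == "<":
            consistent = p_two < reported
        else:  # op == ">"
            consistent = p_two > reported
        results.append({
            "z": z,
            "operator": op,
            "reported_p": reported,
            "p_two_sided": p_two,
            "p_one_sided": p_two / 2.0,
            "consistent": consistent,
        })
    return results


sample = "The effect was present, z = 2.05, p = .04, whereas z = 1.20, p < .05."
results = check_z_reports(sample)
for row in results:
    print(row)
```

In the sample sentence, the first report agrees with its recomputed two-sided p value within the rounding tolerance, while the second is flagged because a z of 1.20 does not yield p < .05.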