Gene name errors: Lessons not learned.

Erroneous conversion of gene names into dates and other data types has been a frustration for computational biologists for years. We hypothesized that such errors in supplementary files might diminish after a report in 2016 highlighting the extent of the problem. To assess this, we performed a...

Bibliographic Details
Main authors: Mandhri Abeysooriya, Megan Soria, Mary Sravya Kasu, Mark Ziemann
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2021
Subjects: Biology (General)
Online access: https://doaj.org/article/8b429b7b5d454a8c9aeef110207e3418
id oai:doaj.org-article:8b429b7b5d454a8c9aeef110207e3418
record_format dspace
spelling oai:doaj.org-article:8b429b7b5d454a8c9aeef110207e3418
2021-12-02T19:57:30Z
Gene name errors: Lessons not learned.
1553-734X
1553-7358
10.1371/journal.pcbi.1008984
https://doaj.org/article/8b429b7b5d454a8c9aeef110207e3418
2021-07-01T00:00:00Z
https://doi.org/10.1371/journal.pcbi.1008984
https://doaj.org/toc/1553-734X
https://doaj.org/toc/1553-7358
Erroneous conversion of gene names into dates and other data types has been a frustration for computational biologists for years. We hypothesized that such errors in supplementary files might diminish after a report in 2016 highlighting the extent of the problem. To assess this, we performed a scan of supplementary files published in PubMed Central from 2014 to 2020. Overall, gene name errors continued to accumulate unabated in the period after 2016. Improved scanning software that we developed identified gene name errors in 30.9% (3,436/11,117) of articles with supplementary Excel gene lists, a figure significantly higher than previously estimated. This is due to gene names being converted not just to dates and floating-point numbers, but also to an internal date format (five-digit numbers). These findings further reinforce that spreadsheets are ill-suited for use with large genomic data.
Mandhri Abeysooriya
Megan Soria
Mary Sravya Kasu
Mark Ziemann
Public Library of Science (PLoS)
article
Biology (General)
QH301-705.5
EN
PLoS Computational Biology, Vol 17, Iss 7, p e1008984 (2021)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
spellingShingle Biology (General)
QH301-705.5
Mandhri Abeysooriya
Megan Soria
Mary Sravya Kasu
Mark Ziemann
Gene name errors: Lessons not learned.
description Erroneous conversion of gene names into dates and other data types has been a frustration for computational biologists for years. We hypothesized that such errors in supplementary files might diminish after a report in 2016 highlighting the extent of the problem. To assess this, we performed a scan of supplementary files published in PubMed Central from 2014 to 2020. Overall, gene name errors continued to accumulate unabated in the period after 2016. Improved scanning software that we developed identified gene name errors in 30.9% (3,436/11,117) of articles with supplementary Excel gene lists, a figure significantly higher than previously estimated. This is due to gene names being converted not just to dates and floating-point numbers, but also to an internal date format (five-digit numbers). These findings further reinforce that spreadsheets are ill-suited for use with large genomic data.
format article
author Mandhri Abeysooriya
Megan Soria
Mary Sravya Kasu
Mark Ziemann
author_facet Mandhri Abeysooriya
Megan Soria
Mary Sravya Kasu
Mark Ziemann
author_sort Mandhri Abeysooriya
title Gene name errors: Lessons not learned.
title_short Gene name errors: Lessons not learned.
title_full Gene name errors: Lessons not learned.
title_fullStr Gene name errors: Lessons not learned.
title_full_unstemmed Gene name errors: Lessons not learned.
title_sort gene name errors: lessons not learned.
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/8b429b7b5d454a8c9aeef110207e3418
work_keys_str_mv AT mandhriabeysooriya genenameerrorslessonsnotlearned
AT megansoria genenameerrorslessonsnotlearned
AT marysravyakasu genenameerrorslessonsnotlearned
AT markziemann genenameerrorslessonsnotlearned
_version_ 1718375850911465472
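
The description above names three kinds of conversion: gene names turned into dates, into floating-point numbers, and into five-digit numbers in an internal date format. The following is a minimal sketch of how cells in a gene-name column might be screened for those three patterns. It is not the authors' scanning software; the regular expressions, the function names classify_cell and suspicious_cells, and the sample values are all assumptions made for illustration.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical patterns, chosen for illustration only.
MONTHS = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"

# Date-like strings such as "1-Sep" or "Mar-2".
DATE_RE = re.compile(
    rf"^(?:\d{{1,2}}[-/](?:{MONTHS})|(?:{MONTHS})[-/]\d{{1,2}})$",
    re.IGNORECASE,
)
# Scientific notation such as "2.31E+13".
FLOAT_RE = re.compile(r"^\d+(?:\.\d+)?E[+-]?\d+$", re.IGNORECASE)
# Bare five-digit numbers, as in an internal date format.
SERIAL_RE = re.compile(r"^[1-9]\d{4}$")


def classify_cell(value: str) -> Optional[str]:
    """Return 'date', 'float' or 'serial' for a suspicious cell, else None."""
    text = value.strip()
    if DATE_RE.match(text):
        return "date"
    if FLOAT_RE.match(text):
        return "float"
    if SERIAL_RE.match(text):
        return "serial"
    return None


def suspicious_cells(column: List[str]) -> List[Tuple[int, str, str]]:
    """List (row index, cell text, kind) for every suspicious cell in a column."""
    hits = []
    for index, cell in enumerate(column):
        kind = classify_cell(cell)
        if kind is not None:
            hits.append((index, cell, kind))
    return hits


if __name__ == "__main__":
    sample = ["TP53", "1-Sep", "2.31E+13", "44256", "BRCA1"]
    for index, cell, kind in suspicious_cells(sample):
        print(index, cell, kind)
```

The five-digit check is the most likely to produce false positives, because a column that legitimately holds numeric identifiers would also match, so a sketch like this is only meaningful when applied to columns already known to contain gene symbols.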