Gene name errors: Lessons not learned.
Erroneous conversion of gene names into dates and other data types has been a frustration for computational biologists for years. We hypothesized that such errors in supplementary files might diminish after a report in 2016 highlighting the extent of the problem. To assess this, we performed a...
Main authors: | Mandhri Abeysooriya, Megan Soria, Mary Sravya Kasu, Mark Ziemann |
---|---|
Format: | article |
Language: | EN |
Published: |
Public Library of Science (PLoS)
2021
|
Subjects: | Biology (General) QH301-705.5 |
Online access: | https://doaj.org/article/8b429b7b5d454a8c9aeef110207e3418 |
id |
oai:doaj.org-article:8b429b7b5d454a8c9aeef110207e3418 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:8b429b7b5d454a8c9aeef110207e3418; 2021-12-02T19:57:30Z; Gene name errors: Lessons not learned.; 1553-734X; 1553-7358; 10.1371/journal.pcbi.1008984; https://doaj.org/article/8b429b7b5d454a8c9aeef110207e3418; 2021-07-01T00:00:00Z; https://doi.org/10.1371/journal.pcbi.1008984; https://doaj.org/toc/1553-734X; https://doaj.org/toc/1553-7358; Erroneous conversion of gene names into dates and other data types has been a frustration for computational biologists for years. We hypothesized that such errors in supplementary files might diminish after a report in 2016 highlighting the extent of the problem. To assess this, we performed a scan of supplementary files published in PubMed Central from 2014 to 2020. Overall, gene name errors continued to accumulate unabated in the period after 2016. An improved scanning software we developed identified gene name errors in 30.9% (3,436/11,117) of articles with supplementary Excel gene lists, a figure significantly higher than previously estimated. This is due to gene names being converted not just to dates and floating-point numbers, but also to internal date format (five-digit numbers). These findings further reinforce that spreadsheets are ill-suited to use with large genomic data.; Mandhri Abeysooriya; Megan Soria; Mary Sravya Kasu; Mark Ziemann; Public Library of Science (PLoS); article; Biology (General); QH301-705.5; EN; PLoS Computational Biology, Vol 17, Iss 7, p e1008984 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Biology (General) QH301-705.5 |
spellingShingle |
Biology (General) QH301-705.5 Mandhri Abeysooriya Megan Soria Mary Sravya Kasu Mark Ziemann Gene name errors: Lessons not learned. |
description |
Erroneous conversion of gene names into dates and other data types has been a frustration for computational biologists for years. We hypothesized that such errors in supplementary files might diminish after a report in 2016 highlighting the extent of the problem. To assess this, we performed a scan of supplementary files published in PubMed Central from 2014 to 2020. Overall, gene name errors continued to accumulate unabated in the period after 2016. An improved scanning software we developed identified gene name errors in 30.9% (3,436/11,117) of articles with supplementary Excel gene lists, a figure significantly higher than previously estimated. This is due to gene names being converted not just to dates and floating-point numbers, but also to internal date format (five-digit numbers). These findings further reinforce that spreadsheets are ill-suited to use with large genomic data. |
format |
article |
author |
Mandhri Abeysooriya Megan Soria Mary Sravya Kasu Mark Ziemann |
author_facet |
Mandhri Abeysooriya Megan Soria Mary Sravya Kasu Mark Ziemann |
author_sort |
Mandhri Abeysooriya |
title |
Gene name errors: Lessons not learned. |
title_short |
Gene name errors: Lessons not learned. |
title_full |
Gene name errors: Lessons not learned. |
title_fullStr |
Gene name errors: Lessons not learned. |
title_full_unstemmed |
Gene name errors: Lessons not learned. |
title_sort |
gene name errors: lessons not learned. |
publisher |
Public Library of Science (PLoS) |
publishDate |
2021 |
url |
https://doaj.org/article/8b429b7b5d454a8c9aeef110207e3418 |
work_keys_str_mv |
AT mandhriabeysooriya genenameerrorslessonsnotlearned AT megansoria genenameerrorslessonsnotlearned AT marysravyakasu genenameerrorslessonsnotlearned AT markziemann genenameerrorslessonsnotlearned |
_version_ |
1718375850911465472 |
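
The description above names three forms that a converted gene name can take in a spreadsheet column: a date, a floating-point number, and a five-digit internal date number. The following is a minimal Python sketch of how such values might be flagged in a gene list. It is not the authors' scanning software; the patterns, the function names `classify` and `scan`, and the sample values are assumptions chosen for illustration only.

```python
import re
from typing import List, Optional, Tuple

# Illustrative patterns only (assumptions, not the published scanner's rules).
MONTHS = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"

# Date-like cells such as "2-Sep" or "Mar-1".
DATE_RE = re.compile(
    r"(\d{1,2}[-/.](" + MONTHS + r")|(" + MONTHS + r")[-/.]\d{1,2})",
    re.IGNORECASE,
)
# Scientific-notation cells such as "2.31E+13".
FLOAT_RE = re.compile(r"\d+(\.\d+)?E[+-]\d+", re.IGNORECASE)
# Five-digit numbers, the internal date format mentioned in the description.
SERIAL_RE = re.compile(r"[1-9]\d{4}")


def classify(value: str) -> Optional[str]:
    """Return the suspected conversion type of a cell, or None if it looks fine."""
    v = value.strip()
    if DATE_RE.fullmatch(v):
        return "date"
    if FLOAT_RE.fullmatch(v):
        return "float"
    if SERIAL_RE.fullmatch(v):
        return "serial"
    return None


def scan(values: List[str]) -> List[Tuple[str, str]]:
    """Return (cell, suspected type) pairs for every flagged cell in a gene column."""
    flagged = []
    for v in values:
        kind = classify(v)
        if kind is not None:
            flagged.append((v, kind))
    return flagged


if __name__ == "__main__":
    # Hypothetical gene column with three converted entries.
    column = ["TP53", "2-Sep", "BRCA1", "2.31E+13", "44075"]
    for cell, kind in scan(column):
        print(cell, kind)
```

A check like this is only a heuristic. A column of legitimately numeric identifiers would trigger the five-digit rule, so in practice the scan would need to be restricted to columns expected to hold gene symbols.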