A comparative study of evaluating missing value imputation methods in label-free proteomics

Abstract The presence of missing values (MVs) in label-free quantitative proteomics greatly reduces the completeness of data. Imputation has been widely utilized to handle MVs, and selection of the proper method is critical for the accuracy and reliability of imputation. Here we present a comparativ...

Bibliographic Details
Main Authors: Liang Jin, Yingtao Bi, Chenqi Hu, Jun Qu, Shichen Shen, Xue Wang, Yu Tian
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online Access: https://doaj.org/article/198d77c776934e20ab2d016e4f427453
id oai:doaj.org-article:198d77c776934e20ab2d016e4f427453
record_format dspace
spelling oai:doaj.org-article:198d77c776934e20ab2d016e4f427453
2021-12-02T13:57:05Z
10.1038/s41598-021-81279-4
2045-2322
2021-01-01T00:00:00Z
https://doi.org/10.1038/s41598-021-81279-4
https://doaj.org/toc/2045-2322
Scientific Reports, Vol 11, Iss 1, Pp 1-11 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract The presence of missing values (MVs) in label-free quantitative proteomics greatly reduces the completeness of data. Imputation has been widely utilized to handle MVs, and selection of the proper method is critical for the accuracy and reliability of imputation. Here we present a comparative study that evaluates the performance of seven popular imputation methods with a large-scale benchmark dataset and an immune cell dataset. Simulated MVs were incorporated into the complete part of each dataset with different combinations of MV rates and missing not at random (MNAR) rates. Normalized root mean square error (NRMSE) was applied to evaluate the accuracy of protein abundances and intergroup protein ratios after imputation. Detection of true positives (TPs) and false altered-protein discovery rate (FADR) between groups were also compared using the benchmark dataset. Furthermore, the accuracy of handling real MVs was assessed by comparing enriched pathways and signature genes of cell activation after imputing the immune cell dataset. We observed that the accuracy of imputation is primarily affected by the MNAR rate rather than the MV rate, and downstream analysis can be largely impacted by the selection of imputation methods. A random forest-based imputation method consistently outperformed other popular methods by achieving the lowest NRMSE, a high number of TPs with the average FADR < 5%, and the best detection of relevant pathways and signature genes, highlighting it as the most suitable method for label-free proteomics.
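The evaluation scheme named in the abstract (simulate MVs in complete data at a chosen MV rate and MNAR rate, impute, then score with NRMSE) can be illustrated with a minimal Python sketch. It is not the authors' code. The function names `simulate_missing` and `nrmse` are hypothetical, MNAR is approximated here by removing the lowest intensities, the NRMSE normalization by the variance of the true values is one common convention assumed for illustration, and column-mean imputation is only a placeholder for any of the compared methods.

```python
import numpy as np

def simulate_missing(x, mv_rate, mnar_rate, rng):
    """Return a copy of x with NaNs inserted (illustrative assumption).

    A fraction mv_rate of all entries is removed. Of those, a fraction
    mnar_rate are the lowest intensities (a simple left-censoring stand-in
    for MNAR); the rest are drawn uniformly at random (MCAR).
    """
    x = np.asarray(x, dtype=float)
    n_total = int(round(mv_rate * x.size))
    n_mnar = int(round(mnar_rate * n_total))
    flat = x.ravel().copy()
    order = np.argsort(flat)              # indices sorted by ascending value
    mnar_idx = order[:n_mnar]             # lowest values go missing (MNAR)
    mcar_idx = rng.choice(order[n_mnar:], size=n_total - n_mnar, replace=False)
    flat[mnar_idx] = np.nan
    flat[mcar_idx] = np.nan
    return flat.reshape(x.shape)

def nrmse(truth, imputed, mask):
    """RMSE over the masked entries, normalized by the variance of the truth
    at those entries (assumed convention, not confirmed by the record)."""
    err = imputed[mask] - truth[mask]
    return float(np.sqrt(np.mean(err ** 2) / np.var(truth[mask])))

# Synthetic "complete" log-intensity matrix: 200 proteins x 6 samples.
rng = np.random.default_rng(0)
truth = rng.normal(loc=20.0, scale=2.0, size=(200, 6))

# 20% MVs overall, half of them MNAR.
with_mv = simulate_missing(truth, mv_rate=0.2, mnar_rate=0.5, rng=rng)
mask = np.isnan(with_mv)

# Placeholder imputation: replace each NaN with its column (sample) mean.
col_mean = np.nanmean(with_mv, axis=0)
imputed = np.where(mask, col_mean, with_mv)

score = nrmse(truth, imputed, mask)
print(f"missing entries: {mask.sum()}, NRMSE: {score:.3f}")
```

Repeating the last three steps over a grid of `mv_rate` and `mnar_rate` values, and swapping in different imputation methods, gives the kind of comparison the abstract describes.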
format article
author Liang Jin
Yingtao Bi
Chenqi Hu
Jun Qu
Shichen Shen
Xue Wang
Yu Tian
title A comparative study of evaluating missing value imputation methods in label-free proteomics
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/198d77c776934e20ab2d016e4f427453