Addressing missing values in routine health information system data: an evaluation of imputation methods using data from the Democratic Republic of the Congo during the COVID-19 pandemic

Abstract Background Poor data quality is limiting the use of data sourced from routine health information systems (RHIS), especially in low- and middle-income countries. An important component of this data quality issue comes from missing values, where health facilities, for a variety of reasons, fa...

Bibliographic Details
Main authors:	Shuo Feng, Celestin Hategeka, Karen Ann Grépin
Format:	article
Language:	EN
Published:	BMC 2021
Subjects:	Missing data; Routine health information systems (RHIS); Health management information system (HMIS); Health services research; Low- and middle-income countries (LMICs); Multiple imputation; Computer applications to medicine. Medical informatics; R858-859.7; Public aspects of medicine; RA1-1270
Online access:	https://doaj.org/article/d3a35b234ffb4c9b9c2e82128fa9708c

id	oai:doaj.org-article:d3a35b234ffb4c9b9c2e82128fa9708c
record_format	dspace
spelling	oai:doaj.org-article:d3a35b234ffb4c9b9c2e82128fa9708c | 2021-11-07T12:10:10Z | Addressing missing values in routine health information system data: an evaluation of imputation methods using data from the Democratic Republic of the Congo during the COVID-19 pandemic | 10.1186/s12963-021-00274-z | 1478-7954 | https://doaj.org/article/d3a35b234ffb4c9b9c2e82128fa9708c | 2021-11-01T00:00:00Z | https://doi.org/10.1186/s12963-021-00274-z | https://doaj.org/toc/1478-7954 | Shuo Feng | Celestin Hategeka | Karen Ann Grépin | BMC | article | EN | Population Health Metrics, Vol 19, Iss 1, Pp 1-14 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	Missing data; Routine health information systems (RHIS); Health management information system (HMIS); Health services research; Low- and middle-income countries (LMICs); Multiple imputation; Computer applications to medicine. Medical informatics; R858-859.7; Public aspects of medicine; RA1-1270
description	Abstract. Background: Poor data quality is limiting the use of data sourced from routine health information systems (RHIS), especially in low- and middle-income countries. An important component of this data quality issue comes from missing values, where health facilities, for a variety of reasons, fail to report to the central system.
Methods: Using data from the health management information system in the Democratic Republic of the Congo and the advent of COVID-19 pandemic as an illustrative case study, we implemented seven commonly used imputation methods and evaluated their performance in terms of minimizing bias in imputed values and parameter estimates generated through subsequent analytical techniques, namely segmented regression, which is widely used in interrupted time series studies, and pre–post-comparisons through paired Wilcoxon rank-sum tests. We also examined the performance of these imputation methods under different missing mechanisms and tested their stability to changes in the data.
Results: For regression analyses, there were no substantial differences found in the coefficient estimates generated from all methods except mean imputation and exclusion and interpolation when the data contained less than 20% missing values. However, as the missing proportion grew, k-NN started to produce biased estimates. Machine learning algorithms, i.e. missForest and k-NN, were also found to lack robustness to small changes in the data or consecutive missingness. On the other hand, multiple imputation methods generated the overall most unbiased estimates and were the most robust to all changes in data. They also produced smaller standard errors than single imputations. For pre–post-comparisons, all methods produced p values less than 0.01, regardless of the amount of missingness introduced, suggesting low sensitivity of Wilcoxon rank-sum tests to the imputation method used.
Conclusions: We recommend the use of multiple imputation in addressing missing values in RHIS datasets and appropriate handling of data structure to minimize imputation standard errors. In cases where necessary computing resources are unavailable for multiple imputation, one may consider seasonal decomposition as the next best method. Mean imputation and exclusion and interpolation, however, always produced biased and misleading results in the subsequent analyses, and thus, their use in the handling of missing values should be discouraged.
format	article
author	Shuo Feng; Celestin Hategeka; Karen Ann Grépin
author_facet	Shuo Feng; Celestin Hategeka; Karen Ann Grépin
author_sort	Shuo Feng
title	Addressing missing values in routine health information system data: an evaluation of imputation methods using data from the Democratic Republic of the Congo during the COVID-19 pandemic
publisher	BMC
publishDate	2021
url	https://doaj.org/article/d3a35b234ffb4c9b9c2e82128fa9708c

Similar Items