An Ensemble Method for Missing Data of Environmental Sensor Considering Univariate and Multivariate Characteristics
With rapid urbanization, awareness of environmental pollution is growing and, accordingly, so is interest in environmental sensors that measure atmospheric and indoor air quality. Because these IoT-based environmental sensors are sensitive and must be reliable, it is essential to deal...
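Main Authors: | Chanyoung Choi, Haewoong Jung, Jaehyuk Cho |
---|---|
Format: | article |
Language: | EN |
Published: | MDPI AG, 2021 |
Subjects: | missing data; environmental sensor; univariate and multivariate imputation; machine learning; ensemble method; Chemical technology; TP1-1185 |
Online Access: | https://doaj.org/article/f2eff07a07814bb78470a1c93081f821 |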
id |
oai:doaj.org-article:f2eff07a07814bb78470a1c93081f821 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:f2eff07a07814bb78470a1c93081f821; 2021-11-25T18:57:44Z; An Ensemble Method for Missing Data of Environmental Sensor Considering Univariate and Multivariate Characteristics; 10.3390/s21227595; 1424-8220; https://doaj.org/article/f2eff07a07814bb78470a1c93081f821; 2021-11-01T00:00:00Z; https://www.mdpi.com/1424-8220/21/22/7595; https://doaj.org/toc/1424-8220; Chanyoung Choi; Haewoong Jung; Jaehyuk Cho; MDPI AG; article; missing data; environmental sensor; univariate and multivariate imputation; machine learning; ensemble method; Chemical technology; TP1-1185; EN; Sensors, Vol 21, Iss 7595, p 7595 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
missing data; environmental sensor; univariate and multivariate imputation; machine learning; ensemble method; Chemical technology; TP1-1185 |
spellingShingle |
missing data; environmental sensor; univariate and multivariate imputation; machine learning; ensemble method; Chemical technology; TP1-1185; Chanyoung Choi; Haewoong Jung; Jaehyuk Cho; An Ensemble Method for Missing Data of Environmental Sensor Considering Univariate and Multivariate Characteristics |
description |
With rapid urbanization, awareness of environmental pollution is growing and, accordingly, so is interest in environmental sensors that measure atmospheric and indoor air quality. Because these IoT-based environmental sensors are sensitive and must be reliable, it is essential to deal with missing values, which are one cause of reliability problems. Two characteristics can be used to impute missing values in environmental sensor data: the time dependency of a single variable and the correlation between multiple variables. Existing imputation methods, however, use only one of these characteristics, and no reported case uses both. In this work, we introduce a new ensemble imputation method that reflects both. First, the situations in which missing values frequently occur were divided into four cases, and experimental data were generated for each: communication error (aperiodic, periodic) and sensor error (rapid change, measurement range). To compare existing methods with the proposed method, five widely used univariate imputation methods and five widely used multivariate imputation methods were each applied as single models to predict the missing values in the four cases. The values predicted by the single models were then passed to the ensemble method; of the available ensemble methods, weighted averaging and stacking were used to derive the final predicted values that replace the missing values. Finally, the predicted values were compared with the original data using the mean absolute error (MAE) and the root mean square error (RMSE). The proposed ensemble method generally performed better than the single methods. In addition, it simultaneously considers the correlation between variables and time dependence, both of which must be considered for environmental sensors. As a result, the proposed ensemble technique can contribute to replacing the missing values generated by environmental sensors, which can help increase the reliability of environmental sensor data. |
format |
article |
author |
Chanyoung Choi; Haewoong Jung; Jaehyuk Cho |
author_facet |
Chanyoung Choi; Haewoong Jung; Jaehyuk Cho |
author_sort |
Chanyoung Choi |
title |
An Ensemble Method for Missing Data of Environmental Sensor Considering Univariate and Multivariate Characteristics |
title_sort |
ensemble method for missing data of environmental sensor considering univariate and multivariate characteristics |
publisher |
MDPI AG |
publishDate |
2021 |
url |
https://doaj.org/article/f2eff07a07814bb78470a1c93081f821 |
_version_ |
1718410500759355392 |
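
The description field above outlines combining univariate (time-dependent) and multivariate (cross-variable) imputers through a weighted average and scoring the result with MAE and RMSE. The following is a minimal Python sketch of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic two-sensor series, linear interpolation as the single univariate model, a one-predictor least-squares regression as the single multivariate model, and weights set inversely proportional to each model's MAE on a small held-out validation set. The record does not state how the paper chooses its weights, and stacking is not shown.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def interpolate(series):
    """Univariate imputer: linear interpolation over time (None marks a gap)."""
    known = [i for i, v in enumerate(series) if v is not None]
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:          # gap at the start: copy the next observed value
            out[i] = series[right]
        elif right is None:       # gap at the end: copy the last observed value
            out[i] = series[left]
        else:
            frac = (i - left) / (right - left)
            out[i] = series[left] + frac * (series[right] - series[left])
    return out


def regress(target, other):
    """Multivariate imputer: least-squares fit target ~ a + b * other on observed rows."""
    pairs = [(x, y) for x, y in zip(other, target) if y is not None]
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    a = my - b * mx
    return [y if y is not None else a + b * x for x, y in zip(other, target)]


def weighted_average(preds, errors):
    """Combine model predictions with weights proportional to 1 / validation error."""
    inv = [1.0 / max(e, 1e-12) for e in errors]
    weights = [w / sum(inv) for w in inv]
    combined = [sum(w * p[i] for w, p in zip(weights, preds)) for i in range(len(preds[0]))]
    return combined, weights


def demo():
    # Hypothetical data: a daily-cycle signal and a correlated second sensor.
    truth = [20 + 5 * math.sin(2 * math.pi * t / 24) for t in range(48)]
    other = [2 * v + 1 + 0.3 * math.cos(t) for t, v in enumerate(truth)]
    missing = [5, 10, 11, 12, 13, 30, 41]   # simulated gaps to impute
    validation = [3, 17, 22, 35, 44]        # observed points hidden only to set weights

    def hide(indices):
        return [None if i in indices else v for i, v in enumerate(truth)]

    # Step 1: estimate each single model's error on the validation points.
    val_series = hide(set(missing) | set(validation))
    val_true = [truth[i] for i in validation]
    val_errors = [
        mae(val_true, [filled[i] for i in validation])
        for filled in (interpolate(val_series), regress(val_series, other))
    ]

    # Step 2: impute the real gaps with both models and combine them.
    gap_series = hide(set(missing))
    uni, multi = interpolate(gap_series), regress(gap_series, other)
    y_true = [truth[i] for i in missing]
    preds = [[uni[i] for i in missing], [multi[i] for i in missing]]
    ensemble, weights = weighted_average(preds, val_errors)

    return {
        "weights": weights,
        "univariate": (mae(y_true, preds[0]), rmse(y_true, preds[0])),
        "multivariate": (mae(y_true, preds[1]), rmse(y_true, preds[1])),
        "ensemble": (mae(y_true, ensemble), rmse(y_true, ensemble)),
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(name, value)
```

Because the ensemble is a convex combination of the single-model predictions, its MAE and RMSE on any set of gaps cannot exceed those of the worse single model; whether it beats the better one depends on the data and the weights.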