PM<sub>2.5</sub> Concentration Prediction Based on Spatiotemporal Feature Selection Using XGBoost-MSCNN-GA-LSTM

With China’s rapid industrialization, air pollution is becoming increasingly serious. Predicting air quality is essential for identifying further preventive measures to avoid negative impacts. Existing approaches to predicting atmospheric pollutant concentrations ignore the problem of f...

Bibliographic Details
Main Authors: Hongbin Dai, Guangqiu Huang, Huibin Zeng, Fan Yang
Format: article
Language: EN
Published: MDPI AG 2021
Subjects:
Online Access: https://doaj.org/article/b15a4deb69f54444b9e485ea191ca55d
id oai:doaj.org-article:b15a4deb69f54444b9e485ea191ca55d
record_format dspace
spelling oai:doaj.org-article:b15a4deb69f54444b9e485ea191ca55d
2021-11-11T19:43:05Z
PM<sub>2.5</sub> Concentration Prediction Based on Spatiotemporal Feature Selection Using XGBoost-MSCNN-GA-LSTM
10.3390/su132112071
2071-1050
https://doaj.org/article/b15a4deb69f54444b9e485ea191ca55d
2021-11-01T00:00:00Z
https://www.mdpi.com/2071-1050/13/21/12071
https://doaj.org/toc/2071-1050
Sustainability, Vol 13, Iss 12071, p 12071 (2021)
institution DOAJ
collection DOAJ
language EN
topic XGBoost
MSCNN
genetic algorithm
LSTM
feature selection
spatiotemporal feature extraction
Environmental effects of industries and plants
TD194-195
Renewable energy sources
TJ807-830
Environmental sciences
GE1-350
description With China’s rapid industrialization, air pollution is becoming increasingly serious. Predicting air quality is essential for identifying further preventive measures to avoid negative impacts. Existing approaches to predicting atmospheric pollutant concentrations ignore feature redundancy and spatio-temporal characteristics, so their accuracy is low and their transferability is weak. To address this, extreme gradient boosting (XGBoost) is first applied to select features for PM<sub>2.5</sub>; a multi-scale convolutional neural network (MSCNN) with one-dimensional kernels is then used to extract local temporal and spatial feature relations from air quality data, and the results are linearly concatenated and fused to obtain the spatio-temporal relationships among multiple features. Finally, XGBoost and MSCNN are combined with a long short-term memory (LSTM) network to exploit its strength in handling time series, and a genetic algorithm (GA) is applied to optimize the parameter set of the LSTM network. The multi-feature spatio-temporal relationships are input into the LSTM network, which outputs the long-term dependencies of the selected features to predict PM<sub>2.5</sub> concentration. The result is XGBoost-MSCGL, a PM<sub>2.5</sub> concentration prediction model based on spatio-temporal feature selection. The data set consists of hourly concentration data for six atmospheric pollutants and meteorological data from the Fen-Wei Plain in 2020. To verify the effectiveness of the model, XGBoost-MSCGL is compared with benchmark models such as multilayer perceptron (MLP), CNN, LSTM, XGBoost, and CNN-LSTM, both before and after XGBoost feature selection. According to the forecast results for 12 cities, compared with the single models, the root mean square error (RMSE) decreased by about 39.07%, the average MAE decreased by about 42.18%, the average MAE [sic; possibly MAPE] decreased by about 49.33%, and R<sup>2</sup> increased by 23.7%. Compared with the models after feature selection, the RMSE decreased by about 15% on average, the MAPE decreased by 16%, the MAE decreased by 21%, and R<sup>2</sup> increased by 2.6%. The experimental results show that the XGBoost-MSCGL prediction model offers a more comprehensive understanding of the data, learns at deeper levels, achieves higher prediction accuracy, and generalizes better in predicting PM<sub>2.5</sub> concentration.
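The description reports results as percentage changes in RMSE, MAE, MAPE, and R<sup>2</sup>. As a minimal sketch of how such figures are conventionally computed (the record does not give the authors' exact formulas, so the standard textbook definitions are assumed here), the following uses only the Python standard library. The function names `regression_metrics` and `relative_decrease` and the sample values are hypothetical and are not taken from the article.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (in percent) and R^2 using the standard definitions.

    Assumes equal-length, non-empty sequences, no zero in y_true (MAPE divides
    by the observed value), and observed values that are not all identical.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}


def relative_decrease(baseline, improved):
    """Percentage by which an error metric dropped relative to a baseline model."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical hourly PM2.5 observations and predictions (made-up values).
    observed = [10.0, 20.0, 30.0, 40.0]
    predicted = [12.0, 18.0, 33.0, 39.0]
    print(regression_metrics(observed, predicted))
    # Hypothetical baseline RMSE of 10.0 compared with an improved RMSE of 6.0.
    print(relative_decrease(10.0, 6.0))
```

A statement such as "RMSE decreased by about 39.07%" would then correspond to `relative_decrease` applied to a baseline model's RMSE and the proposed model's RMSE, averaged over cities, assuming that is how the authors aggregated their results.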
format article
author Hongbin Dai
Guangqiu Huang
Huibin Zeng
Fan Yang
author_sort Hongbin Dai
title PM<sub>2.5</sub> Concentration Prediction Based on Spatiotemporal Feature Selection Using XGBoost-MSCNN-GA-LSTM
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/b15a4deb69f54444b9e485ea191ca55d
_version_ 1718431429298225152