Predicting the number of dusty days around the desert wetlands in southeastern Iran using feature selection and machine learning techniques

Bibliographic Details
Main authors: Zohre Ebrahimi-Khusfi, Ali Reza Nafarzadegan, Fatemeh Dargahian
Format: article
Language: EN
Published: Elsevier 2021
Subjects:
Online access: https://doaj.org/article/39ed9e7b457048f5801bb7c97af014ab
Description
Summary: In recent decades, some desert wetlands have become critical regions for dust production in the arid and semi-arid regions of the world. Accurate prediction of the number of dusty days (NDDs) in these areas is of great importance. Machine learning (ML) is the most popular method for predicting climatic and environmental variables. However, it has received more attention for the spatial prediction of these variables than for their temporal prediction. This work is the first effort to predict NDDs in the major source of dust production in southeastern Iran using ML models and different feature selection (FS) techniques. For this purpose, monthly data on 21 predictor variables for the study period (1988–2017) were used to predict the target variable (NDDs). The main aim was to evaluate the support vector machine (SVM), conditional inference random forest (CRF), and stochastic gradient boosting (SGB) models, each combined with three FS algorithms (Boruta, multivariate adaptive regression splines (MARS), and recursive feature elimination (RFE)), in predicting NDDs around the Hamoun wetlands. After analyzing the collinearity effect and removing the independent variables with a tolerance < 0.11, the best attributes were selected to train the SVM, SGB, and CRF models. All datasets were randomly split into training (70%) and verification (30%) sets. Model performance was evaluated on the holdout data using the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and Nash–Sutcliffe efficiency (NSE) coefficient. The results indicated that SGB-MARS, SGB-RFE, and SGB-Boruta outperformed the other model and FS combinations in terms of R² (0.9), RMSE (2.5), MAE (1.9), and NSE (0.9).
Furthermore, surface wind speed, maximum air temperature, relative humidity, wetland dried bed, and erosive wind frequency were identified as the most important factors for predicting NDDs in the study area. These findings support using the SGB model with various FS techniques to predict NDDs around desert wetlands. The results can help decision-makers reduce the risks of dust emission and increase the safety of residents around the desert wetlands.
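
The four holdout metrics named in the summary can be illustrated with a minimal sketch. This is not the authors' code: the function names and the sample values are hypothetical, and R² is assumed here to be the squared Pearson correlation between observed and predicted values (a common convention when NSE is reported alongside it), which the summary does not confirm.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / (sum of squares about the observed mean)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def r_squared(obs, pred):
    """Squared Pearson correlation (assumed definition of R-square)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (var_obs * var_pred)


# Hypothetical monthly NDD values, for illustration only (not data from the study).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]

print("RMSE:", rmse(observed, predicted))
print("MAE: ", mae(observed, predicted))
print("NSE: ", nse(observed, predicted))
print("R2:  ", r_squared(observed, predicted))
```

NSE equals 1 for a perfect prediction and 0 when the model does no better than the observed mean, so reporting it next to the squared correlation shows whether a high correlation is also free of systematic bias.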