Predicting the number of dusty days around the desert wetlands in southeastern Iran using feature selection and machine learning techniques

In the past decades, some desert wetlands have become critical regions for dust production in the arid and semi-arid regions of the world. Accurate prediction of the number of dusty days (NDDs) in these areas is of great importance. The most popular method for predicting climatic and environmental v...

Bibliographic Details
Main authors: Zohre Ebrahimi-Khusfi, Ali Reza Nafarzadegan, Fatemeh Dargahian
Format: article
Language: EN
Published: Elsevier 2021
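Subjects: Dust emissions; Boruta; MARS; Recursive feature elimination; Multicollinearity; Stochastic gradient boosting; Ecology; QH540-549.5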
Online access: https://doaj.org/article/39ed9e7b457048f5801bb7c97af014ab
id oai:doaj.org-article:39ed9e7b457048f5801bb7c97af014ab
record_format dspace
spelling oai:doaj.org-article:39ed9e7b457048f5801bb7c97af014ab
2021-12-01T04:47:24Z
Predicting the number of dusty days around the desert wetlands in southeastern Iran using feature selection and machine learning techniques
1470-160X
10.1016/j.ecolind.2021.107499
https://doaj.org/article/39ed9e7b457048f5801bb7c97af014ab
2021-06-01T00:00:00Z
http://www.sciencedirect.com/science/article/pii/S1470160X21001643
https://doaj.org/toc/1470-160X
In recent decades, some desert wetlands have become critical regions for dust production in the arid and semi-arid regions of the world. Accurate prediction of the number of dusty days (NDDs) in these areas is of great importance. Machine learning (ML) is one of the most popular methods for predicting climatic and environmental variables. Although it has received considerable attention for spatial prediction, it has received less attention for the temporal prediction of these variables. This work is the first effort to predict NDDs in the major source of dust production in southeastern Iran using ML models and different feature selection (FS) techniques. For this purpose, monthly data for 21 predictor variables over the study period (1988–2017) were used to predict the target variable (NDDs). The main aim was to evaluate the support vector machine (SVM), conditional inference random forest (CRF), and stochastic gradient boosting (SGB) models, each combined with one of three FS algorithms (Boruta, multivariate adaptive regression splines (MARS), and recursive feature elimination (RFE)), in predicting NDDs around the Hamoun wetlands. After analyzing the collinearity effect and removing the independent variables with a tolerance < 0.11, the best attributes were selected to train the SVM, SGB, and CRF models. All datasets were randomly split into training (70%) and verification (30%) sets. The performance of the models was evaluated on the holdout data using the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and Nash-Sutcliffe efficiency (NSE) coefficient. The results indicated that SGB-MARS, SGB-RFE, and SGB-Boruta outperformed the other model and FS combinations in terms of R² (0.9), RMSE (2.5), MAE (1.9), and NSE (0.9). Furthermore, surface wind speed, maximum air temperature, relative humidity, wetland dried bed, and erosive wind frequency were identified as the most important factors for predicting NDDs in the study area. This study encourages the use of the SGB model with various FS techniques to predict NDDs around desert wetlands. These results can help decision-makers reduce the risks of dust emission and increase the safety of residents around the desert wetlands.
Zohre Ebrahimi-Khusfi
Ali Reza Nafarzadegan
Fatemeh Dargahian
Elsevier
article
Dust emissions
Boruta
MARS
Recursive feature elimination
Multicollinearity
Stochastic gradient boosting
Ecology
QH540-549.5
EN
Ecological Indicators, Vol 125, Iss , Pp 107499- (2021)
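The abstract mentions removing predictors with a tolerance below 0.11 before feature selection. The following is a minimal sketch of how such a collinearity screen could be computed, not the authors' implementation. It relies on the standard identity that the tolerance of predictor j equals 1 - R_j² = 1 / VIF_j, where VIF_j is the j-th diagonal entry of the inverse correlation matrix. The function names and the synthetic columns are hypothetical; only the 0.11 threshold comes from the record.

```python
import math
import random


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def tolerances(columns):
    """Tolerance of each predictor: 1 / VIF_j, with VIF_j taken from the
    diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [1.0 / inv[j][j] for j in range(len(columns))]


# Hypothetical synthetic predictors: x3 is nearly a copy of x1.
random.seed(0)
x1 = [random.gauss(0, 1) for _ in range(200)]
x2 = [random.gauss(0, 1) for _ in range(200)]
x3 = [a + 0.1 * random.gauss(0, 1) for a in x1]

THRESHOLD = 0.11  # tolerance cutoff quoted in the abstract
for name, tol in zip(["x1", "x2", "x3"], tolerances([x1, x2, x3])):
    status = "flag as collinear" if tol < THRESHOLD else "keep"
    print(f"{name}: tolerance={tol:.3f} -> {status}")
```

The sketch only flags candidates. Whether the flagged variables are dropped all at once or one at a time with the tolerances recomputed is not stated in the record, so that choice is left open here.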
institution DOAJ
collection DOAJ
language EN
topic Dust emissions
Boruta
MARS
Recursive feature elimination
Multicollinearity
Stochastic gradient boosting
Ecology
QH540-549.5
spellingShingle Dust emissions
Boruta
MARS
Recursive feature elimination
Multicollinearity
Stochastic gradient boosting
Ecology
QH540-549.5
Zohre Ebrahimi-Khusfi
Ali Reza Nafarzadegan
Fatemeh Dargahian
Predicting the number of dusty days around the desert wetlands in southeastern Iran using feature selection and machine learning techniques
description In recent decades, some desert wetlands have become critical regions for dust production in the arid and semi-arid regions of the world. Accurate prediction of the number of dusty days (NDDs) in these areas is of great importance. Machine learning (ML) is one of the most popular methods for predicting climatic and environmental variables. Although it has received considerable attention for spatial prediction, it has received less attention for the temporal prediction of these variables. This work is the first effort to predict NDDs in the major source of dust production in southeastern Iran using ML models and different feature selection (FS) techniques. For this purpose, monthly data for 21 predictor variables over the study period (1988–2017) were used to predict the target variable (NDDs). The main aim was to evaluate the support vector machine (SVM), conditional inference random forest (CRF), and stochastic gradient boosting (SGB) models, each combined with one of three FS algorithms (Boruta, multivariate adaptive regression splines (MARS), and recursive feature elimination (RFE)), in predicting NDDs around the Hamoun wetlands. After analyzing the collinearity effect and removing the independent variables with a tolerance < 0.11, the best attributes were selected to train the SVM, SGB, and CRF models. All datasets were randomly split into training (70%) and verification (30%) sets. The performance of the models was evaluated on the holdout data using the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and Nash-Sutcliffe efficiency (NSE) coefficient. The results indicated that SGB-MARS, SGB-RFE, and SGB-Boruta outperformed the other model and FS combinations in terms of R² (0.9), RMSE (2.5), MAE (1.9), and NSE (0.9). Furthermore, surface wind speed, maximum air temperature, relative humidity, wetland dried bed, and erosive wind frequency were identified as the most important factors for predicting NDDs in the study area. This study encourages the use of the SGB model with various FS techniques to predict NDDs around desert wetlands. These results can help decision-makers reduce the risks of dust emission and increase the safety of residents around the desert wetlands.
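The four holdout metrics named in the description (R², RMSE, MAE, NSE) can be illustrated with a short sketch. This is an assumption-laden example rather than the authors' code: the record does not say how R² was computed, so it is taken here as the squared Pearson correlation between observed and predicted values, which is one common convention when NSE is reported alongside it. The function name `evaluate` and the sample values are hypothetical.

```python
import math


def evaluate(observed, predicted):
    """Return R2, RMSE, MAE, and NSE for paired observed/predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]

    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)    # observed variability
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))

    return {
        # Assumption: R2 as squared Pearson correlation (not confirmed by the record).
        "R2": cov * cov / (ss_tot * ss_pred),
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        # Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance.
        "NSE": 1.0 - ss_res / ss_tot,
    }


# Hypothetical monthly NDD counts for a small holdout sample.
observed = [2, 5, 9, 14, 7, 3]
predicted = [3, 4, 10, 12, 8, 3]
for name, value in evaluate(observed, predicted).items():
    print(f"{name}: {value:.3f}")
```

NSE equals 1 for a perfect fit and drops below 0 when the model predicts worse than the observed mean, which is why it is often reported together with RMSE and MAE rather than in place of them.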
format article
author Zohre Ebrahimi-Khusfi
Ali Reza Nafarzadegan
Fatemeh Dargahian
author_facet Zohre Ebrahimi-Khusfi
Ali Reza Nafarzadegan
Fatemeh Dargahian
author_sort Zohre Ebrahimi-Khusfi
title Predicting the number of dusty days around the desert wetlands in southeastern Iran using feature selection and machine learning techniques
title_short Predicting the number of dusty days around the desert wetlands in southeastern Iran using feature selection and machine learning techniques
title_full Predicting the number of dusty days around the desert wetlands in southeastern Iran using feature selection and machine learning techniques
title_fullStr Predicting the number of dusty days around the desert wetlands in southeastern Iran using feature selection and machine learning techniques
title_full_unstemmed Predicting the number of dusty days around the desert wetlands in southeastern Iran using feature selection and machine learning techniques
title_sort predicting the number of dusty days around the desert wetlands in southeastern iran using feature selection and machine learning techniques
publisher Elsevier
publishDate 2021
url https://doaj.org/article/39ed9e7b457048f5801bb7c97af014ab
work_keys_str_mv AT zohreebrahimikhusfi predictingthenumberofdustydaysaroundthedesertwetlandsinsoutheasterniranusingfeatureselectionandmachinelearningtechniques
AT alirezanafarzadegan predictingthenumberofdustydaysaroundthedesertwetlandsinsoutheasterniranusingfeatureselectionandmachinelearningtechniques
AT fatemehdargahian predictingthenumberofdustydaysaroundthedesertwetlandsinsoutheasterniranusingfeatureselectionandmachinelearningtechniques
_version_ 1718405770651893760