Predicting the number of dusty days around the desert wetlands in southeastern Iran using feature selection and machine learning techniques
In the past decades, some desert wetlands have become critical regions for dust production in the arid and semi-arid regions of the world. Accurate prediction of the number of dusty days (NDDs) in these areas is of great importance. The most popular method for predicting climatic and environmental v...
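Main authors: | Zohre Ebrahimi-Khusfi, Ali Reza Nafarzadegan, Fatemeh Dargahian |
---|---|
Format: | article |
Language: | EN |
Published: | Elsevier, 2021 |
Subjects: | Dust emissions; Boruta; MARS; Recursive feature elimination; Multicollinearity; Stochastic gradient boosting; Ecology; QH540-549.5 |
Online access: | https://doaj.org/article/39ed9e7b457048f5801bb7c97af014ab |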
id |
oai:doaj.org-article:39ed9e7b457048f5801bb7c97af014ab |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:39ed9e7b457048f5801bb7c97af014ab; 2021-12-01T04:47:24Z; Predicting the number of dusty days around the desert wetlands in southeastern Iran using feature selection and machine learning techniques; ISSN 1470-160X; DOI 10.1016/j.ecolind.2021.107499; https://doaj.org/article/39ed9e7b457048f5801bb7c97af014ab; 2021-06-01T00:00:00Z; http://www.sciencedirect.com/science/article/pii/S1470160X21001643; https://doaj.org/toc/1470-160X; Zohre Ebrahimi-Khusfi; Ali Reza Nafarzadegan; Fatemeh Dargahian; Elsevier; article; Dust emissions; Boruta; MARS; Recursive feature elimination; Multicollinearity; Stochastic gradient boosting; Ecology; QH540-549.5; EN; Ecological Indicators, Vol 125, Iss , Pp 107499- (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Dust emissions; Boruta; MARS; Recursive feature elimination; Multicollinearity; Stochastic gradient boosting; Ecology; QH540-549.5 |
description |
In recent decades, some desert wetlands have become critical sources of dust in the arid and semi-arid regions of the world, so accurate prediction of the number of dusty days (NDDs) in these areas is of great importance. Machine learning (ML) is the most popular method for predicting climatic and environmental variables, but it has received more attention for spatial prediction than for temporal prediction of these variables. This work is the first effort to predict NDDs in the major source of dust production in southeastern Iran using ML models and different feature selection (FS) techniques. For this purpose, monthly data for 21 predictor variables over the study period (1988–2017) were used to predict the target variable (NDDs). The main aim was to evaluate support vector machine (SVM), conditional inference random forest (CRF), and stochastic gradient boosting (SGB) models combined with three FS algorithms, namely Boruta, multivariate adaptive regression splines (MARS), and recursive feature elimination (RFE), in predicting NDDs around the Hamoun wetlands. After the collinearity effect was analyzed and independent variables with a tolerance < 0.11 were removed, the best attributes were selected to train the SVM, SGB, and CRF models. All datasets were randomly split into training (70%) and verification (30%) sets. Model performance was evaluated on the holdout data using the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and Nash-Sutcliffe efficiency (NSE) coefficient. The results indicated that SGB-MARS, SGB-RFE, and SGB-Boruta outperformed the other model and FS combinations in terms of R² (0.9), RMSE (2.5), MAE (1.9), and NSE (0.9). Furthermore, surface wind speed, maximum air temperature, relative humidity, dried wetland bed, and erosive wind frequency were identified as the most important factors for predicting NDDs in the study area. These findings support the use of the SGB model with various FS techniques to predict NDDs around desert wetlands. The results can help decision-makers reduce the risks of dust emission and increase the safety of residents around the desert wetlands. |
format |
article |
author |
Zohre Ebrahimi-Khusfi; Ali Reza Nafarzadegan; Fatemeh Dargahian |
author_sort |
Zohre Ebrahimi-Khusfi |
title |
Predicting the number of dusty days around the desert wetlands in southeastern Iran using feature selection and machine learning techniques |
publisher |
Elsevier |
publishDate |
2021 |
url |
https://doaj.org/article/39ed9e7b457048f5801bb7c97af014ab |
_version_ |
1718405770651893760 |
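
The description in this record mentions a collinearity screen (tolerance < 0.11) and four holdout metrics (R², RMSE, MAE, NSE). The following is a minimal Python sketch of how such quantities are commonly computed, not the authors' implementation. It assumes that tolerance means 1 - R² from regressing each predictor on all the others, and that R² is the squared Pearson correlation between observed and predicted values; neither definition is stated in the record. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def tolerance(X):
    """Tolerance of each column of X (assumed definition: 1 - R^2 from an
    ordinary least-squares regression of that column on all other columns).
    Assumes no column is constant."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    tol = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Design matrix: intercept plus every predictor except column j.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        ss_res = np.sum((y - A @ coef) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        tol[j] = ss_res / ss_tot  # equals 1 - R^2
    return tol


def rmse(obs, pred):
    """Root mean square error."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def mae(obs, pred):
    """Mean absolute error."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs(obs - pred)))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SS_res / SS_tot around the observed mean."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))


def r_squared(obs, pred):
    """Squared Pearson correlation (an assumed reading of 'R-square')."""
    return float(np.corrcoef(obs, pred)[0, 1] ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic predictors: the third column nearly duplicates the first.
    a, b = rng.normal(size=200), rng.normal(size=200)
    X = np.column_stack([a, b, a + 0.05 * rng.normal(size=200)])
    tol = tolerance(X)
    keep = tol >= 0.11  # mirror the threshold quoted in the description
    print("tolerance:", np.round(tol, 3), "keep:", keep)

    # Synthetic observed/predicted values for the metric functions.
    obs = rng.poisson(8, size=100).astype(float)
    pred = obs + rng.normal(scale=2.0, size=100)
    print("R2:", r_squared(obs, pred), "RMSE:", rmse(obs, pred),
          "MAE:", mae(obs, pred), "NSE:", nse(obs, pred))
```

Under these definitions NSE and R² can differ: NSE penalizes bias in the predictions, whereas the squared correlation does not, which is one reason both may be reported side by side.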