Machine-learning models to replicate large-eddy simulations of air pollutant concentrations along boulevard-type streets

Bibliographic details
Main authors: M. Lange, H. Suominen, M. Kurppa, L. Järvi, E. Oikarinen, R. Savvides, K. Puolamäki
Format: article
Language: EN
Published: Copernicus Publications 2021
Subjects: Geology; QE1-996.5
Online access: https://doaj.org/article/685f97ccf9e645e79107d4184f19efa7
Journal: Geoscientific Model Development, Vol 14, Pp 7411-7424 (2021)
Date: 2021-12-01
DOI: 10.5194/gmd-14-7411-2021
ISSN: 1991-959X, 1991-9603
Full text (PDF): https://gmd.copernicus.org/articles/14/7411/2021/gmd-14-7411-2021.pdf
Abstract:
Running large-eddy simulations (LESs) can be burdensome and computationally too expensive for practical applications such as supporting urban planning. In this study, regression models are used to replicate air pollutant concentrations modelled by LES in urban boulevards. We study the performance of regression models and discuss how to detect situations where the models are applied outside their training domain and their outputs therefore cannot be trusted. Regression models from 10 different model families are trained, and a cross-validation methodology is used to evaluate their performance and to find the best set of features needed to reproduce the LES outputs. We also test the regression models on an independent testing dataset. Our results suggest that, in general, log-linear regression gives the best and most robust performance on new independent data. It clearly outperforms a dummy model that predicts a constant concentration for all locations (multiplicative minimum RMSE (mRMSE) of 0.76 vs. 1.78 for the dummy model). Furthermore, we demonstrate that it is possible to detect concept drift, i.e. situations where the model is applied outside its training domain and a new LES run may be necessary to obtain reliable results. Regression models can be used instead of LESs to estimate air pollutant concentrations unless higher accuracy is needed. To obtain reliable results, however, model and feature selection must be done carefully to avoid overfitting, and methods to detect concept drift should be used.
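The abstract compares a log-linear regression with a constant "dummy" predictor under cross-validation. It also mentions detecting when a model is applied outside its training domain. The sketch below illustrates these ideas on synthetic data only, and it is not the authors' implementation. Several parts are assumptions: the two hypothetical features and the synthetic data, the 5-fold split, and the use of plain RMSE of log concentrations as a stand-in metric (the paper's mRMSE is not defined in this record). The per-feature min/max range check is likewise only a crude stand-in for drift detection.

```python
import math
import random

# Illustration only: synthetic data, not LES output.


def solve_linear_system(A, c):
    """Solve A b = c by Gauss-Jordan elimination with partial pivoting."""
    n = len(c)
    M = [row[:] + [ci] for row, ci in zip(A, c)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for k in range(col, n + 1):
                    M[r][k] -= f * M[col][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_log_linear(X, y):
    """Least-squares fit of log(y) = b0 + b1*x1 + ... via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    t = [math.log(v) for v in y]         # regress on log concentration
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    c = [sum(r[i] * ti for r, ti in zip(rows, t)) for i in range(p)]
    return solve_linear_system(A, c)


def predict_log_linear(b, x):
    """Predicted log concentration for one feature vector."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


def fit_dummy(X, y):
    """Dummy baseline: one constant (mean log concentration) for every location."""
    return sum(math.log(v) for v in y) / len(y)


def predict_dummy(m, x):
    return m


def kfold_log_rmse(X, y, k, fit, predict):
    """k-fold cross-validated RMSE of log concentrations (assumed stand-in metric)."""
    idx = list(range(len(y)))
    folds = [idx[i::k] for i in range(k)]
    sq_err, n = 0.0, 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            sq_err += (predict(model, X[i]) - math.log(y[i])) ** 2
            n += 1
    return math.sqrt(sq_err / n)


def in_training_domain(X_train, x):
    """Crude drift check: True if every feature of x lies within the training min/max."""
    for j, v in enumerate(x):
        column = [row[j] for row in X_train]
        if not min(column) <= v <= max(column):
            return False
    return True


random.seed(0)
# Two hypothetical numeric features; log concentration is linear in them plus noise.
X = [(random.uniform(0, 4), random.uniform(0, 4)) for _ in range(200)]
y = [math.exp(1.0 - 0.5 * a + 0.3 * b + random.gauss(0, 0.1)) for a, b in X]

loglin_rmse = kfold_log_rmse(X, y, 5, fit_log_linear, predict_log_linear)
dummy_rmse = kfold_log_rmse(X, y, 5, fit_dummy, predict_dummy)
print(f"log-linear: {loglin_rmse:.3f}  dummy: {dummy_rmse:.3f}")
print(in_training_domain(X, (2.0, 2.0)), in_training_domain(X, (10.0, 2.0)))
```

Trying another model family only needs another `fit`/`predict` pair passed to `kfold_log_rmse`. The range check flags only inputs outside the per-feature training bounds. It would miss unseen combinations of individually in-range values, so it should be read as a minimal example of out-of-domain detection rather than a full concept-drift method.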