Mapping Population Distribution Based on XGBoost Using Multisource Data

Mapping the fine-scale distribution of the population is essential to the study of human activities, and machine learning models can help mine more reliable open-access big data for this purpose. However, the combination of multisource datasets and multidimensional features for population estimat...

Bibliographic Details
Main Authors: Xin Zhao, Nan Xia, Yunyun Xu, Xuefeng Huang, Manchun Li
Format: article
Language: EN
Published: IEEE 2021
Subjects:
Online Access: https://doaj.org/article/8d4589b2cbf74680a2742075702c11bb
id oai:doaj.org-article:8d4589b2cbf74680a2742075702c11bb
record_format dspace
spelling oai:doaj.org-article:8d4589b2cbf74680a2742075702c11bb
2021-11-24T00:00:19Z
Mapping Population Distribution Based on XGBoost Using Multisource Data
2151-1535
10.1109/JSTARS.2021.3125197
https://doaj.org/article/8d4589b2cbf74680a2742075702c11bb
2021-01-01T00:00:00Z
https://ieeexplore.ieee.org/document/9601310/
https://doaj.org/toc/2151-1535
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol 14, Pp 11567-11580 (2021)
institution DOAJ
collection DOAJ
language EN
topic CatBoost
gradient boosted decision tree (GBDT)
light gradient boosting machine (lightGBM)
multisource data
population mapping
random forest (RF)
Ocean engineering
TC1501-1800
Geophysics. Cosmic physics
QC801-809
description Mapping the fine-scale distribution of the population is essential to the study of human activities, and machine learning models can help mine more reliable open-access big data for this purpose. However, the best combination of multisource datasets and multidimensional features for population estimation remains unclear, and different models still need to be compared. Thus, in this study, related features were first extracted from multisource data, including building data, geographic big data, remote sensing data, and basic geographic data. Then, the effective indicators with higher contribution weights were selected from the multisource data, which can reduce noise and unstable model fitting. Finally, a population distribution map on a 100-m grid was obtained for Shenzhen in 2019, and the estimation results of five tree-based ensemble learning models were compared at community scale: random forest (RF), gradient boosted decision tree (GBDT), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost). The results showed that: 1) building data and geographic big data could better reflect the spatial heterogeneity of the population; 2) indicator selection could effectively improve the estimation accuracy of the population mapping; and 3) compared with the other models, XGBoost had the largest R² (80%), the smallest RMSE and MAE, the highest percentage of accurately estimated communities (-0.3 < RE < 0.3, 65%), and a shorter training time. Therefore, XGBoost was chosen for mapping population distribution instead of GBDT, LightGBM, CatBoost, and RF. The proposed method for population mapping can help to optimize the allocation of resources and guide a more scientific path for urban development.
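The accuracy measures named in the description (R², RMSE, MAE, and the share of communities with -0.3 < RE < 0.3) can be illustrated with a minimal sketch. The record does not give the paper's formulas, so this assumes the standard definitions and takes RE as (estimated - actual) / actual. The function name `evaluate` and the sample populations are hypothetical and are not taken from the study.

```python
import math


def evaluate(actual, estimated, re_tol=0.3):
    """Compute R^2, RMSE, MAE and the share of units with |RE| < re_tol.

    Assumption: RE = (estimated - actual) / actual, so every actual
    value must be positive. This is an illustrative definition, not
    one confirmed by the catalog record.
    """
    if not actual or len(actual) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a <= 0 for a in actual):
        raise ValueError("actual populations must be positive")

    n = len(actual)
    mean_actual = sum(actual) / n
    residuals = [e - a for a, e in zip(actual, estimated)]

    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("actual values must not all be identical")

    rel_err = [r / a for r, a in zip(residuals, actual)]
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "share_accurate": sum(1 for re in rel_err if abs(re) < re_tol) / n,
    }


# Made-up community populations, for illustration only.
actual = [1000, 2000, 3000, 4000]
estimated = [1100, 1800, 4200, 4000]
metrics = evaluate(actual, estimated)
print(metrics)
```

In a model comparison like the one described, a function of this kind would be applied to each model's community-scale estimates so that all five models are scored on the same basis.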
format article
author Xin Zhao
Nan Xia
Yunyun Xu
Xuefeng Huang
Manchun Li
title Mapping Population Distribution Based on XGBoost Using Multisource Data
publisher IEEE
publishDate 2021
url https://doaj.org/article/8d4589b2cbf74680a2742075702c11bb