A feature extraction method based on the entropy-minimal description length principle and GBDT for common surface water pollution identification
To effectively prevent river water pollution, water quality monitoring is necessary. However, existing methods for water quality assessment are limited in terms of the characterization of water quality conditions, and few researchers have been able to focus on feature extraction methods relative to...
Main authors: | Pingjie Huang, Lixiang Wang, Dibo Hou, Wangli Lin, Jie Yu, Guangxin Zhang, Hongjian Zhang |
---|---|
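Format: | article |
Language: | EN |
Published: | IWA Publishing, 2021 |
Subjects: | entropy–minimal description length principle; feature extraction; gradient boosting decision tree; pollution identification; surface water quality; Information technology; T58.5-58.64; Environmental technology. Sanitary engineering; TD1-1066 |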
Online access: | https://doaj.org/article/418d89fc682146e1ab47fb37e17cbc35 |
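The entropy-minimal description length principle named in the title is usually associated with the Fayyad-Irani discretization criterion: a continuous indicator is split at the boundary with the highest information gain, and the split is kept only if the gain exceeds a description-length threshold. The sketch below illustrates that general criterion in plain Python. It is an assumption about how such a discretizer can be written, not the authors' implementation; the function names and the sample readings are hypothetical.

```python
import bisect
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (index, gain) of the boundary with the highest information gain.

    `values` must be sorted ascending; boundaries are only considered
    between distinct values. Returns None if no boundary exists.
    """
    n = len(values)
    base = entropy(labels)
    best = None
    for i in range(1, n):
        if values[i] == values[i - 1]:
            continue
        gain = (base
                - (i / n) * entropy(labels[:i])
                - ((n - i) / n) * entropy(labels[i:]))
        if best is None or gain > best[1]:
            best = (i, gain)
    return best


def mdlp_accepts(labels, i, gain):
    """Fayyad-Irani stopping rule: keep the cut only if the gain pays for it."""
    n = len(labels)
    left, right = labels[:i], labels[i:]
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right))
    return gain > (math.log2(n - 1) + delta) / n


def mdlp_cut_points(values, labels):
    """Recursively collect accepted cut points for one continuous indicator."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    xs = [v for v, _ in pairs]
    ys = [c for _, c in pairs]
    cuts = []

    def recurse(lo, hi):
        seg_x, seg_y = xs[lo:hi], ys[lo:hi]
        if len(seg_x) < 2:
            return
        found = best_cut(seg_x, seg_y)
        if found is None:
            return
        i, gain = found
        if not mdlp_accepts(seg_y, i, gain):
            return
        cuts.append((seg_x[i - 1] + seg_x[i]) / 2)  # midpoint of the boundary
        recurse(lo, lo + i)
        recurse(lo + i, hi)

    recurse(0, len(xs))
    return sorted(cuts)


def discretize(value, cuts):
    """Map a raw reading to the index of its coarse-grained interval."""
    return bisect.bisect_right(cuts, value)


# Hypothetical readings of one indicator with two made-up class labels.
readings = [float(v) for v in range(20)]
classes = ["clean"] * 10 + ["polluted"] * 10
cuts = mdlp_cut_points(readings, classes)
print(cuts, [discretize(r, cuts) for r in (3.0, 15.0)])
```

Because a cut is only accepted when it clears the threshold, an indicator whose values carry no class information yields no cut points at all, which is one way such a discretization can be robust to noise.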
id |
oai:doaj.org-article:418d89fc682146e1ab47fb37e17cbc35 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:418d89fc682146e1ab47fb37e17cbc35
2021-11-05T17:51:25Z
A feature extraction method based on the entropy-minimal description length principle and GBDT for common surface water pollution identification
1464-7141
1465-1734
10.2166/hydro.2021.060
https://doaj.org/article/418d89fc682146e1ab47fb37e17cbc35
2021-09-01T00:00:00Z
http://jh.iwaponline.com/content/23/5/1050
https://doaj.org/toc/1464-7141
https://doaj.org/toc/1465-1734
To effectively prevent river water pollution, water quality monitoring is necessary. However, existing methods for water quality assessment are limited in terms of the characterization of water quality conditions, and few researchers have been able to focus on feature extraction methods relative to water pollution identification, or to obtain accurate water pollution source information. Thus, this study proposed a feature extraction method based on the entropy-minimal description length principle and gradient boosting decision tree (GBDT) algorithm for identifying the type of surface water pollution in consideration of the distribution characteristics and intrinsic association of conventional water quality indicators. To improve the robustness to noise, we constructed the coarse-grained discretization features of each water quality index based on information entropy. The nonlinear correlation between water quality indexes and pollution classes was excavated by the GBDT algorithm, which was utilized to acquire tree transformed features. Water samples collected by a southern city Environmental Monitoring Center were used to test the performance of the proposed algorithm. Experimental results demonstrate that features extracted by the proposed method are more effective than the water quality indicators without feature engineering and features extracted by the principal component analysis algorithm.
HIGHLIGHTS
Different water pollutions have unique attributes for risk characterization.
Based on our study of the characteristics of water quality data, we proposed an innovative feature extraction method based on the entropy-minimal description length principle and gradient boosting decision tree algorithm.
We focus on the research into the feature extraction method in water pollution identification.
Pingjie Huang; Lixiang Wang; Dibo Hou; Wangli Lin; Jie Yu; Guangxin Zhang; Hongjian Zhang
IWA Publishing
article
entropy–minimal description length principle; feature extraction; gradient boosting decision tree; pollution identification; surface water quality; Information technology; T58.5-58.64; Environmental technology. Sanitary engineering; TD1-1066
EN
Journal of Hydroinformatics, Vol 23, Iss 5, Pp 1050-1065 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
entropy–minimal description length principle; feature extraction; gradient boosting decision tree; pollution identification; surface water quality; Information technology; T58.5-58.64; Environmental technology. Sanitary engineering; TD1-1066 |
spellingShingle |
entropy–minimal description length principle; feature extraction; gradient boosting decision tree; pollution identification; surface water quality; Information technology; T58.5-58.64; Environmental technology. Sanitary engineering; TD1-1066
Pingjie Huang; Lixiang Wang; Dibo Hou; Wangli Lin; Jie Yu; Guangxin Zhang; Hongjian Zhang
A feature extraction method based on the entropy-minimal description length principle and GBDT for common surface water pollution identification |
description |
To effectively prevent river water pollution, water quality monitoring is necessary. However, existing methods for water quality assessment are limited in terms of the characterization of water quality conditions, and few researchers have been able to focus on feature extraction methods relative to water pollution identification, or to obtain accurate water pollution source information. Thus, this study proposed a feature extraction method based on the entropy-minimal description length principle and gradient boosting decision tree (GBDT) algorithm for identifying the type of surface water pollution in consideration of the distribution characteristics and intrinsic association of conventional water quality indicators. To improve the robustness to noise, we constructed the coarse-grained discretization features of each water quality index based on information entropy. The nonlinear correlation between water quality indexes and pollution classes was excavated by the GBDT algorithm, which was utilized to acquire tree transformed features. Water samples collected by a southern city Environmental Monitoring Center were used to test the performance of the proposed algorithm. Experimental results demonstrate that features extracted by the proposed method are more effective than the water quality indicators without feature engineering and features extracted by the principal component analysis algorithm.
HIGHLIGHTS
Different water pollutions have unique attributes for risk characterization.
Based on our study of the characteristics of water quality data, we proposed an innovative feature extraction method based on the entropy-minimal description length principle and gradient boosting decision tree algorithm.
We focus on the research into the feature extraction method in water pollution identification. |
format |
article |
author |
Pingjie Huang; Lixiang Wang; Dibo Hou; Wangli Lin; Jie Yu; Guangxin Zhang; Hongjian Zhang |
author_facet |
Pingjie Huang; Lixiang Wang; Dibo Hou; Wangli Lin; Jie Yu; Guangxin Zhang; Hongjian Zhang |
author_sort |
Pingjie Huang |
title |
A feature extraction method based on the entropy-minimal description length principle and GBDT for common surface water pollution identification |
title_short |
A feature extraction method based on the entropy-minimal description length principle and GBDT for common surface water pollution identification |
title_full |
A feature extraction method based on the entropy-minimal description length principle and GBDT for common surface water pollution identification |
title_fullStr |
A feature extraction method based on the entropy-minimal description length principle and GBDT for common surface water pollution identification |
title_full_unstemmed |
A feature extraction method based on the entropy-minimal description length principle and GBDT for common surface water pollution identification |
title_sort |
feature extraction method based on the entropy-minimal description length principle and gbdt for common surface water pollution identification |
publisher |
IWA Publishing |
publishDate |
2021 |
url |
https://doaj.org/article/418d89fc682146e1ab47fb37e17cbc35 |
work_keys_str_mv |
AT pingjiehuang afeatureextractionmethodbasedontheentropyminimaldescriptionlengthprincipleandgbdtforcommonsurfacewaterpollutionidentification AT lixiangwang afeatureextractionmethodbasedontheentropyminimaldescriptionlengthprincipleandgbdtforcommonsurfacewaterpollutionidentification AT dibohou afeatureextractionmethodbasedontheentropyminimaldescriptionlengthprincipleandgbdtforcommonsurfacewaterpollutionidentification AT wanglilin afeatureextractionmethodbasedontheentropyminimaldescriptionlengthprincipleandgbdtforcommonsurfacewaterpollutionidentification AT jieyu afeatureextractionmethodbasedontheentropyminimaldescriptionlengthprincipleandgbdtforcommonsurfacewaterpollutionidentification AT guangxinzhang afeatureextractionmethodbasedontheentropyminimaldescriptionlengthprincipleandgbdtforcommonsurfacewaterpollutionidentification AT hongjianzhang afeatureextractionmethodbasedontheentropyminimaldescriptionlengthprincipleandgbdtforcommonsurfacewaterpollutionidentification AT pingjiehuang featureextractionmethodbasedontheentropyminimaldescriptionlengthprincipleandgbdtforcommonsurfacewaterpollutionidentification AT lixiangwang featureextractionmethodbasedontheentropyminimaldescriptionlengthprincipleandgbdtforcommonsurfacewaterpollutionidentification AT dibohou featureextractionmethodbasedontheentropyminimaldescriptionlengthprincipleandgbdtforcommonsurfacewaterpollutionidentification AT wanglilin featureextractionmethodbasedontheentropyminimaldescriptionlengthprincipleandgbdtforcommonsurfacewaterpollutionidentification AT jieyu featureextractionmethodbasedontheentropyminimaldescriptionlengthprincipleandgbdtforcommonsurfacewaterpollutionidentification AT guangxinzhang featureextractionmethodbasedontheentropyminimaldescriptionlengthprincipleandgbdtforcommonsurfacewaterpollutionidentification AT hongjianzhang featureextractionmethodbasedontheentropyminimaldescriptionlengthprincipleandgbdtforcommonsurfacewaterpollutionidentification |
_version_ |
1718444097145929728 |
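The description field above mentions "tree transformed features" acquired with the GBDT algorithm. One common reading of that phrase is leaf-index encoding: each sample is routed through every tree of a fitted ensemble, and the leaf it lands in is one-hot encoded. The sketch below shows only that encoding step in plain Python. The two trees are hand-written stand-ins for fitted boosting trees, and the indicator names (`cod`, `nh3n`, `do`) and thresholds are hypothetical; whether the paper encodes leaves exactly this way is an assumption.

```python
def count_leaves(tree):
    """Number of leaves in a nested-dict decision tree."""
    if "leaf" in tree:
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])


def leaf_index(tree, sample):
    """Route a sample (dict of indicator -> value) to its leaf id."""
    node = tree
    while "leaf" not in node:
        goes_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if goes_left else node["right"]
    return node["leaf"]


def tree_transform(trees, sample):
    """Concatenate one one-hot leaf vector per tree into a single feature row."""
    encoded = []
    for tree in trees:
        one_hot = [0] * count_leaves(tree)
        one_hot[leaf_index(tree, sample)] = 1
        encoded.extend(one_hot)
    return encoded


# Hypothetical stand-ins for trees taken from a fitted GBDT model.
tree_a = {
    "feature": "cod", "threshold": 20.0,
    "left": {"leaf": 0},
    "right": {
        "feature": "nh3n", "threshold": 1.5,
        "left": {"leaf": 1},
        "right": {"leaf": 2},
    },
}
tree_b = {
    "feature": "do", "threshold": 5.0,
    "left": {"leaf": 0},
    "right": {"leaf": 1},
}

sample = {"cod": 35.0, "nh3n": 2.1, "do": 3.2}  # made-up water sample
print(tree_transform([tree_a, tree_b], sample))
```

In practice the trees would come from a trained model rather than being written by hand; with scikit-learn, for example, `GradientBoostingClassifier.apply` returns the leaf indices that a step like this would then one-hot encode.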