A feature extraction method based on the entropy-minimal description length principle and GBDT for common surface water pollution identification

To effectively prevent river water pollution, water quality monitoring is necessary. However, existing methods for water quality assessment are limited in terms of the characterization of water quality conditions, and few researchers have been able to focus on feature extraction methods relative to...

Bibliographic Details
Main authors: Pingjie Huang, Lixiang Wang, Dibo Hou, Wangli Lin, Jie Yu, Guangxin Zhang, Hongjian Zhang
Format: article
Language: EN
Published: IWA Publishing 2021
Subjects:
Online access: https://doaj.org/article/418d89fc682146e1ab47fb37e17cbc35
id oai:doaj.org-article:418d89fc682146e1ab47fb37e17cbc35
record_format dspace
spelling oai:doaj.org-article:418d89fc682146e1ab47fb37e17cbc35
2021-11-05T17:51:25Z
1464-7141
1465-1734
10.2166/hydro.2021.060
2021-09-01T00:00:00Z
http://jh.iwaponline.com/content/23/5/1050
https://doaj.org/toc/1464-7141
https://doaj.org/toc/1465-1734
Journal of Hydroinformatics, Vol 23, Iss 5, Pp 1050-1065 (2021)
institution DOAJ
collection DOAJ
language EN
topic entropy–minimal description length principle
feature extraction
gradient boosting decision tree
pollution identification
surface water quality
Information technology
T58.5-58.64
Environmental technology. Sanitary engineering
TD1-1066
description To effectively prevent river water pollution, water quality monitoring is necessary. However, existing water quality assessment methods characterize water quality conditions only to a limited extent, and few studies have focused on feature extraction for water pollution identification or on obtaining accurate information about pollution sources. This study therefore proposes a feature extraction method based on the entropy-minimal description length principle and the gradient boosting decision tree (GBDT) algorithm for identifying the type of surface water pollution, taking into account the distribution characteristics and intrinsic associations of conventional water quality indicators. To improve robustness to noise, coarse-grained discretized features of each water quality indicator are constructed on the basis of information entropy. The nonlinear correlation between water quality indicators and pollution classes is mined by the GBDT algorithm, which is used to obtain tree-transformed features. Water samples collected by the Environmental Monitoring Center of a southern city were used to test the performance of the proposed algorithm. Experimental results demonstrate that features extracted by the proposed method are more effective than the raw water quality indicators without feature engineering and than features extracted by principal component analysis. HIGHLIGHTS: Different types of water pollution have unique attributes for risk characterization. Based on a study of the characteristics of water quality data, an innovative feature extraction method based on the entropy-minimal description length principle and the gradient boosting decision tree algorithm is proposed. The work focuses on feature extraction methods for water pollution identification.
format article
author Pingjie Huang
Lixiang Wang
Dibo Hou
Wangli Lin
Jie Yu
Guangxin Zhang
Hongjian Zhang
title A feature extraction method based on the entropy-minimal description length principle and GBDT for common surface water pollution identification
publisher IWA Publishing
publishDate 2021
url https://doaj.org/article/418d89fc682146e1ab47fb37e17cbc35
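
The description field above mentions coarse-grained discretization of each water quality indicator based on information entropy and the minimal description length principle. This record does not say how the authors implemented that step. The sketch below is one common reading of it, the recursive entropy-based splitting with the Fayyad-Irani MDL stopping rule, written in plain Python. The function names, the sample values and the "clean"/"polluted" labels are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Find the boundary with the lowest weighted class entropy.

    `values` must be sorted, with `labels` in the same order.
    Returns (index, cut_point, accepted) or None if no boundary exists.
    `accepted` is True when the split passes the MDL stopping criterion.
    """
    n = len(values)
    base = entropy(labels)
    best = None  # (weighted_entropy, index)
    for i in range(1, n):
        if values[i] == values[i - 1]:
            continue  # a cut can only fall between distinct values
        w = (i * entropy(labels[:i]) + (n - i) * entropy(labels[i:])) / n
        if best is None or w < best[0]:
            best = (w, i)
    if best is None:
        return None
    w, i = best
    left, right = labels[:i], labels[i:]
    gain = base - w
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    # MDL cost of encoding the split (Fayyad-Irani criterion).
    delta = log2(3 ** k - 2) - (k * base - k1 * entropy(left) - k2 * entropy(right))
    threshold = (log2(n - 1) + delta) / n
    cut = (values[i - 1] + values[i]) / 2
    return i, cut, gain > threshold


def _split(values, labels):
    """Recursively collect accepted cut points for sorted data."""
    found = best_cut(values, labels)
    if found is None or not found[2]:
        return []
    i, cut, _ = found
    return _split(values[:i], labels[:i]) + [cut] + _split(values[i:], labels[i:])


def mdlp_cuts(values, labels):
    """Return the sorted cut points for one indicator."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    return _split([v for v, _ in pairs], [l for _, l in pairs])


def discretize(value, cuts):
    """Map a raw indicator value to the index of its interval."""
    return bisect_right(cuts, value)


# Hypothetical readings of a single indicator with two classes.
readings = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0]
classes = ["clean"] * 4 + ["polluted"] * 4
cuts = mdlp_cuts(readings, classes)
bins = [discretize(r, cuts) for r in readings]
```

In this toy case a single cut between the two groups is accepted, and every reading is replaced by the index of its interval. When the classes are interleaved so that no split pays for its description cost, the function returns no cut points and the indicator collapses to one bin, which is the sense in which such discretization can damp noise.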