Hierarchical Contaminated Web Page Classification Based on Meta Tag Denoising Disposal
Web page classification is critical for information retrieval. Most web page classification methods have two shortcomings: (1) they need to analyze the entire web page, and (2) they pay too little attention to noise inside the web page, which will thus decrease...
Main authors: | Xiang Song, Yi Zhu, Xuemei Zeng, Xingshu Chen |
---|---|
Format: | article |
Language: | EN |
Published: | Hindawi-Wiley, 2021 |
Subjects: | Technology (General); Science (General) |
Online access: | https://doaj.org/article/a3a5ecc31c5942758d466230f88da098 |
id |
oai:doaj.org-article:a3a5ecc31c5942758d466230f88da098 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:a3a5ecc31c5942758d466230f88da098; 2021-11-29T00:55:36Z; Hierarchical Contaminated Web Page Classification Based on Meta Tag Denoising Disposal; 1939-0122; 10.1155/2021/2470897; https://doaj.org/article/a3a5ecc31c5942758d466230f88da098; 2021-01-01T00:00:00Z; http://dx.doi.org/10.1155/2021/2470897; https://doaj.org/toc/1939-0122; Web page classification is critical for information retrieval. Most web page classification methods have two shortcomings: (1) they need to analyze the entire web page, and (2) they pay too little attention to noise inside the web page, which reduces efficiency and classification performance, especially when classifying contaminated web pages. To solve these problems, this paper proposes a denoising disposal algorithm. We choose the top-down method for hierarchical classification to improve prediction efficiency. The experimental results demonstrate that our method is about 7 times faster than the full-page method and achieves good classification results in most categories. The precision of all 7 parent categories is above 88% and is, on average, 24% higher than that of the other meta tag-based method.; Xiang Song; Yi Zhu; Xuemei Zeng; Xingshu Chen; Hindawi-Wiley; article; Technology (General); T1-995; Science (General); Q1-390; EN; Security and Communication Networks, Vol 2021 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Technology (General), T1-995; Science (General), Q1-390 |
spellingShingle |
Technology (General), T1-995; Science (General), Q1-390; Xiang Song; Yi Zhu; Xuemei Zeng; Xingshu Chen; Hierarchical Contaminated Web Page Classification Based on Meta Tag Denoising Disposal |
description |
Web page classification is critical for information retrieval. Most web page classification methods have two shortcomings: (1) they need to analyze the entire web page, and (2) they pay too little attention to noise inside the web page, which reduces efficiency and classification performance, especially when classifying contaminated web pages. To solve these problems, this paper proposes a denoising disposal algorithm. We choose the top-down method for hierarchical classification to improve prediction efficiency. The experimental results demonstrate that our method is about 7 times faster than the full-page method and achieves good classification results in most categories. The precision of all 7 parent categories is above 88% and is, on average, 24% higher than that of the other meta tag-based method. |
format |
article |
author |
Xiang Song; Yi Zhu; Xuemei Zeng; Xingshu Chen |
author_facet |
Xiang Song; Yi Zhu; Xuemei Zeng; Xingshu Chen |
author_sort |
Xiang Song |
title |
Hierarchical Contaminated Web Page Classification Based on Meta Tag Denoising Disposal |
title_short |
Hierarchical Contaminated Web Page Classification Based on Meta Tag Denoising Disposal |
title_full |
Hierarchical Contaminated Web Page Classification Based on Meta Tag Denoising Disposal |
title_fullStr |
Hierarchical Contaminated Web Page Classification Based on Meta Tag Denoising Disposal |
title_full_unstemmed |
Hierarchical Contaminated Web Page Classification Based on Meta Tag Denoising Disposal |
title_sort |
hierarchical contaminated web page classification based on meta tag denoising disposal |
publisher |
Hindawi-Wiley |
publishDate |
2021 |
url |
https://doaj.org/article/a3a5ecc31c5942758d466230f88da098 |
work_keys_str_mv |
AT xiangsong hierarchicalcontaminatedwebpageclassificationbasedonmetatagdenoisingdisposal AT yizhu hierarchicalcontaminatedwebpageclassificationbasedonmetatagdenoisingdisposal AT xuemeizeng hierarchicalcontaminatedwebpageclassificationbasedonmetatagdenoisingdisposal AT xingshuchen hierarchicalcontaminatedwebpageclassificationbasedonmetatagdenoisingdisposal |
_version_ |
1718407788058640384 |
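
The description in this record names two ideas: classifying a page from its meta tags instead of the full page, and predicting top-down through a category hierarchy. The record gives no details of the paper's denoising disposal algorithm, so the Python sketch below is only an illustration under stated assumptions and is not the authors' method. The two-level `TAXONOMY`, its keyword sets, the choice of `title`, `description`, and `keywords` as the informative fields, and the keyword-overlap scoring are all hypothetical stand-ins for a trained classifier.

```python
import re
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collect <title> text and the content of selected <meta name=...> tags."""

    WANTED = {"description", "keywords"}  # assumed set of informative meta tags

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = attrs.get("content") or ""
            if name in self.WANTED and content:
                self.fields[name] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Body text is ignored on purpose: only the title is accumulated.
        if self._in_title:
            self.fields["title"] = self.fields.get("title", "") + data


def meta_tokens(html):
    """Return the set of lowercase word tokens found in the title and meta tags."""
    parser = MetaTagExtractor()
    parser.feed(html)
    text = " ".join(parser.fields.values()).lower()
    return set(re.findall(r"[a-z]+", text))


# Hypothetical two-level hierarchy; a real system would use trained models.
TAXONOMY = {
    "sports": {
        "keywords": {"sports", "match", "league", "team"},
        "children": {
            "football": {"football", "goal", "striker"},
            "tennis": {"tennis", "racket", "serve"},
        },
    },
    "finance": {
        "keywords": {"finance", "market", "stock", "bank"},
        "children": {
            "stocks": {"stock", "shares", "dividend"},
            "banking": {"bank", "loan", "deposit"},
        },
    },
}


def best_label(tokens, candidates):
    """Return the label whose keyword set overlaps most with tokens, or None."""
    scored = [(len(tokens & kws), label) for label, kws in candidates.items()]
    score, label = max(scored)
    return label if score > 0 else None


def classify_top_down(html):
    """Pick a parent category first, then choose only among its children."""
    tokens = meta_tokens(html)
    parents = {name: node["keywords"] for name, node in TAXONOMY.items()}
    parent = best_label(tokens, parents)
    if parent is None:
        return None, None
    child = best_label(tokens, TAXONOMY[parent]["children"])
    return parent, child


PAGE = """<html><head>
<title>Champions League football: late goal decides the match</title>
<meta name="keywords" content="football, league, goal">
<meta name="description" content="Match report and team news.">
</head><body>Buy cheap loans! Bank deposit stock market offers</body></html>"""

print(classify_top_down(PAGE))  # ('sports', 'football')
```

In this sketch the unrelated body text in `PAGE` never reaches the classifier, because only the title and the selected meta tags are tokenized. The top-down step compares child categories only under the chosen parent, which is one way a hierarchical scheme can reduce the number of comparisons per page.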