Hierarchical Contaminated Web Page Classification Based on Meta Tag Denoising Disposal

Web page classification is critical for information retrieval. Most web page classification methods have two shortcomings: (1) they must analyze the entire web page, and (2) they pay too little attention to noise within the page, which will thus decrease...
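
As a rough illustration of the two ideas named in the title and abstract (classifying from meta tags instead of the whole page, and routing top-down through a category hierarchy), here is a minimal Python sketch. It is not the authors' algorithm: the record does not describe the denoising disposal step, so none is implemented here. The taxonomy, the keyword sets, and the names `MetaTagExtractor`, `meta_text`, and `classify_top_down` are all hypothetical, and simple keyword counting stands in for whatever classifier the paper actually uses.

```python
import re
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collects <title> text and the content of selected <meta name=...> tags."""

    WANTED = {"description", "keywords"}  # assumption: which meta tags matter

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in self.WANTED and attrs.get("content"):
                self.fields[name] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Body text is deliberately ignored; only the title is kept.
        if self._in_title:
            self.fields["title"] = self.fields.get("title", "") + data


def meta_text(html):
    """Return the lowercased title and meta tag text of a page."""
    parser = MetaTagExtractor()
    parser.feed(html)
    return " ".join(parser.fields.values()).lower()


# Hypothetical two-level taxonomy: parent -> {child -> keyword set}.
TAXONOMY = {
    "sports": {
        "football": {"football", "goal", "league"},
        "tennis": {"tennis", "racket", "serve"},
    },
    "finance": {
        "stocks": {"stock", "shares", "dividend"},
        "banking": {"bank", "loan", "deposit"},
    },
}


def score(tokens, keywords):
    """Count how many tokens appear in the keyword set."""
    return sum(1 for t in tokens if t in keywords)


def classify_top_down(text):
    """Pick the best parent first, then only compare that parent's children."""
    tokens = re.findall(r"[a-z]+", text)
    parent = max(
        TAXONOMY,
        key=lambda p: score(tokens, set().union(*TAXONOMY[p].values())),
    )
    children = TAXONOMY[parent]
    child = max(children, key=lambda c: score(tokens, children[c]))
    return parent, child


# A page whose body carries unrelated (contaminating) finance terms.
page = (
    "<html><head><title>Premier League results</title>"
    '<meta name="keywords" content="football, goal, league">'
    '<meta name="description" content="Latest football scores">'
    "</head><body>bank loan stock shares dividend deposit</body></html>"
)
print(classify_top_down(meta_text(page)))
```

Because only the title and meta tags are read, the unrelated body text never reaches the classifier, and the child-level comparison is limited to the children of the chosen parent. That is the sense in which a top-down scheme can reduce prediction work, under the toy assumptions above.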

Bibliographic Details
Main authors: Xiang Song, Yi Zhu, Xuemei Zeng, Xingshu Chen
Format: article
Language: EN
Published: Hindawi-Wiley 2021
Subjects: Technology (General), Science (General)
Online access: https://doaj.org/article/a3a5ecc31c5942758d466230f88da098
id oai:doaj.org-article:a3a5ecc31c5942758d466230f88da098
record_format dspace
spelling oai:doaj.org-article:a3a5ecc31c5942758d466230f88da098
2021-11-29T00:55:36Z
1939-0122
10.1155/2021/2470897
2021-01-01T00:00:00Z
http://dx.doi.org/10.1155/2021/2470897
https://doaj.org/toc/1939-0122
Security and Communication Networks, Vol 2021 (2021)
institution DOAJ
collection DOAJ
language EN
topic Technology (General)
T1-995
Science (General)
Q1-390
description Web page classification is critical for information retrieval. Most web page classification methods have two shortcomings: (1) they must analyze the entire web page, and (2) they pay too little attention to noise within the page. Both reduce efficiency and classification performance, especially when classifying contaminated web pages. To address these problems, this paper proposes a denoising disposal algorithm. The top-down method is chosen for hierarchical classification to improve prediction efficiency. Experimental results show that the method is about 7 times faster than the full-page method and achieves good classification results in most categories. Precision for all 7 parent categories is above 88% and is, on average, 24% higher than that of the other meta tag-based method.
format article
author Xiang Song
Yi Zhu
Xuemei Zeng
Xingshu Chen
title Hierarchical Contaminated Web Page Classification Based on Meta Tag Denoising Disposal
publisher Hindawi-Wiley
publishDate 2021
url https://doaj.org/article/a3a5ecc31c5942758d466230f88da098