A DEEP AUTOENCODER-BASED REPRESENTATION FOR ARABIC TEXT CATEGORIZATION

Arabic text representation is a challenging task for applications such as text categorization and clustering, since the Arabic language is known for its variety, richness and complex morphology. To date, the Bag-of-Words model remains the most common method for Arabic text representati...


Bibliographic Details
Main authors: Fatima-Zahra El-Alami, Abdelkader El Mahdaouy, Said Ouatik El Alaoui, Noureddine En-Nahnahi
Format: article
Language: EN
Published: UUM Press 2020
Online access: https://doaj.org/article/33a784cd229f41e08d0ebc707b02e5b4
id oai:doaj.org-article:33a784cd229f41e08d0ebc707b02e5b4
record_format dspace
spelling oai:doaj.org-article:33a784cd229f41e08d0ebc707b02e5b4
2021-11-15T04:08:07Z
10.32890/jict2020.19.3.4
1675-414X
2180-3862
https://doaj.org/article/33a784cd229f41e08d0ebc707b02e5b4
2020-06-01T00:00:00Z
http://e-journal.uum.edu.my/index.php/jict/article/view/jict2020.19.3.4
https://doaj.org/toc/1675-414X
https://doaj.org/toc/2180-3862
Journal of ICT, Vol 19, Iss 3, Pp 381-398 (2020)
institution DOAJ
collection DOAJ
language EN
topic arabic text representation
deep autoencoder
feature selection
machine learning
text categorization
Information technology
T58.5-58.64
description Arabic text representation is a challenging task for applications such as text categorization and clustering, since the Arabic language is known for its variety, richness and complex morphology. To date, the Bag-of-Words model remains the most common method for Arabic text representation. However, it suffers from several shortcomings, such as a lack of semantics and the high dimensionality of the feature space. Moreover, most existing methods ignore the explicit knowledge contained in semantic vocabularies such as Arabic WordNet. To overcome these shortcomings, we proposed a deep Autoencoder-based representation for Arabic text categorization. It consisted of three stages: (1) extracting the most relevant concepts from Arabic WordNet based on feature selection processes; (2) learning features via an unsupervised algorithm for text representation; and (3) categorizing text using a deep Autoencoder. Our method accounted for document semantics by combining implicit and explicit semantics, and it reduced the dimensionality of the feature space. To evaluate our method, we conducted several experiments on the standard Arabic dataset OSAC. The results showed the effectiveness of the proposed method compared with state-of-the-art ones.
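The dimensionality problem of Bag-of-Words and the idea of keeping only the most relevant features can be illustrated with a small sketch. This is not the authors' implementation: the record does not say which feature selection criterion the paper uses, so the chi-square statistic below is an assumption, and the corpus, labels and function names (`bag_of_words`, `chi_square`, `select_features`, `project`) are hypothetical placeholders standing in for Arabic terms or Arabic WordNet concepts.

```python
from collections import Counter

# Toy labelled corpus (hypothetical tokens, not taken from OSAC).
DOCS = [
    ("economy market bank trade", "economy"),
    ("bank market price trade", "economy"),
    ("match team goal player", "sport"),
    ("team player goal market", "sport"),
]


def bag_of_words(docs):
    """Return the sorted vocabulary and one term-count vector per document."""
    vocab = sorted({tok for text, _ in docs for tok in text.split()})
    vectors = []
    for text, _ in docs:
        counts = Counter(text.split())
        vectors.append([counts[t] for t in vocab])
    return vocab, vectors


def chi_square(term, label, docs):
    """Chi-square statistic of the 2x2 table (term present?) x (doc has label?)."""
    n = len(docs)
    a = b = c = 0
    for text, y in docs:
        present = term in text.split()
        if present and y == label:
            a += 1  # term present, document in class
        elif present:
            b += 1  # term present, document in another class
        elif y == label:
            c += 1  # term absent, document in class
    d = n - a - b - c  # term absent, document in another class
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, k):
    """Keep the k terms with the highest chi-square score over all labels."""
    vocab, _ = bag_of_words(docs)
    labels = sorted({y for _, y in docs})
    scores = {t: max(chi_square(t, y, docs) for y in labels) for t in vocab}
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))  # ties broken alphabetically
    return ranked[:k], scores


def project(docs, selected):
    """Re-encode each document using only the selected terms."""
    return [[Counter(text.split())[t] for t in selected] for text, _ in docs]


vocab, full_vectors = bag_of_words(DOCS)
selected, scores = select_features(DOCS, k=5)
reduced_vectors = project(DOCS, selected)
print(len(vocab), "features before selection,", len(selected), "after")
print(selected)
```

In this toy setting, terms that occur in only one class score highest, while a term such as `market`, which appears in both classes, is dropped. In the pipeline the abstract describes, reduced vectors of this kind would then feed the unsupervised feature learning stage and the deep Autoencoder, neither of which is sketched here.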
format article
author Fatima-Zahra El-Alami
Abdelkader El Mahdaouy
Said Ouatik El Alaoui
Noureddine En-Nahnahi
title A DEEP AUTOENCODER-BASED REPRESENTATION FOR ARABIC TEXT CATEGORIZATION
publisher UUM Press
publishDate 2020
url https://doaj.org/article/33a784cd229f41e08d0ebc707b02e5b4