Identification of related multilingual documents using ant clustering algorithms

This paper presents a document representation strategy and a bio-inspired algorithm to cluster multilingual collections of documents in the field of economics and business. The proposed approach allows the user to identify groups of related economics documents written in Spanish and English using te...

Bibliographic Details
Main authors: Cobo, Ángel; Rocha, Rocío
Language: English
Published: Universidad de Tarapacá, 2011
Subjects:
Online access: http://www.scielo.cl/scielo.php?script=sci_arttext&pid=S0718-33052011000300005
id oai:scielo:S0718-33052011000300005
record_format dspace
spelling oai:scielo:S0718-33052011000300005
2012-03-14
info:eu-repo/semantics/openAccess
Ingeniare. Revista chilena de ingeniería v.19 n.3 2011
2011-12-01
text/html
en
10.4067/S0718-33052011000300005
institution Scielo Chile
collection Scielo Chile
language English
topic Clustering
ant-based algorithms
multilingual documents
text mining
document management
description This paper presents a document representation strategy and a bio-inspired algorithm for clustering multilingual collections of documents in the field of economics and business. The proposed approach lets the user identify groups of related economics documents written in Spanish and English, using techniques inspired by the clustering and sorting behaviours observed in some species of ants. To obtain a language-independent vector representation of each document, two multilingual resources are used: an economic glossary and a thesaurus. Each document is represented by four feature vectors: words, proper names, economic terms in the glossary, and thesaurus descriptors. Proper name identification, word extraction, and lemmatization are performed using specific tools. The tf-idf scheme measures the importance of each feature in the document, and a convex linear combination of the angular separations between feature vectors serves as the similarity measure between documents. The paper reports experimental results from applying the proposed algorithm to a Spanish-English corpus of research papers in economics and management. The results demonstrate the usefulness and effectiveness of the ant clustering algorithm and the proposed representation scheme.
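The similarity measure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact tf-idf variant, the weights, or the formula for angular separation, so the sketch assumes a length-normalised tf with a logarithmic idf, takes angular separation to be the cosine of the angle between two vectors, and uses equal, hypothetical weights. The names `tf_idf`, `cosine`, `combined_similarity`, and `WEIGHTS` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical equal weights for the four feature spaces named in the abstract.
# A convex combination needs non-negative weights that sum to 1.
WEIGHTS = {"words": 0.25, "names": 0.25, "glossary": 0.25, "thesaurus": 0.25}


def tf_idf(docs):
    """Map each token list to a sparse {feature: weight} vector.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(doc_a, doc_b, weights=WEIGHTS):
    """Convex linear combination of per-feature-space cosine similarities.

    doc_a and doc_b map each feature-space name to a sparse vector.
    """
    if any(w < 0 for w in weights.values()) or abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * cosine(doc_a[name], doc_b[name]) for name, w in weights.items())
```

Under these assumptions, two documents with identical non-zero vectors in all four spaces score 1, and documents that share features in only one space score that space's weight. Because glossary terms and thesaurus descriptors are language independent, those two spaces are where a Spanish document and an English document can overlap.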
author Cobo,Ángel
Rocha,Rocío
title Identification of related multilingual documents using ant clustering algorithms
publisher Universidad de Tarapacá.
publishDate 2011
url http://www.scielo.cl/scielo.php?script=sci_arttext&pid=S0718-33052011000300005