Procedimiento semi-automático para transformar la web en web semántica (Semi-automatic procedure for transforming the web into a semantic web)

Bibliographic Details
Main author: Criado Fernández, Luis
Other authors: Martínez Tomás, Rafael (UNED. Universidad Nacional de Educación a Distancia (España))
Format: text (thesis)
Language: spa (Spanish)
Published: UNED. Universidad Nacional de Educación a Distancia (España) (España) 2009
Subjects:
OWL
Online access: https://dialnet.unirioja.es/servlet/oaites?codigo=20625
Description
Summary: The Semantic Web requires information to be formally represented according to reference ontologies, which give the Web semantics that computer systems can process. There is widespread agreement that this is done with standard markup languages. It also requires enough semantic annotations of this kind: a certain "critical mass" is needed for the semantics to have global meaning on the Web. This has not been achieved, primarily because manual annotation is too complex. Only when enough semantic annotations can be generated, automatically or semi-automatically, can semantics spread through the contents of the Web. From that point it becomes possible to develop semantic applications, that is, applications that take advantage of those semantics. Our research focuses on this problem. The main contribution of this thesis is a procedure that assists in populating ontologies: it makes it easier for an active user to semantically annotate the information they manage, which is already described in text on the HTML page, according to the ontology or ontologies that the system has identified as most relevant to its contents. Our work pays particular attention to the case where several ontologies are relevant: the content to be annotated can refer to different topics or be interpreted from different points of view, which we call generating different "semantic views." A semantic website should also remain compatible with the current Web, i.e. the annotation process should not affect how any existing search engine operates.
Thus, when a website is transformed into a semantic website, it gains semantic features that a semantic search engine can exploit, while a regular browser still handles it with full compatibility and a regular search engine treats it as just another website. This thesis takes that requirement into account: the semantic views are kept separate from the HTML page, accessible but without affecting regular search engines. We have defined transformation stages that must be carried out sequentially. The first, which we call identification, associates the web page with the ontology or ontologies closest to its content. This selection of ontologies is crucial for the next stage, which we call extraction, in which the text is processed at the morphological and syntactic levels. The last stage, which we call interpretation, is responsible for the semantic annotation. In our study the annotation is written in OWL DL, because it is the standard language for describing semantics on the Web and it supports the inferences typical of the description logic SROIQ(D) on which it is based. The methodology simplifies the problem without losing the conceptual breadth needed to cover the full scope of the proposal, which consists of a sequence of processes developed throughout the thesis. That is, we pose a simplified scenario that recreates the key elements of the current Web in order to propose a strategy for migrating or transforming it towards the Semantic Web. The conclusions reached are the result of an experimental self-correction process. We have fully implemented the proposal, and any researcher can verify it by following the guidelines in the annex of the thesis. To perform this transformation or migration, we have implemented a prototype tool (sw2sws) that automates the three stages presented above. It has been tested on real websites.
Our prototype tool automates annotation with the ontologies used in the thesis, but it is easily adapted to support others. Our approach also allows user intervention (a semi-automatic process) to complete or improve any stage of the overall process. The quality of the resulting annotation depends on several factors: the quality of the ontology with respect to the content being annotated (affinity, accuracy, standardization, completeness, etc.), the clarity of the contents, and the capacity for extraction and analysis, which is conditioned to a large extent by natural language processing (NLP). This thesis does not aim to solve the NLP problem for annotation; however, to test the process we built a small NLP module that shows the procedure is feasible for active users, that is, users who contribute content and have no experience with Semantic Web techniques. Once the main objective had been achieved, and in order to show how to exploit this semantically annotated information and close the whole sequence of the process, we saw the need to design and implement our own prototype semantic search engine, called Vissem, which can interpret questions in natural language and carry out the corresponding searches on instances of the semantic websites we have created.
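The three sequential stages named in the summary (identification, extraction, interpretation) can be illustrated with a toy sketch. This is not the sw2sws implementation, whose internals are not described in this record. The ontologies, the keyword-overlap scoring, the tokenizer that stands in for morphological and syntactic analysis, and every function name below are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Hypothetical toy ontologies (class name -> trigger words); not taken from the thesis.
ONTOLOGIES = {
    "travel": {"Hotel": {"hotel", "hostel"}, "City": {"madrid", "toledo"}},
    "academia": {"Thesis": {"thesis", "dissertation"},
                 "University": {"uned", "university"}},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-záéíóúñ]+", text.lower())


def identify(text, ontologies, top=1):
    """Stage 1 (assumed scoring): rank ontologies by vocabulary hits in the page."""
    counts = Counter(tokenize(text))
    scores = {}
    for name, classes in ontologies.items():
        vocab = set().union(*classes.values())
        scores[name] = sum(counts[word] for word in vocab)
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return [n for n in ranked if scores[n] > 0][:top]


def extract(text):
    """Stage 2 (stand-in for real NLP): distinct tokens in order of appearance."""
    seen = []
    for tok in tokenize(text):
        if tok not in seen:
            seen.append(tok)
    return seen


def interpret(tokens, ontology_name, ontologies):
    """Stage 3: emit (individual, 'rdf:type', class) triples for matching tokens."""
    triples = []
    for tok in tokens:
        for cls, words in ontologies[ontology_name].items():
            if tok in words:
                triples.append((tok, "rdf:type", f"{ontology_name}:{cls}"))
    return triples


def semantic_views(text, ontologies=ONTOLOGIES, top=2):
    """Run the three stages and return one list of triples per relevant ontology."""
    tokens = extract(text)
    return {name: interpret(tokens, name, ontologies)
            for name in identify(text, ontologies, top)}


if __name__ == "__main__":
    page = "The UNED university published a thesis about a hotel in Madrid."
    for ontology, triples in semantic_views(page).items():
        print(ontology, triples)
```

Returning one list of triples per ontology mirrors the idea of separate "semantic views" over the same page text. A real system would serialize these views as OWL, stored apart from the HTML page, rather than as Python tuples.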