TWITTIRÒ: an Italian Twitter Corpus with a Multi-layered Annotation for Irony

Given the difficulties that still affect the correct identification of irony within the context of Sentiment Analysis tasks, in this paper we describe the main issues that emerged during the development of a novel resource for Italian annotated for irony. The project mainly consists in the application o...

Bibliographic Details
Main authors: Alessandra Teresa Cignarella, Cristina Bosco, Viviana Patti, Mirko Lai
Format: article
Language: EN
Published: Accademia University Press 2018
Subjects:
H
Online access: https://doaj.org/article/abce63f8e2e048babae28facf8e9e684
id oai:doaj.org-article:abce63f8e2e048babae28facf8e9e684
record_format dspace
spelling oai:doaj.org-article:abce63f8e2e048babae28facf8e9e684
2021-12-02T09:52:31Z
TWITTIRÒ: an Italian Twitter Corpus with a Multi-layered Annotation for Irony
2499-4553
10.4000/ijcol.502
https://doaj.org/article/abce63f8e2e048babae28facf8e9e684
2018-12-01T00:00:00Z
http://journals.openedition.org/ijcol/502
https://doaj.org/toc/2499-4553
Alessandra Teresa Cignarella
Cristina Bosco
Viviana Patti
Mirko Lai
Accademia University Press
article
Social Sciences
H
Computational linguistics. Natural language processing
P98-98.5
EN
IJCoL, Vol 4, Iss 2, Pp 25-43 (2018)
institution DOAJ
collection DOAJ
language EN
topic Social Sciences
H
Computational linguistics. Natural language processing
P98-98.5
spellingShingle Social Sciences
H
Computational linguistics. Natural language processing
P98-98.5
Alessandra Teresa Cignarella
Cristina Bosco
Viviana Patti
Mirko Lai
TWITTIRÒ: an Italian Twitter Corpus with a Multi-layered Annotation for Irony
description Given the difficulties that still affect the correct identification of irony within the context of Sentiment Analysis tasks, in this paper we describe the main issues that emerged during the development of a novel resource for Italian annotated for irony. The project mainly consists of applying to the Twitter corpus TWITTIRÒ a multi-layered scheme for the fine-grained annotation of irony, as proposed in a multilingual setting and previously applied to French and English datasets (Karoui et al. 2017). In applying the annotation to this corpus, we outline and discuss the issues and peculiarities that emerged in using the semantic scheme for Twitter messages in Italian, thus shedding some light on the future directions that can be followed in the multilingual and cross-language perspective as well. We present, in particular, an analysis of the annotation process and of the distribution of the labels in each layer of the scheme. This is supported by a discussion of the outcome of the annotation carried out by native Italian speakers during the development of the corpus, including an in-depth discussion of the inter-annotator agreement and of the sources of disagreement. The result is a novel gold standard corpus for irony detection in Italian, which enriches the set of multilingual datasets available for this challenging task and is ready to be used as a benchmark in automatic irony detection experiments and evaluation campaigns.
format article
author Alessandra Teresa Cignarella
Cristina Bosco
Viviana Patti
Mirko Lai
author_facet Alessandra Teresa Cignarella
Cristina Bosco
Viviana Patti
Mirko Lai
author_sort Alessandra Teresa Cignarella
title TWITTIRÒ: an Italian Twitter Corpus with a Multi-layered Annotation for Irony
title_short TWITTIRÒ: an Italian Twitter Corpus with a Multi-layered Annotation for Irony
title_full TWITTIRÒ: an Italian Twitter Corpus with a Multi-layered Annotation for Irony
title_fullStr TWITTIRÒ: an Italian Twitter Corpus with a Multi-layered Annotation for Irony
title_full_unstemmed TWITTIRÒ: an Italian Twitter Corpus with a Multi-layered Annotation for Irony
title_sort twittirò: an italian twitter corpus with a multi-layered annotation for irony
publisher Accademia University Press
publishDate 2018
url https://doaj.org/article/abce63f8e2e048babae28facf8e9e684
work_keys_str_mv AT alessandrateresacignarella twittiroanitaliantwittercorpuswithamultilayeredannotationforirony
AT cristinabosco twittiroanitaliantwittercorpuswithamultilayeredannotationforirony
AT vivianapatti twittiroanitaliantwittercorpuswithamultilayeredannotationforirony
AT mirkolai twittiroanitaliantwittercorpuswithamultilayeredannotationforirony
_version_ 1718397962981212160