PARSEME-It: an Italian corpus annotated with verbal multiword expressions

Bibliographic Details
Main Authors:	Johanna Monti, Maria Pia di Buono
Format:	article
Language:	English
Published:	Accademia University Press 2019
Subjects:	Social Sciences (H); Computational linguistics. Natural language processing (P98-98.5)
Online Access:	https://doaj.org/article/2d9980bb59b445c5ac9fc2a1c3d63a7d
Journal:	IJCoL, Vol 5, Iss 2, Pp 61-93 (2019)
ISSN:	2499-4553
DOI:	10.4000/ijcol.483
Publisher Access:	http://journals.openedition.org/ijcol/483

Abstract

The paper describes the PARSEME-It corpus, developed within the PARSEME-It project, which aims at developing methods, tools and resources for processing multiword expressions (MWEs) in Italian. The project is a spin-off of a larger multilingual project, the PARSEME COST Action, which covers more than 20 languages from several language families. The first phase of the project was devoted to verbal multiword expressions (VMWEs). They are a particularly interesting lexical phenomenon because of their frequent discontinuity and long-distance dependencies. Moreover, they are very challenging for deep parsing and other Natural Language Processing (NLP) tasks. Notably, MWEs are pervasive in natural languages but are particularly difficult for NLP tools to handle because of their characteristics and idiomaticity. They pose many challenges to their correct identification and processing. They are a linguistic phenomenon on the edge between lexicon and grammar, and their meaning is not simply the sum of the meanings of their individual constituents. They are also ambiguous, since in several cases their reading can be literal or idiomatic. Although several studies have been devoted to this topic, to the best of our knowledge, our study is the first attempt to provide a general framework for the identification of VMWEs in running texts and a comprehensive corpus for the Italian language.
