AlBERTo: Modeling Italian Social Media Language with BERT

Natural Language Processing tasks have recently attracted considerable interest and made rapid progress following the development of numerous innovative artificial intelligence models. The increase in available computing power has made it possible to apply machine learning approaches to a considerable amount of textual data, demonstrating that they can obtain very encouraging results in challenging NLP tasks by generalizing the properties of natural language directly from the data. Models such as ELMo, GPT/GPT-2, BERT, ERNIE, and RoBERTa have proved to be extremely useful in NLP tasks such as entailment, sentiment analysis, and question answering. The availability of these resources mainly for the English language motivated us to develop AlBERTo, a natural language model based on BERT and trained on Italian. We trained AlBERTo from scratch on social network language, Twitter in particular, because many of the classic tasks of content analysis are oriented towards data extracted from the digital sphere of users. The model was distributed to the community through a repository on GitHub and the Transformers library (Wolf et al. 2019) released by the development group huggingface.co. We evaluated the validity of the model on the classification tasks of sentiment polarity, irony, subjectivity, and hate speech. The specifications of the model, the code developed for training and fine-tuning, and the instructions for using it in a research project are freely available.
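Since AlBERTo is trained on Twitter data, raw tweets are typically normalized before tokenization. The snippet below is a minimal, hypothetical sketch of such a normalization step (lowercasing, masking URLs and user mentions, unpacking hashtags); it is not the authors' actual preprocessing code, whose exact rules are given in the paper and the GitHub repository.

```python
import re

def normalize_tweet(text: str) -> str:
    """Illustrative tweet normalization for BERT-style pretraining on Twitter."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "<url>", text)   # mask links
    text = re.sub(r"@\w+", "<user>", text)          # mask user mentions
    text = re.sub(r"#(\w+)", r"\1", text)           # keep the hashtag word, drop '#'
    return re.sub(r"\s+", " ", text).strip()        # collapse whitespace

print(normalize_tweet("Ciao @Marco! Guarda https://t.co/xyz #AlBERTo"))
# → ciao <user>! guarda <url> alberto
```

A pipeline of this kind keeps the vocabulary compact by mapping highly variable tokens (links, usernames) to fixed placeholders while preserving the lexical content of hashtags.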

Bibliographic Details
Main Authors: Marco Polignano, Valerio Basile, Pierpaolo Basile, Marco de Gemmis, Giovanni Semeraro
Format: article
Language: EN
Published: Accademia University Press, 2019
Published in: IJCoL, Vol 5, Iss 2, pp. 11-31 (2019)
ISSN: 2499-4553
DOI: 10.4000/ijcol.472
Subjects: Social Sciences (H); Computational linguistics. Natural language processing (P98-98.5)
Online Access: https://doaj.org/article/eb4a26589b8b4dbdab87d69f1f47c8d7
Full text: http://journals.openedition.org/ijcol/472