Large scale datasets for Image and Video Captioning in Italian

The application of attention-based deep neural architectures to the automatic captioning of images and videos is enabling the development of increasingly high-performing systems. Unfortunately, while image processing is language-independent, this does not hold for caption generation. Training such archit...

Bibliographic Details
Main authors: Scaiella Antonio, Danilo Croce, Roberto Basili
Format: article
Language: EN
Published: Accademia University Press 2019
Subjects:
H
Online access: https://doaj.org/article/5273e36a79404a07978ed1fcf57fc24a
id oai:doaj.org-article:5273e36a79404a07978ed1fcf57fc24a
record_format dspace
spelling oai:doaj.org-article:5273e36a79404a07978ed1fcf57fc24a
2021-12-02T09:52:19Z
Large scale datasets for Image and Video Captioning in Italian
2499-4553
10.4000/ijcol.478
https://doaj.org/article/5273e36a79404a07978ed1fcf57fc24a
2019-12-01T00:00:00Z
http://journals.openedition.org/ijcol/478
https://doaj.org/toc/2499-4553
Scaiella Antonio
Danilo Croce
Roberto Basili
Accademia University Press
article
Social Sciences
H
Computational linguistics. Natural language processing
P98-98.5
EN
IJCoL, Vol 5, Iss 2, Pp 49-60 (2019)
institution DOAJ
collection DOAJ
language EN
topic Social Sciences
H
Computational linguistics. Natural language processing
P98-98.5
description The application of attention-based deep neural architectures to the automatic captioning of images and videos is enabling the development of increasingly high-performing systems. Unfortunately, while image processing is language-independent, this does not hold for caption generation. Training such architectures requires the availability of (possibly large-scale) language-specific resources, which are not available for many languages, such as Italian. In this paper, we present MSCOCO-it and MSR-VTT-it, two large-scale resources for image and video captioning. They have been derived by applying automatic machine translation to existing resources. Even though this approach is naive and exposed to noise (depending on the quality of the automatic translator), we show experimentally that it enables robust deep learning that is rather tolerant of such noise. In particular, we improve the state-of-the-art results for image captioning in Italian. Moreover, in the paper we discuss the training of a system that, to the best of our knowledge, is the first video captioning system in Italian.
format article
author Scaiella Antonio
Danilo Croce
Roberto Basili
title Large scale datasets for Image and Video Captioning in Italian
publisher Accademia University Press
publishDate 2019
url https://doaj.org/article/5273e36a79404a07978ed1fcf57fc24a