Towards Automatic Subtitling: Assessing the Quality of Old and New Resources

Growing needs in localising multimedia content for global audiences have resulted in Neural Machine Translation (NMT) gradually becoming an established practice in the field of subtitling in order to reduce costs and turn-around times. Contrary to text translation, subtitling is subject to spatial a...

Bibliographic Details
Main Authors: Alina Karakanta, Matteo Negri, Marco Turchi
Format: article
Language: EN
Published: Accademia University Press 2020
Subjects:
H
Online Access: https://doaj.org/article/133e90f0aee140eeb5a36011b52f3917
id oai:doaj.org-article:133e90f0aee140eeb5a36011b52f3917
record_format dspace
spelling oai:doaj.org-article:133e90f0aee140eeb5a36011b52f3917
2021-12-02T09:52:20Z
Towards Automatic Subtitling: Assessing the Quality of Old and New Resources
2499-4553
10.4000/ijcol.649
https://doaj.org/article/133e90f0aee140eeb5a36011b52f3917
2020-06-01T00:00:00Z
http://journals.openedition.org/ijcol/649
https://doaj.org/toc/2499-4553
Growing needs in localising multimedia content for global audiences have resulted in Neural Machine Translation (NMT) gradually becoming an established practice in the field of subtitling in order to reduce costs and turn-around times. Contrary to text translation, subtitling is subject to spatial and temporal constraints, which greatly increase the post-processing effort required to restore the NMT output to a proper subtitle format. In our previous work (Karakanta, Negri, and Turchi 2019), we identified several missing elements in the corpora available for training NMT systems specifically tailored for subtitling. In this work, we compare the previously studied corpora with MuST-Cinema, a corpus enabling end-to-end speech to subtitles translation, in terms of the conformity to the constraints of: 1) length and reading speed; and 2) proper line breaks. We show that MuST-Cinema conforms to these constraints and discuss the recent progress the corpus has facilitated in end-to-end speech to subtitles translation.
Alina Karakanta
Matteo Negri
Marco Turchi
Accademia University Press
article
Social Sciences
H
Computational linguistics. Natural language processing
P98-98.5
EN
IJCoL, Vol 6, Iss 1, Pp 63-76 (2020)
institution DOAJ
collection DOAJ
language EN
topic Social Sciences
H
Computational linguistics. Natural language processing
P98-98.5
description Growing needs in localising multimedia content for global audiences have resulted in Neural Machine Translation (NMT) gradually becoming an established practice in the field of subtitling in order to reduce costs and turn-around times. Contrary to text translation, subtitling is subject to spatial and temporal constraints, which greatly increase the post-processing effort required to restore the NMT output to a proper subtitle format. In our previous work (Karakanta, Negri, and Turchi 2019), we identified several missing elements in the corpora available for training NMT systems specifically tailored for subtitling. In this work, we compare the previously studied corpora with MuST-Cinema, a corpus enabling end-to-end speech to subtitles translation, in terms of the conformity to the constraints of: 1) length and reading speed; and 2) proper line breaks. We show that MuST-Cinema conforms to these constraints and discuss the recent progress the corpus has facilitated in end-to-end speech to subtitles translation.
format article
author Alina Karakanta
Matteo Negri
Marco Turchi
title Towards Automatic Subtitling: Assessing the Quality of Old and New Resources
publisher Accademia University Press
publishDate 2020
url https://doaj.org/article/133e90f0aee140eeb5a36011b52f3917