Enhancing semantics with multi‐objective reinforcement learning for video description

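Abstract: Video description is challenging due to the high complexity of translating visual content into language. In most popular attention‐based pipelines for this task, visual features and previously generated words are usually concatenated as a vector to predict the current word. However, the errors caused by the inaccuracy of the predicted words may be accumulated, and the gap between visual features and language features may bring noises into the description model. Facing these problems, a variant of recurrent neural network is designed in this work, and a novel framework is developed to enhance the visual clues for video description. Moreover, a multi‐objective reinforcement learning strategy is implemented to build a more comprehensive reward with multiple metrics to improve the consistency and semantics of the generated description sentence. The experiments on the benchmark MSR‐VTT2016 and MSVD datasets demonstrate the effectiveness of the proposed approach.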

Bibliographic Details
Main authors: Qinyu Li, Longyu Yang, Pengjie Tang, Hanli Wang
Format: article
Language: EN
Published: Wiley 2021
Subjects: Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Online access: https://doaj.org/article/1f50686212ac4f60b204af786657d346
id oai:doaj.org-article:1f50686212ac4f60b204af786657d346
record_format dspace
spelling oai:doaj.org-article:1f50686212ac4f60b204af786657d346
2021-12-03T08:34:31Z
Enhancing semantics with multi‐objective reinforcement learning for video description
1350-911X
0013-5194
10.1049/ell2.12334
https://doaj.org/article/1f50686212ac4f60b204af786657d346
2021-12-01T00:00:00Z
https://doi.org/10.1049/ell2.12334
https://doaj.org/toc/0013-5194
https://doaj.org/toc/1350-911X
Abstract Video description is challenging due to the high complexity of translating visual content into language. In most popular attention‐based pipelines for this task, visual features and previously generated words are usually concatenated as a vector to predict the current word. However, the errors caused by the inaccuracy of the predicted words may be accumulated, and the gap between visual features and language features may bring noises into the description model. Facing these problems, a variant of recurrent neural network is designed in this work, and a novel framework is developed to enhance the visual clues for video description. Moreover, a multi‐objective reinforcement learning strategy is implemented to build a more comprehensive reward with multiple metrics to improve the consistency and semantics of the generated description sentence. The experiments on the benchmark MSR‐VTT2016 and MSVD datasets demonstrate the effectiveness of the proposed approach.
Qinyu Li
Longyu Yang
Pengjie Tang
Hanli Wang
Wiley
article
Electrical engineering. Electronics. Nuclear engineering
TK1-9971
EN
Electronics Letters, Vol 57, Iss 25, Pp 977-979 (2021)
institution DOAJ
collection DOAJ
language EN
topic Electrical engineering. Electronics. Nuclear engineering
TK1-9971
spellingShingle Electrical engineering. Electronics. Nuclear engineering
TK1-9971
Qinyu Li
Longyu Yang
Pengjie Tang
Hanli Wang
Enhancing semantics with multi‐objective reinforcement learning for video description
description Abstract Video description is challenging due to the high complexity of translating visual content into language. In most popular attention‐based pipelines for this task, visual features and previously generated words are usually concatenated as a vector to predict the current word. However, the errors caused by the inaccuracy of the predicted words may be accumulated, and the gap between visual features and language features may bring noises into the description model. Facing these problems, a variant of recurrent neural network is designed in this work, and a novel framework is developed to enhance the visual clues for video description. Moreover, a multi‐objective reinforcement learning strategy is implemented to build a more comprehensive reward with multiple metrics to improve the consistency and semantics of the generated description sentence. The experiments on the benchmark MSR‐VTT2016 and MSVD datasets demonstrate the effectiveness of the proposed approach.
format article
author Qinyu Li
Longyu Yang
Pengjie Tang
Hanli Wang
author_facet Qinyu Li
Longyu Yang
Pengjie Tang
Hanli Wang
author_sort Qinyu Li
title Enhancing semantics with multi‐objective reinforcement learning for video description
title_short Enhancing semantics with multi‐objective reinforcement learning for video description
title_full Enhancing semantics with multi‐objective reinforcement learning for video description
title_fullStr Enhancing semantics with multi‐objective reinforcement learning for video description
title_full_unstemmed Enhancing semantics with multi‐objective reinforcement learning for video description
title_sort enhancing semantics with multi‐objective reinforcement learning for video description
publisher Wiley
publishDate 2021
url https://doaj.org/article/1f50686212ac4f60b204af786657d346
work_keys_str_mv AT qinyuli enhancingsemanticswithmultiobjectivereinforcementlearningforvideodescription
AT longyuyang enhancingsemanticswithmultiobjectivereinforcementlearningforvideodescription
AT pengjietang enhancingsemanticswithmultiobjectivereinforcementlearningforvideodescription
AT hanliwang enhancingsemanticswithmultiobjectivereinforcementlearningforvideodescription
_version_ 1718373373694705664