Enhancing semantics with multi‐objective reinforcement learning for video description
Abstract: Video description is challenging due to the high complexity of translating visual content into language. In most popular attention‐based pipelines for this task, visual features and previously generated words are usually concatenated into a vector to predict the current word. However, errors caused by inaccurately predicted words may accumulate, and the gap between visual features and language features may introduce noise into the description model. To address these problems, a variant of the recurrent neural network is designed in this work, and a novel framework is developed to enhance the visual clues for video description. Moreover, a multi‐objective reinforcement learning strategy is implemented to build a more comprehensive reward with multiple metrics, improving the consistency and semantics of the generated description sentences. Experiments on the benchmark MSR‐VTT2016 and MSVD datasets demonstrate the effectiveness of the proposed approach.
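The abstract does not spell out how the multiple metrics are combined into one reward, but a common realisation of multi-objective reinforcement learning for captioning is a weighted, greedy-baselined sum of per-metric rewards used in a REINFORCE-style loss. The sketch below is illustrative only and not the authors' code: the two toy scorers stand in for real caption metrics such as CIDEr or BLEU, and the mixing weights are assumed, not taken from the paper.

import torch

def combined_reward(sampled, greedy, refs, scorers, weights):
    # Weighted sum over metrics; each term is baselined by the greedy caption,
    # so the reward is positive only when sampling beats greedy decoding.
    reward = 0.0
    for (name, scorer), weight in zip(scorers.items(), weights):
        reward += weight * (scorer(sampled, refs) - scorer(greedy, refs))
    return reward

def rl_loss(log_probs, reward):
    # REINFORCE-style loss: scale the sampled words' log-likelihood by the reward.
    return -(reward * log_probs.sum())

# Toy stand-in scorers (placeholders for real metrics such as CIDEr or BLEU).
scorers = {
    "overlap": lambda hyp, refs: len(set(hyp) & set(refs[0])) / max(len(set(hyp)), 1),
    "length": lambda hyp, refs: min(len(hyp), len(refs[0])) / max(len(hyp), len(refs[0]), 1),
}
weights = [0.7, 0.3]  # assumed mixing weights

sampled = "a man rides a horse".split()    # caption sampled from the model
greedy = "a man is riding".split()         # greedy-decoded baseline caption
refs = ["a man rides a horse".split()]     # ground-truth reference caption(s)

reward = combined_reward(sampled, greedy, refs, scorers, weights)
log_probs = torch.randn(len(sampled), requires_grad=True).log_softmax(dim=0)  # stand-in for model log-probs
loss = rl_loss(log_probs, reward)
loss.backward()
print(f"reward={reward:.3f}, loss={loss.item():.3f}")

In a real training loop, the stand-in scorers would be replaced by actual metric implementations and log_probs would come from the description model, so that the gradient of this loss updates the model parameters.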
Saved in:
Main Authors: Qinyu Li, Longyu Yang, Pengjie Tang, Hanli Wang
Format: article
Language: EN
Published: Wiley, 2021
Published in: Electronics Letters, Vol 57, Iss 25, Pp 977-979 (2021)
DOI: https://doi.org/10.1049/ell2.12334
ISSN: 0013-5194, 1350-911X
Subjects: Electrical engineering. Electronics. Nuclear engineering (TK1-9971)
Online Access: https://doaj.org/article/1f50686212ac4f60b204af786657d346