Cross‐modal semantic correlation learning by Bi‐CNN network

Abstract Cross-modal retrieval returns images for a text query and vice versa, and has attracted extensive attention in recent years. Most existing cross-modal retrieval methods aim to find a common subspace that maximizes the correlation between modalities. To generate representations tailored to cross-modal tasks, this paper proposes a novel cross-modal retrieval framework that integrates feature learning and latent-space embedding. Specifically, a deep CNN and a shallow CNN extract features from the samples: the deep CNN learns image representations, while the shallow CNN uses multi-dimensional kernels to extract multi-level semantic representations of text. Meanwhile, a cross-modal ranking loss and a within-modal discriminant loss enhance the semantic manifold and sharpen the separation of semantic representations. Moreover, an online sampling strategy selects the most representative samples, so the approach scales to large datasets. The framework not only increases discriminative ability among categories but also maximizes the correlation between modalities. Experiments on three real-world datasets show that the proposed method outperforms popular existing methods.
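The architecture described above can be sketched in miniature. This is not the authors' implementation: the layer sizes, kernel widths, the simplified image branch (a projection of a precomputed feature vector standing in for the deep CNN), and all variable names are assumptions for illustration. It shows the two ingredients the abstract names: a shallow text CNN with several kernel widths pooled into multi-level features, and a bidirectional hinge ranking loss that pulls matched image-text pairs together in the shared space. The within-modal discriminant loss and online sampling strategy are not sketched here.

```python
# Minimal NumPy sketch of a bi-CNN cross-modal embedding (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

EMBED, DIM = 16, 8            # word-embedding size, per-kernel output size (assumed)
KERNELS = (2, 3, 4)           # assumed kernel widths for the shallow text CNN
IMG_FEAT = 32                 # assumed size of a precomputed image feature

# Random, untrained weights: one conv filter bank per kernel width,
# plus a projection standing in for the deep image CNN.
filters = {k: rng.normal(0, 0.1, (k * EMBED, DIM)) for k in KERNELS}
W_img = rng.normal(0, 0.1, (len(KERNELS) * DIM, IMG_FEAT))

def l2(v):
    """L2-normalise so dot products become cosine similarities."""
    return v / (np.linalg.norm(v) + 1e-9)

def text_cnn(tokens):
    """Shallow text CNN: for each kernel width, convolve over word windows,
    apply tanh, max-pool over time, then concatenate the pooled features."""
    feats = []
    for k in KERNELS:
        windows = np.stack([tokens[i:i + k].ravel()
                            for i in range(len(tokens) - k + 1)])
        feats.append(np.tanh(windows @ filters[k]).max(axis=0))
    return l2(np.concatenate(feats))

def image_net(feat):
    """Stand-in for the deep image CNN: project a precomputed image
    feature vector into the shared space."""
    return l2(W_img @ feat)

def ranking_loss(img, txt, txt_neg, img_neg, margin=0.2):
    """Bidirectional hinge ranking loss on cosine similarity: the matched
    pair should beat each negative by at least `margin`."""
    sim = lambda a, b: float(a @ b)          # inputs are L2-normalised
    l_i2t = max(0.0, margin - sim(img, txt) + sim(img, txt_neg))
    l_t2i = max(0.0, margin - sim(txt, img) + sim(txt, img_neg))
    return l_i2t + l_t2i

# Toy usage: random "sentences" (word-embedding matrices) and image features.
t = text_cnn(rng.normal(size=(7, EMBED)))        # 7-word matched caption
tn = text_cnn(rng.normal(size=(5, EMBED)))       # 5-word negative caption
v = image_net(rng.normal(size=IMG_FEAT))         # matched image
vn = image_net(rng.normal(size=IMG_FEAT))        # negative image
loss = ranking_loss(v, t, tn, vn)
```

Both branches end in the same shared dimension (`len(KERNELS) * DIM` here), which is what lets one hinge loss rank text against images and vice versa.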


Bibliographic Details
Main Authors: Chaoyi Wang, Liang Li, Chenggang Yan, Zhan Wang, Yaoqi Sun, Jiyong Zhang
Format: article
Language: EN
Published: Wiley, 2021
Subjects: Photography (TR1-1050); Computer software (QA76.75-76.765)
Online Access: https://doaj.org/article/63c3d119128048deb09c7f15c31adc7e
Published in: IET Image Processing, Vol 15, Iss 14, pp 3674-3684 (2021)
ISSN: 1751-9659, 1751-9667
DOI: 10.1049/ipr2.12176 (https://doi.org/10.1049/ipr2.12176)
Publication date: 2021-12-01