Conservative Contextual Combinatorial Cascading Bandit

Contextual combinatorial cascading bandit (<inline-formula> <tex-math notation="LaTeX">$C^{3}$ </tex-math></inline-formula>-bandit) is a powerful multi-armed bandit framework that balances the tradeoff between exploration and exploitation in the learning process. It...

Descripción completa

Guardado en:

Detalles Bibliográficos
Autor principal:	Kun Wang
Formato:	article
Lenguaje:	EN
Publicado:	IEEE 2021
Materias:	Machine learning multi-armed bandit online learning recommender system Electrical engineering. Electronics. Nuclear engineering TK1-9971
Acceso en línea:	https://doaj.org/article/b2cfe5670c4048999cc7a7df5c529f40
Etiquetas:	Agregar Etiqueta Sin Etiquetas, Sea el primero en etiquetar este registro!

id	oai:doaj.org-article:b2cfe5670c4048999cc7a7df5c529f40
record_format	dspace
spelling	oai:doaj.org-article:b2cfe5670c4048999cc7a7df5c529f402021-11-17T00:00:37ZConservative Contextual Combinatorial Cascading Bandit2169-353610.1109/ACCESS.2021.3124416https://doaj.org/article/b2cfe5670c4048999cc7a7df5c529f402021-01-01T00:00:00Zhttps://ieeexplore.ieee.org/document/9594856/https://doaj.org/toc/2169-3536Contextual combinatorial cascading bandit (<inline-formula> <tex-math notation="LaTeX">$C^{3}$ </tex-math></inline-formula>-bandit) is a powerful multi-armed bandit framework that balances the tradeoff between exploration and exploitation in the learning process. It well captures users’ click behavior and has been applied in a broad spectrum of real-world applications such as recommender systems and search engines. However, such a framework does not provide a performance guarantee of the initial exploration phase. To that end, we propose <italic>conservative contextual combinatorial cascading bandit (<inline-formula> <tex-math notation="LaTeX">$C^{4}$ </tex-math></inline-formula>-bandit)</italic> model, aiming to address the aforementioned crucial modeling issues. In this problem, the learning agent is given some contexts and recommends a list of items not worse than the baseline strategy, and then observes the reward by some stopping rule. The objective is now to maximize the reward while simultaneously satisfying the safety constraint, i.e. guaranteeing the algorithm to perform at least as well as a baseline strategy. To tackle this new problem, we extend an online learning algorithm, called Upper Confidence Bound (UCB), to deal with a critical tradeoff between exploitation and exploration and employ the conservative mechanism to properly handle the safety constraints. By carefully integrating these two techniques, we develop a new algorithm, called <inline-formula> <tex-math notation="LaTeX">$C^{4}$ </tex-math></inline-formula>-UCB for this problem. Further, we rigorously prove the n-step upper bound in two situations: known baseline reward and unknown baseline reward. The regret in both situations is only enlarged by an additive constant term compared to results of <inline-formula> <tex-math notation="LaTeX">$C^{3}$ </tex-math></inline-formula>-bandit. Finally, experiments on synthetic and realistic datasets demonstrate its advantages.Kun WangIEEEarticleMachine learningmulti-armed banditonline learningrecommender systemElectrical engineering. Electronics. Nuclear engineeringTK1-9971ENIEEE Access, Vol 9, Pp 151434-151443 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	Machine learning multi-armed bandit online learning recommender system Electrical engineering. Electronics. Nuclear engineering TK1-9971
spellingShingle	Machine learning multi-armed bandit online learning recommender system Electrical engineering. Electronics. Nuclear engineering TK1-9971 Kun Wang Conservative Contextual Combinatorial Cascading Bandit
description	Contextual combinatorial cascading bandit (<inline-formula> <tex-math notation="LaTeX">$C^{3}$ </tex-math></inline-formula>-bandit) is a powerful multi-armed bandit framework that balances the tradeoff between exploration and exploitation in the learning process. It well captures users’ click behavior and has been applied in a broad spectrum of real-world applications such as recommender systems and search engines. However, such a framework does not provide a performance guarantee of the initial exploration phase. To that end, we propose <italic>conservative contextual combinatorial cascading bandit (<inline-formula> <tex-math notation="LaTeX">$C^{4}$ </tex-math></inline-formula>-bandit)</italic> model, aiming to address the aforementioned crucial modeling issues. In this problem, the learning agent is given some contexts and recommends a list of items not worse than the baseline strategy, and then observes the reward by some stopping rule. The objective is now to maximize the reward while simultaneously satisfying the safety constraint, i.e. guaranteeing the algorithm to perform at least as well as a baseline strategy. To tackle this new problem, we extend an online learning algorithm, called Upper Confidence Bound (UCB), to deal with a critical tradeoff between exploitation and exploration and employ the conservative mechanism to properly handle the safety constraints. By carefully integrating these two techniques, we develop a new algorithm, called <inline-formula> <tex-math notation="LaTeX">$C^{4}$ </tex-math></inline-formula>-UCB for this problem. Further, we rigorously prove the n-step upper bound in two situations: known baseline reward and unknown baseline reward. The regret in both situations is only enlarged by an additive constant term compared to results of <inline-formula> <tex-math notation="LaTeX">$C^{3}$ </tex-math></inline-formula>-bandit. Finally, experiments on synthetic and realistic datasets demonstrate its advantages.
format	article
author	Kun Wang
author_facet	Kun Wang
author_sort	Kun Wang
title	Conservative Contextual Combinatorial Cascading Bandit
title_short	Conservative Contextual Combinatorial Cascading Bandit
title_full	Conservative Contextual Combinatorial Cascading Bandit
title_fullStr	Conservative Contextual Combinatorial Cascading Bandit
title_full_unstemmed	Conservative Contextual Combinatorial Cascading Bandit
title_sort	conservative contextual combinatorial cascading bandit
publisher	IEEE
publishDate	2021
url	https://doaj.org/article/b2cfe5670c4048999cc7a7df5c529f40
work_keys_str_mv	AT kunwang conservativecontextualcombinatorialcascadingbandit
_version_	1718426068942061568

Conservative Contextual Combinatorial Cascading Bandit

Ejemplares similares