Exploration through reward biasing: Reward-biased maximum likelihood estimation for stochastic multi-armed bandits

Xi Liu*, Ping Chun Hsieh, Yu Heng Hung, Anirban Bhattacharya, P. R. Kumar

*Corresponding author for this work

Research output: Conference contribution, peer-reviewed

6 Citations (Scopus)

Abstract

Inspired by the Reward-Biased Maximum Likelihood Estimate method of adaptive control, we propose RBMLE, a novel family of learning algorithms for stochastic multi-armed bandits (SMABs). For a broad range of SMABs, including both the parametric Exponential Family and the non-parametric sub-Gaussian/Exponential family, we show that RBMLE yields an index policy. To choose the bias-growth rate ...

Original language: English
Title of host publication: 37th International Conference on Machine Learning, ICML 2020
Editors: Hal Daume, Aarti Singh
Publisher: International Machine Learning Society (IMLS)
Pages: 6204-6214
Number of pages: 11
ISBN (electronic): 9781713821120
Publication status: Published - Jul 2020
Event: 37th International Conference on Machine Learning, ICML 2020 - Virtual, Online
Duration: 13 Jul 2020 to 18 Jul 2020

Publication series

Name: 37th International Conference on Machine Learning, ICML 2020
Volume: PartF168147-8

Conference

Conference: 37th International Conference on Machine Learning, ICML 2020
City: Virtual, Online
Period: 13/07/20 to 18/07/20
