Exploration through reward biasing: Reward-biased maximum likelihood estimation for stochastic multi-armed bandits

Xi Liu*, Ping Chun Hsieh, Yu Heng Hung, Anirban Bhattacharya, P. R. Kumar

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Inspired by the Reward-Biased Maximum Likelihood Estimate method of adaptive control, we propose RBMLE, a novel family of learning algorithms for stochastic multi-armed bandits (SMABs). For a broad range of SMABs, including both the parametric Exponential Family and the non-parametric sub-Gaussian/Exponential family, we show that RBMLE yields an index policy. To choose the bias-growth rate, …
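The abstract describes RBMLE as an index policy: at each round, every arm is scored and the arm with the largest index is pulled. Below is a minimal toy sketch of such a reward-biased index for Gaussian bandits with unit variance, assuming an index of the form (empirical mean) + α(t)/(2·nᵢ) and the placeholder choice α(t) = log t; the function name, the constants, and the exact bias-growth rate are illustrative assumptions, not the paper's derivation.

```python
import math
import random

def rbmle_gaussian_bandit(means, horizon, alpha=math.log, seed=0):
    """Toy reward-biased index policy for Gaussian bandits (unit variance).

    Assumed index for arm i at time t:  mu_hat_i + alpha(t) / (2 * n_i),
    where n_i is the number of pulls of arm i so far. The proper choice
    of the bias-growth rate alpha(t) is the subject of the paper.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # pulls per arm
    sums = [0.0] * k      # cumulative reward per arm

    def pull(i):
        counts[i] += 1
        sums[i] += rng.gauss(means[i], 1.0)

    # Initialize: pull each arm once so every empirical mean is defined.
    for i in range(k):
        pull(i)

    for t in range(k + 1, horizon + 1):
        # Reward-biased index: empirical mean plus a vanishing-per-pull bias.
        index = [sums[i] / counts[i] + alpha(t) / (2 * counts[i])
                 for i in range(k)]
        pull(max(range(k), key=index.__getitem__))

    return counts
```

For example, `rbmle_gaussian_bandit([0.0, 1.0], 500)` returns the pull counts over 500 rounds; since the bias term α(t)/(2·nᵢ) grows for under-sampled arms, the policy keeps exploring, yet concentrates its pulls on the better arm.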

Original language: English
Title of host publication: 37th International Conference on Machine Learning, ICML 2020
Editors: Hal Daumé, Aarti Singh
Publisher: International Machine Learning Society (IMLS)
Pages: 6204-6214
Number of pages: 11
ISBN (Electronic): 9781713821120
State: Published - Jul 2020
Event: 37th International Conference on Machine Learning, ICML 2020 - Virtual, Online
Duration: 13 Jul 2020 - 18 Jul 2020

Publication series

Name: 37th International Conference on Machine Learning, ICML 2020
Volume: PartF168147-8

Conference

Conference: 37th International Conference on Machine Learning, ICML 2020
City: Virtual, Online
Period: 13/07/20 - 18/07/20

