Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

Yu Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms that handle the explore-exploit trade-off in linear bandit problems as well as generalized linear bandit problems. We develop novel index policies that we prove achieve order-optimality, and we show in extensive experiments that their empirical performance is competitive with state-of-the-art benchmark methods. For linear bandits, the new policies achieve this with low computation time per pull, thereby yielding both favorable regret and computational efficiency.
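
The abstract does not spell out the index, so the following is only a minimal illustrative sketch of a reward-biased index policy for a linear bandit, assuming an index of the form x^T theta_hat + (alpha_t / 2) * ||x||^2 in the V^{-1} norm with a bias weight alpha_t that grows with time; the exact index, bias schedule, and analysis in the paper may differ. All function and parameter names here are hypothetical.

```python
import numpy as np

def rbmle_linear_bandit(arms, theta_star, horizon, noise_std=0.1,
                        reg=1.0, bias_fn=np.sqrt, seed=0):
    """Sketch of a reward-biased index policy for linear bandits.

    Assumption: the index is x^T theta_hat + (alpha_t / 2) * x^T V^{-1} x,
    with alpha_t = bias_fn(t) increasing in t. This is an illustration,
    not the paper's exact policy.
    """
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = reg * np.eye(d)          # regularized design matrix
    b = np.zeros(d)              # running sum of x_s * r_s
    total_reward = 0.0
    for t in range(1, horizon + 1):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b    # regularized least-squares estimate
        alpha_t = bias_fn(t)     # reward-bias weight, grows with t
        # Reward-biased index for every arm: estimated mean plus bias term.
        quad = np.einsum('ij,jk,ik->i', arms, V_inv, arms)  # x^T V^{-1} x
        idx = arms @ theta_hat + 0.5 * alpha_t * quad
        a = int(np.argmax(idx))
        x = arms[a]
        r = x @ theta_star + noise_std * rng.standard_normal()
        V += np.outer(x, x)      # rank-one update of the design matrix
        b += r * x
        total_reward += r
    return total_reward

# Usage: 20 unit-norm arms in R^5 against a random true parameter.
rng = np.random.default_rng(1)
arms = rng.standard_normal((20, 5))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)
theta_star = rng.standard_normal(5)
theta_star /= np.linalg.norm(theta_star)
print(rbmle_linear_bandit(arms, theta_star, horizon=2000))
```

The low per-pull cost noted in the abstract is consistent with this kind of closed-form index: each round only requires updating the design matrix and evaluating a quadratic form per arm, with no inner optimization loop.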
Original language: English
Title of host publication: Proceedings of AAAI Conference on Artificial Intelligence
Pages: 7874-7882
Number of pages: 9
Volume: 3
Edition: 9
DOIs
State: Published - Feb 2021
