Exploration through reward biasing: Reward-biased maximum likelihood estimation for stochastic multi-armed bandits

Xi Liu*, Ping Chun Hsieh, Yu Heng Hung, Anirban Bhattacharya, P. R. Kumar

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    Abstract

    Inspired by the Reward-Biased Maximum Likelihood Estimate method of adaptive control, we propose RBMLE, a novel family of learning algorithms for stochastic multi-armed bandits (SMABs). For a broad range of SMABs, including both the parametric Exponential Family and the non-parametric sub-Gaussian/Exponential family, we show that RBMLE yields an index policy. To choose the bias-growth rate …
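    The abstract states that RBMLE reduces to an index policy: at each round, the learner pulls the arm with the highest index, where the index combines the empirical mean with a reward bias that grows over time and decays with the arm's pull count. The sketch below illustrates that general idea on a Gaussian bandit; the specific index form `mu_hat + alpha(t) / (2 * N_i)` and the schedule `alpha(t) = sqrt(t)` are illustrative assumptions, not the paper's exact derivation.

    ```python
    import math
    import random

    def rbmle_index_policy(means, horizon, seed=0):
        """Run an RBMLE-style index policy on a Gaussian bandit (illustrative sketch).

        NOTE: the index form mu_hat_i + alpha(t) / (2 * N_i) and the
        bias-growth schedule alpha(t) = sqrt(t) are assumptions made
        for illustration; see the paper for the actual derivation.
        """
        rng = random.Random(seed)
        k = len(means)
        counts = [0] * k   # N_i: number of pulls of arm i
        sums = [0.0] * k   # running sum of rewards of arm i

        # Pull each arm once to initialize the empirical means.
        for i in range(k):
            counts[i] = 1
            sums[i] = means[i] + rng.gauss(0, 1)

        for t in range(k + 1, horizon + 1):
            alpha = math.sqrt(t)  # assumed bias-growth rate
            # Reward-biased index: empirical mean plus a bias term that
            # shrinks with the pull count, so under-explored arms are favored.
            idx = [sums[i] / counts[i] + alpha / (2 * counts[i]) for i in range(k)]
            arm = max(range(k), key=lambda i: idx[i])
            counts[arm] += 1
            sums[arm] += means[arm] + rng.gauss(0, 1)
        return counts

    counts = rbmle_index_policy([0.2, 0.5, 0.9], horizon=2000)
    ```

    With this schedule the bias term vanishes for well-sampled arms, so after an initial exploration phase the policy concentrates its pulls on the arm with the highest empirical mean.
    
    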

    Original language: English
    Title of host publication: 37th International Conference on Machine Learning, ICML 2020
    Editors: Hal Daumé III, Aarti Singh
    Publisher: International Machine Learning Society (IMLS)
    Pages: 6204-6214
    Number of pages: 11
    ISBN (Electronic): 9781713821120
    State: Published - 2020
    Event: 37th International Conference on Machine Learning, ICML 2020 - Virtual, Online
    Duration: 13 Jul 2020 to 18 Jul 2020

    Publication series

    Name: 37th International Conference on Machine Learning, ICML 2020
    Volume: PartF168147-8

    Conference

    Conference: 37th International Conference on Machine Learning, ICML 2020
    City: Virtual, Online
    Period: 13/07/20 - 18/07/20
