Bayesian Opponent Exploitation by Inferring the Opponent's Policy Selection Pattern

Kuei Tso Lee, Sheng Jyh Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

In multi-agent competitive domains, an agent needs to anticipate its opponent's behavior and select a suitable policy to exploit that opponent. In this work, we build on the Bayesian Policy Reuse (BPR) framework and further assume that the opponent may choose its policy based on its previous observations. To handle opponents of this kind, we discuss three approaches for the agent: learning from scratch, reasoning from experience, and reasoning accompanied by learning. The 'reasoning accompanied by learning' approach turns out to be the most favorable: the agent runs an iterative process that alternates between 'updating the belief in each pre-collected model' and 'progressively learning the opponent's policy selection pattern' from the observed data. In our experiments, we simulate a simplified batter vs. pitcher game. The experimental results show that the 'reasoning accompanied by learning' approach achieves a higher average utility than both the learn-from-scratch and the reason-from-experience approaches.
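To make the "updating the belief in each pre-collected model" step more concrete, here is a minimal Python sketch of a Bayes-rule belief update over a small library of opponent models, followed by picking the response policy with the highest expected utility under that belief, in the general spirit of BPR. It is a simplification under stated assumptions, not the authors' implementation: each model is assumed to give a likelihood P(observed pitch | model), utilities are a fixed table, and all names (`update_belief`, `select_policy`, the pitcher models and batter responses) are hypothetical. The step that learns the opponent's policy selection pattern is not shown.

```python
from typing import Dict

def update_belief(belief: Dict[str, float],
                  likelihoods: Dict[str, Dict[str, float]],
                  observation: str) -> Dict[str, float]:
    """Posterior over opponent models after one observation (Bayes rule).

    belief: prior P(model); likelihoods: hypothetical P(observation | model) tables.
    """
    # Unnormalised posterior: prior times likelihood of the observed action.
    unnorm = {m: p * likelihoods[m].get(observation, 0.0) for m, p in belief.items()}
    total = sum(unnorm.values())
    if total == 0.0:
        # Observation impossible under every model: keep the prior unchanged.
        return dict(belief)
    return {m: v / total for m, v in unnorm.items()}

def select_policy(belief: Dict[str, float],
                  utilities: Dict[str, Dict[str, float]]) -> str:
    """Return the response policy with the highest belief-weighted utility."""
    return max(utilities,
               key=lambda pol: sum(belief[m] * utilities[pol][m] for m in belief))

# Hypothetical pitcher models: probability of each pitch type.
likelihoods = {
    "fastball_heavy": {"fastball": 0.8, "curveball": 0.2},
    "curveball_heavy": {"fastball": 0.3, "curveball": 0.7},
}
# Hypothetical batter responses: utility against each pitcher model.
utilities = {
    "sit_fastball": {"fastball_heavy": 1.0, "curveball_heavy": 0.2},
    "sit_curveball": {"fastball_heavy": 0.3, "curveball_heavy": 0.9},
}

belief = {"fastball_heavy": 0.5, "curveball_heavy": 0.5}
for pitch in ["fastball", "fastball", "curveball"]:
    belief = update_belief(belief, likelihoods, pitch)
print(belief, select_policy(belief, utilities))
```

In this toy run, two fastballs and one curveball leave most of the belief on the fastball-heavy model, so the greedy expected-utility choice is "sit_fastball". A model that conditions the opponent's policy choice on its previous observation would replace the fixed prior here with a learned, context-dependent one.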

Original language: English
Title of host publication: 2022 IEEE Conference on Games, CoG 2022
Publisher: IEEE Computer Society
Pages: 151-158
Number of pages: 8
ISBN (Electronic): 9781665459891
State: Published - 2022
Event: 2022 IEEE Conference on Games, CoG 2022 - Beijing, China
Duration: 21 Aug 2022 → 24 Aug 2022

Publication series

Name: IEEE Conference on Computational Intelligence and Games, CIG
Volume: 2022-August
ISSN (Print): 2325-4270
ISSN (Electronic): 2325-4289

Conference

Conference: 2022 IEEE Conference on Games, CoG 2022
Country/Territory: China
City: Beijing
Period: 21/08/22 → 24/08/22

Keywords

  • Bayes rule
  • Bayesian Policy Reuse
  • non-stationary opponent
  • opponent exploitation
  • opponent modeling
