TY - CONF
T1 - Bayesian Opponent Exploitation by Inferring the Opponent's Policy Selection Pattern
AU - Lee, Kuei Tso
AU - Wang, Sheng Jyh
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - In a multi-agent competitive domain, the agent needs to anticipate the opponent's behavior and select a suitable policy to exploit it. In this work, based on the BPR (Bayesian Policy Reuse) framework, we further assume the opponent may determine its policy based on its previous observations. To deal with opponents of this kind, we discuss three approaches for the agent: learning from scratch, reasoning from experience, and reasoning accompanied by learning. The 'reasoning accompanied by learning' approach turns out to be the most favorable, in which the agent executes an iterative process that alternates between 'updating the belief of each pre-collected model' and 'progressively learning the opponent's policy selection pattern' based on the observed data. In our experiments, we simulate a simplified batter vs. pitcher game. The experimental results show that the 'reasoning accompanied by learning' approach achieves a higher average utility than the learn-from-scratch and reason-from-experience approaches.
KW - Bayes rule
KW - Bayesian Policy Reuse
KW - non-stationary opponent
KW - opponent exploitation
KW - opponent modeling
UR - http://www.scopus.com/inward/record.url?scp=85139142804&partnerID=8YFLogxK
U2 - 10.1109/CoG51982.2022.9893611
DO - 10.1109/CoG51982.2022.9893611
M3 - Conference contribution
AN - SCOPUS:85139142804
T3 - IEEE Conference on Computational Intelligence and Games, CIG
SP - 151
EP - 158
BT - 2022 IEEE Conference on Games, CoG 2022
PB - IEEE Computer Society
T2 - 2022 IEEE Conference on Games, CoG 2022
Y2 - 21 August 2022 through 24 August 2022
ER -