TY - JOUR
T1 - Opponent Exploitation Based on Bayesian Strategy Inference and Policy Tracking
AU - Lee, Kuei Tso
AU - Huang, Yen Yun
AU - Yang, Je Ruei
AU - Wang, Sheng Jyh
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2024/6/1
Y1 - 2024/6/1
N2 - In a multiagent competitive environment, it is important for an agent to detect the opponent's policy and adopt a suitable policy to exploit the opponent. Conventionally, most methods, e.g., Bayesian policy reuse (BPR) variants, assume that the opponent adopts a fixed policy or a randomly changing policy. In this article, we make a more realistic and reasonable assumption that the opponent may select its policy based on the previous observation. Here, we define the term 'strategy' as the mapping from the previous observation to the opponent's selected policy, and we propose the Bayesian strategy inference (BSI) framework to infer the opponent's strategy. Furthermore, to deal with opponents who may randomly select their policies, the BSI framework is combined with an intraepisode policy tracking mechanism to construct the Bayesian strategy inference plus policy tracking (BSI-PT) algorithm. In our experiments, we design an Extended Batter versus Pitcher game (EBvPG) for the evaluation of the proposed BSI-PT framework. The experimental results demonstrate that BSI-PT obtains higher policy prediction accuracy and winning percentage than three other BPR variants against the opponents with a specific policy selection strategy, with a random selection strategy, or with a partially random strategy.
KW - Bayesian inference
KW - Bayesian policy reuse (BPR)
KW - multiagent environment
KW - policy tracking
UR - http://www.scopus.com/inward/record.url?scp=85162672203&partnerID=8YFLogxK
U2 - 10.1109/TG.2023.3285031
DO - 10.1109/TG.2023.3285031
M3 - Article
AN - SCOPUS:85162672203
SN - 2475-1502
VL - 16
SP - 419
EP - 430
JO - IEEE Transactions on Games
JF - IEEE Transactions on Games
IS - 2
ER -