Bayesian Opponent Exploitation by Inferring the Opponent's Policy Selection Pattern

Kuei Tso Lee, Sheng Jyh Wang

Research output: Conference contribution, peer-reviewed

2 Citations (Scopus)

Abstract

In a multi-agent competitive domain, an agent needs to anticipate the opponent's behavior and select a suitable policy to exploit the opponent. In this work, building on the BPR (Bayesian Policy Reuse) framework, we further assume that the opponent may choose its policy depending on its previous observation. To deal with opponents of this kind, we discuss three approaches for the agent: learning from scratch, reasoning from experience, and reasoning accompanied by learning. The 'reasoning accompanied by learning' approach turns out to be the most favorable: the agent executes an iterative process that alternates between 'updating the belief of each pre-collected model' and 'progressively learning the opponent's policy selection pattern' based on the observed data. In our experiments, we simulate a simplified batter vs. pitcher game. The experimental results show that the 'reasoning accompanied by learning' approach achieves a higher average utility than both the learn-from-scratch approach and the reason-from-experience approach.
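The alternating process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the agent directly observes the opponent's action (a pitch) together with the opponent's previous observation (the last outcome), and it replaces whatever learner the paper uses with a simple count-based estimate. The names `update_belief`, `PatternLearner`, the two pitcher models, and all probabilities are hypothetical.

```python
from collections import defaultdict


def update_belief(belief, models, prev_obs, action_seen):
    """One Bayesian step: b(m) is proportional to P(action_seen | m, prev_obs) * b(m)."""
    posterior = {
        name: belief[name] * models[name][prev_obs][action_seen]
        for name in belief
    }
    total = sum(posterior.values())
    if total == 0.0:  # no pre-collected model explains the data; keep the prior
        return dict(belief)
    return {name: p / total for name, p in posterior.items()}


class PatternLearner:
    """Count-based estimate of P(opponent action | opponent's previous observation)."""

    def __init__(self, actions, pseudo_count=1.0):
        self.actions = list(actions)
        # Each previous observation starts with a uniform pseudo-count per action.
        self.counts = defaultdict(
            lambda: {a: pseudo_count for a in self.actions}
        )

    def observe(self, prev_obs, action_seen):
        self.counts[prev_obs][action_seen] += 1.0

    def predict(self, prev_obs):
        row = self.counts[prev_obs]
        total = sum(row.values())
        return {a: c / total for a, c in row.items()}


# Hypothetical pre-collected pitcher models:
# previous outcome -> distribution over the next pitch.
models = {
    "outcome_dependent": {
        "strike": {"fast": 0.9, "curve": 0.1},
        "hit": {"fast": 0.1, "curve": 0.9},
    },
    "outcome_independent": {
        "strike": {"fast": 0.5, "curve": 0.5},
        "hit": {"fast": 0.5, "curve": 0.5},
    },
}
belief = {name: 0.5 for name in models}
learner = PatternLearner(actions=["fast", "curve"])

# Made-up observation history: (opponent's previous observation, pitch thrown).
history = [("strike", "fast"), ("hit", "curve"), ("strike", "fast")]
for prev_obs, pitch in history:
    belief = update_belief(belief, models, prev_obs, pitch)  # reasoning step
    learner.observe(prev_obs, pitch)                         # learning step
```

In this toy run the belief shifts toward the outcome-dependent model, while the learner builds its own estimate of the selection pattern that could be used when none of the pre-collected models fits well.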

Original language: English
Title of host publication: 2022 IEEE Conference on Games, CoG 2022
Publisher: IEEE Computer Society
Pages: 151-158
Number of pages: 8
ISBN (Electronic): 9781665459891
DOIs
Publication status: Published - 2022
Event: 2022 IEEE Conference on Games, CoG 2022 - Beijing, China
Duration: 21 Aug 2022 to 24 Aug 2022

Publication series

Name: IEEE Conference on Computational Intelligence and Games, CIG
2022-August
ISSN (Print): 2325-4270
ISSN (Electronic): 2325-4289

Conference

Conference: 2022 IEEE Conference on Games, CoG 2022
Country/Territory: China
City: Beijing
Period: 21/08/22 to 24/08/22
