TY - GEN
T1 - Collaborative Partially-Observable Reinforcement Learning Using Wireless Communications
AU - Ko, Eisaku
AU - Chen, Kwang Cheng
AU - Lien, Shao Yu
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/6
Y1 - 2021/6
N2 - Each robot utilizes the reinforcement learning (RL) to control its maneuver and these robots can collaborate to accomplish a common goal to form a collaborative multi-agent system (MAS). Due to the constraints of distributive locations and different poses of robots, in practice, each agent (robot) in such a collaborative MAS can only partially observe the environment and other agents (such as competitive agents), and consequently operate based on its belief of the state(s). The alignment of the beliefs of collaborative agents can be therefore enhanced by adopting wireless communications, but is rarely studied in literature. To explore wireless communications applied to collaborative partially-observable reinforcement learning (PORL), we propose that each collaborative agent predicts the environment dynamics, including the behavior of those agents outside the collaborative MAS, and then constructs the learning-based belief of the world (i.e. global state). To assist such prediction and learning, we modify the RL assisted by the wireless communication functionality into two stages: prediction of the state and local actor-and-critic on global value(s). In other words, while one agent predicts and learns its own policy, another agent can updates critics on the sequence of history to update global value(s) that can further assist to validate the prediction. From numerical experiments, we find that the timing of communication or information exchange among collaborative agents has critical impact on the duration of learning and prediction, and thus the performance of MAS, which suggests the desirable communication for distributed PORL among collaborative agents toward an efficient MAS.
AB - Each robot utilizes the reinforcement learning (RL) to control its maneuver and these robots can collaborate to accomplish a common goal to form a collaborative multi-agent system (MAS). Due to the constraints of distributive locations and different poses of robots, in practice, each agent (robot) in such a collaborative MAS can only partially observe the environment and other agents (such as competitive agents), and consequently operate based on its belief of the state(s). The alignment of the beliefs of collaborative agents can be therefore enhanced by adopting wireless communications, but is rarely studied in literature. To explore wireless communications applied to collaborative partially-observable reinforcement learning (PORL), we propose that each collaborative agent predicts the environment dynamics, including the behavior of those agents outside the collaborative MAS, and then constructs the learning-based belief of the world (i.e. global state). To assist such prediction and learning, we modify the RL assisted by the wireless communication functionality into two stages: prediction of the state and local actor-and-critic on global value(s). In other words, while one agent predicts and learns its own policy, another agent can updates critics on the sequence of history to update global value(s) that can further assist to validate the prediction. From numerical experiments, we find that the timing of communication or information exchange among collaborative agents has critical impact on the duration of learning and prediction, and thus the performance of MAS, which suggests the desirable communication for distributed PORL among collaborative agents toward an efficient MAS.
KW - artificial intelligence
KW - collaborative robots
KW - Hidden Markov Model
KW - machine learning
KW - MAS
KW - RL
KW - wireless communications
UR - http://www.scopus.com/inward/record.url?scp=85115695688&partnerID=8YFLogxK
U2 - 10.1109/ICC42927.2021.9500922
DO - 10.1109/ICC42927.2021.9500922
M3 - Conference contribution
AN - SCOPUS:85115695688
T3 - IEEE International Conference on Communications
BT - ICC 2021 - IEEE International Conference on Communications, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Conference on Communications, ICC 2021
Y2 - 14 June 2021 through 23 June 2021
ER -