TY - GEN
T1 - Stochastic Temporal Difference Learning for Sequence Data
AU - Chien, Jen Tzung
AU - Chiu, Yi Chung
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/7/18
Y1 - 2021/7/18
N2 - Planning is crucial for training an agent via model-based reinforcement learning, where the agent predicts distant observations that reflect its past experience. Such a planning method is theoretically and computationally attractive compared with traditional learning, which relies on step-by-step prediction. However, it is more challenging to build a learning machine that can predict and plan across multiple time steps than one that acts step by step. To achieve this flexibility in the learning process, future states should be predicted directly without traversing all intermediate states. Accordingly, this paper develops stochastic temporal difference learning, in which sequence data are represented with multiple jumpy states and a stochastic state space model is learned by maximizing the evidence lower bound on the log likelihood of the training data. A general solution allowing a varying number of jumpy states is developed and formulated. Experiments demonstrate the merit of the proposed sequential machine in finding predictive states for rolling forward with jumps as well as in predicting words.
AB - Planning is crucial for training an agent via model-based reinforcement learning, where the agent predicts distant observations that reflect its past experience. Such a planning method is theoretically and computationally attractive compared with traditional learning, which relies on step-by-step prediction. However, it is more challenging to build a learning machine that can predict and plan across multiple time steps than one that acts step by step. To achieve this flexibility in the learning process, future states should be predicted directly without traversing all intermediate states. Accordingly, this paper develops stochastic temporal difference learning, in which sequence data are represented with multiple jumpy states and a stochastic state space model is learned by maximizing the evidence lower bound on the log likelihood of the training data. A general solution allowing a varying number of jumpy states is developed and formulated. Experiments demonstrate the merit of the proposed sequential machine in finding predictive states for rolling forward with jumps as well as in predicting words.
KW - language model
KW - recurrent neural network
KW - reinforcement learning
KW - sequential learning
KW - variational auto-encoder
UR - http://www.scopus.com/inward/record.url?scp=85116486125&partnerID=8YFLogxK
U2 - 10.1109/IJCNN52387.2021.9534155
DO - 10.1109/IJCNN52387.2021.9534155
M3 - Conference contribution
AN - SCOPUS:85116486125
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - IJCNN 2021 - International Joint Conference on Neural Networks, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 International Joint Conference on Neural Networks, IJCNN 2021
Y2 - 18 July 2021 through 22 July 2021
ER -