TY - JOUR
T1 - Improving Generalization in Reinforcement Learning-Based Trading by Using a Generative Adversarial Market Model
AU - Kuo, Chia-Hsuan
AU - Chen, Chiao-Ting
AU - Lin, Sin-Jing
AU - Huang, Szu-Hao
PY - 2021/3
Y1 - 2021/3
AB - With the increasing sophistication of artificial intelligence, reinforcement learning (RL) has been widely applied to portfolio management. However, shortcomings remain. Because the training environment of an RL-based portfolio optimization framework is usually constructed from historical price data in the literature, the agent may 1) violate the definition of a Markov decision process (MDP), 2) ignore its own market impact, or 3) fail to account for causal relationships within interaction processes; these issues ultimately cause the agent to generalize poorly. To surmount these problems, and specifically to help the RL-based portfolio agent generalize better, we introduce an interactive training environment that leverages a generative model, called the limit order book-generative adversarial model (LOB-GAN), to simulate a financial market. The LOB-GAN models market ordering behavior, and its generator is used as a market behavior simulator. A simulated financial market, called Virtual Market, is constructed from the market behavior simulator in conjunction with a realistic security matching system. Virtual Market is then used as an interactive training environment for the RL-based portfolio agent. The experimental results demonstrate that our framework improves out-of-sample portfolio performance by 4%, outperforming other generalization strategies.
KW - Portfolios
KW - Training
KW - Optimization
KW - Topology
KW - Data models
KW - Stock markets
KW - Network topology
KW - Artificial market simulation
KW - portfolio management
KW - reinforcement learning
U2 - 10.1109/ACCESS.2021.3068269
DO - 10.1109/ACCESS.2021.3068269
M3 - Article
SN - 2169-3536
VL - 9
SP - 50738
EP - 50754
JO - IEEE Access
JF - IEEE Access
ER -