This paper presents a router that tackles a classic algorithmic problem in EDA, the obstacle-avoiding rectilinear Steiner minimum tree (OARSMT), with the help of an agent trained by our proposed policy-based reinforcement-learning (RL) framework. The job of the policy agent is to select, for a given layout, an optimal set of Steiner points that leads to an optimal OARSMT. Our RL framework iteratively improves the policy agent by applying Monte-Carlo tree search (MCTS) to explore and evaluate different choices of Steiner points on a variety of unseen layouts. As a result, our policy agent can be viewed as a self-designed OARSMT algorithm that iteratively evolves by itself. The initial version of the agent is sequential: it selects one Steiner point at a time. From the sequential agent, a concurrent agent can then be derived that predicts all required Steiner points with a single model inference. The overall training time can be further reduced by using geometrically symmetric samples for training. Experimental results on single-layer 15x15 and 30x30 layouts demonstrate that our trained concurrent agent can outperform a state-of-the-art OARSMT router in both wire length and runtime.
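The abstract does not say how the geometrically symmetric samples are produced. One natural reading, assumed here, is that each square training layout is expanded into its eight rotations and reflections (the dihedral group of the square), with the same transform applied to the Steiner-point label map so that every label stays aligned with its layout. The cell encoding (0 = free, 1 = obstacle, 2 = pin) and the function names `rotate90`, `flip_horizontal`, `dihedral_variants` and `augment` are hypothetical and chosen only for illustration; this is a sketch, not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical encoding: a layout is a square grid of ints
# (0 = free cell, 1 = obstacle, 2 = pin); a target is a same-sized
# grid with 1 wherever a Steiner point should be placed.
Grid = List[List[int]]


def rotate90(grid: Grid) -> Grid:
    """Rotate a square grid 90 degrees clockwise."""
    n = len(grid)
    return [[grid[n - 1 - c][r] for c in range(n)] for r in range(n)]


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror a grid left-to-right."""
    return [list(reversed(row)) for row in grid]


def dihedral_variants(grid: Grid) -> List[Grid]:
    """Return the 8 symmetric variants: 4 rotations, each with and without a mirror."""
    variants = []
    current = grid
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


def augment(layout: Grid, target: Grid) -> List[Tuple[Grid, Grid]]:
    """Pair each transformed layout with the identically transformed target map."""
    return list(zip(dihedral_variants(layout), dihedral_variants(target)))


if __name__ == "__main__":
    layout = [[2, 0, 0],
              [0, 1, 0],
              [0, 0, 2]]
    target = [[0, 0, 0],
              [0, 0, 0],
              [1, 0, 0]]
    pairs = augment(layout, target)
    print(len(pairs))  # 8
```

Because rotations and reflections preserve rectilinear distances and obstacle adjacency, a Steiner point that is good for one layout should be equally good at the transformed position in each variant, so one solved sample can yield up to eight training samples without extra search.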