The video coding community has long sought rate-distortion optimization techniques that are more effective than the widely adopted greedy approach. The difficulty lies in predicting how a coding mode decision made at one stage affects subsequent decisions and thus the overall coding performance. Taking a data-driven approach, we introduce in this paper deep reinforcement learning (RL) as a mechanism for the coding unit (CU) split decision in HEVC/H.265. We propose to regard the luminance samples of a CU together with the quantization parameter as its state, the split decision as an action, and the reduction in rate-distortion cost relative to keeping the current CU intact as the immediate reward. Based on the Q-learning algorithm, we train a convolutional neural network to approximate the rate-distortion cost reduction of each possible state-action pair. The proposed scheme performs comparably with the current full rate-distortion optimization scheme in HM-16.15, incurring a 2.5% average BD-rate loss. While also performing similarly to a conventional scheme that treats the split decision as a binary classification problem, our scheme can additionally quantify the rate-distortion cost reduction, enabling more applications.
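The state, action, and reward formulation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of the standard Lagrangian cost J = D + lambda * R, the zero reward implied for keeping the CU intact, and the greedy choice between two predicted values are all assumptions made for illustration, and the convolutional network itself is replaced by plain numbers.

```python
def rd_cost(distortion, rate_bits, lagrange_multiplier):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lagrange_multiplier * rate_bits


def split_reward(cost_unsplit, cost_split):
    """Immediate reward for splitting: the reduction in rate-distortion
    cost relative to keeping the CU intact. Positive means splitting helps.
    Keeping the CU intact would, under this definition, earn a reward of 0."""
    return cost_unsplit - cost_split


def choose_action(q_no_split, q_split):
    """Greedy decision from two predicted cost reductions (hypothetical
    stand-ins for the outputs of the learned network)."""
    return "split" if q_split > q_no_split else "no_split"


# Hypothetical numbers, chosen only to exercise the functions.
cost_intact = rd_cost(distortion=100.0, rate_bits=50, lagrange_multiplier=2.0)
cost_after_split = rd_cost(distortion=60.0, rate_bits=60, lagrange_multiplier=2.0)
reward = split_reward(cost_intact, cost_after_split)
decision = choose_action(q_no_split=0.0, q_split=reward)
```

Because the predicted quantity is a cost reduction rather than a class label, the same value that drives the decision also indicates how much is gained or lost by splitting.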