In this paper, we propose a deep learning system to localize and recognize Chinese texts in scenes with signage and road marks through 3D convolutional neural network. The proposed system adopts YOLO for detecting target location and exploits 3D convolutional neural network for recognizing the contents. The proposed design outperforms the existing designs based on LSTM and achieves real-time processing performance, which is feasible to be implemented on embedded platforms. The proposed system reaches over 90% accuracy in recognizing Chinese texts on bird's-eye viewing road marks in a self-driving vehicle equipped with a fisheye camera. In addition, this system can achieve 20 fps execution speed with NVIDIA DIGITS DevBox with 1080Ti GPU, which is fast enough for autonomous driving applications.