In this paper, we proposed a multi-view system for 3D human skeleton tracking based on multi-cue fusion. Multiple Kinect v2 cameras are used to build a low-cost system. Although Kinect cameras can extract 3D skeletons from their depth sensors, several challenges remain, such as left–right confusion and severe self-occlusion. Moreover, human skeleton tracking systems often have difficulty recovering from lost tracking. These challenges make robust 3D skeleton tracking nontrivial. To address them in a unified framework, we first correct the skeleton's left–right ambiguity by referring to the human joints extracted by OpenPose. Unlike Kinect, OpenPose extracts target joints through learning-based image analysis, which can differentiate a person's front side from their backside. With this 2D image cue, we correct the left–right skeleton confusion. We also find that self-occlusion severely degrades Kinect joint detection owing to incorrect joint depth estimation. To alleviate this problem, we reconstruct a reference 3D skeleton by back-projecting the corresponding 2D OpenPose joints from multiple cameras. The reconstructed joints are less sensitive to occlusion and can serve as 3D anchors for skeleton fusion. Finally, we introduce inter-joint constraints into our probabilistic skeleton tracking framework to track all joints simultaneously. Unlike conventional methods that treat each joint individually, ours lets neighboring joints position one another. In this way, when joints are missing due to occlusion, the inter-joint constraints maintain skeleton consistency and preserve the bone lengths between neighboring joints. We evaluate our method on five challenging actions with a real-time demo system, which shows that the system tracks skeletons stably without error propagation or vibration.
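The multi-camera back-projection step described above can be sketched as linear (DLT) triangulation, assuming calibrated 3×4 projection matrices for each view; the function name and interface below are illustrative, not the authors' implementation:

```python
import numpy as np

def triangulate_joint(proj_mats, points_2d):
    """Linear (DLT) triangulation of one joint from N calibrated views.

    proj_mats: list of 3x4 camera projection matrices.
    points_2d: list of (u, v) 2D joint detections (e.g., from OpenPose), one per view.
    Returns the 3D joint position in world coordinates.
    """
    A = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous point X:
        #   u * (P[2] @ X) = P[0] @ X   and   v * (P[2] @ X) = P[1] @ X
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    A = np.asarray(A)
    # The least-squares solution is the right singular vector
    # associated with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

With two or more views, the resulting 3D point is largely insensitive to a single camera's occlusion, which is why such reconstructed joints can act as anchors for fusion.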
The experimental results also show that our average joint localization error is smaller than that of conventional methods.
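As a concrete illustration of the bone-length preservation provided by inter-joint constraints, the minimal sketch below (a simplified stand-in, not the paper's probabilistic formulation) projects a noisy child-joint estimate back onto the sphere of fixed bone length centered at its parent joint:

```python
import numpy as np

def enforce_bone_length(parent, child, bone_len):
    """Project a noisy child-joint estimate onto the sphere of radius
    bone_len around its parent, preserving the estimated limb direction.
    Illustrative only; the paper fuses such constraints probabilistically.
    """
    parent = np.asarray(parent, dtype=float)
    d = np.asarray(child, dtype=float) - parent
    n = np.linalg.norm(d)
    if n < 1e-9:
        # Degenerate case: child collapsed onto parent; no direction to preserve.
        return parent
    return parent + d * (bone_len / n)
```

When a joint is briefly lost to occlusion, such a constraint keeps its estimate at a plausible distance from its tracked neighbor instead of letting the limb stretch or collapse.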