This paper aims to estimate the six-degree-of-freedom (6-DOF) poses of objects from a single RGB image in which the target objects are partially occluded. Most recent methods use a deep neural network to predict the projected 2-D locations of 3-D keypoints and then apply a PnP algorithm to compute the 6-DOF poses. Several researchers have pointed out the uncertainty of the predicted locations and modelled it according to predefined rules or functions, but the performance of such approaches may still degrade under occlusion. To address this problem, we formulated 2-D keypoint locations as probability distributions in a novel loss function and developed a confidence-based pose estimation network. This network not only predicts the 2-D keypoint locations from each visible patch of a target object but also provides the corresponding confidence values in an unsupervised fashion. By fusing the most reliable local predictions, the proposed method improves the accuracy of pose estimation when target objects are partially occluded. Experiments demonstrated that our method outperforms state-of-the-art methods on a major occlusion data set used for 6-D object pose estimation. Moreover, the framework is efficient and feasible for real-time multimedia applications.
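The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so everything here is an assumption made for illustration: the function names `fuse_keypoint` and `fuse_all`, the `top_n` parameter, and the choice of a confidence-weighted mean over the most confident patches. It is not the paper's network or loss function.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def fuse_keypoint(
    predictions: Sequence[Point],
    confidences: Sequence[float],
    top_n: int = 2,
) -> Point:
    """Fuse per-patch predictions of ONE keypoint into a single 2-D location.

    Keeps the `top_n` most confident predictions and returns their
    confidence-weighted mean. Both `top_n` and the weighting rule are
    illustrative assumptions, not the fusion rule used in the paper.
    """
    if len(predictions) != len(confidences) or not predictions:
        raise ValueError("need at least one prediction and one confidence per prediction")
    # Rank predictions from most to least confident.
    ranked = sorted(zip(confidences, predictions), key=lambda cp: cp[0], reverse=True)
    kept = ranked[:top_n]
    total = sum(c for c, _ in kept)
    if total <= 0:
        raise ValueError("kept confidences must sum to a positive value")
    x = sum(c * p[0] for c, p in kept) / total
    y = sum(c * p[1] for c, p in kept) / total
    return (x, y)


def fuse_all(
    per_patch_predictions: Sequence[Sequence[Point]],
    per_patch_confidences: Sequence[Sequence[float]],
    top_n: int = 2,
) -> List[Point]:
    """Fuse every keypoint; per_patch_predictions[p][k] is patch p's guess for keypoint k."""
    num_keypoints = len(per_patch_predictions[0])
    fused = []
    for k in range(num_keypoints):
        preds = [patch[k] for patch in per_patch_predictions]
        confs = [patch[k] for patch in per_patch_confidences]
        fused.append(fuse_keypoint(preds, confs, top_n))
    return fused


if __name__ == "__main__":
    # Three patches vote on one keypoint; the third patch is occluded,
    # so its (hypothetical) confidence is low and it is discarded.
    preds = [[(10.0, 10.0)], [(12.0, 10.0)], [(100.0, 100.0)]]
    confs = [[0.5], [0.5], [0.01]]
    print(fuse_all(preds, confs, top_n=2))
```

In a full pipeline, the fused 2-D locations and their known 3-D keypoints would then be passed to a PnP solver (for example OpenCV's `cv2.solvePnP`) to recover the 6-DOF pose.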
|Pages (from - to)||3025-3035|
|Journal||IEEE Transactions on Multimedia|
|Publication status||Published - 2022|