The rapid growth of genome data and the intensive computation it demands pose major challenges for downstream genome analytics. Efficient parallel processing and distributed computing are two effective schemes for analyzing data at this scale. Range join is a widely used, effective, yet time-consuming operation that finds the overlaps between two sets of genome features. The widely adopted BEDTools pipeline uses a single-node binary tree approach, while the distributed GenAp scheme fails to exploit the massively parallel computation available on modern throughput processors such as GPUs (Graphics Processing Units). This paper proposes a novel Distributed Parallel P-ary search (DP2) that applies P-ary analysis to enable high parallelism at the algorithmic level and makes extensive use of multiple GPUs at the system and architecture levels. Efficient computation allocation is implemented to leverage distributed computing on clusters. The proposed framework integrates well with the current BEDTools pipeline, and achieves an average 25x speedup for the range-join operation itself compared with the binary tree approach of GenAp, and a 13x end-to-end (total execution time) speedup compared with ADAM.
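
To make the two central ideas concrete, the following is a minimal sequential sketch of a P-ary search and of a range join built on top of it. It is an illustration only, not the DP2 implementation: the function names `p_ary_lower_bound` and `range_join`, the half-open `(start, end)` interval convention, and the final linear filtering step are all assumptions made for this example. On a GPU, the pivots probed in each round would be examined concurrently by separate threads; here they are probed in a plain loop.

```python
def p_ary_lower_bound(keys, target, p):
    """Return the first index i with keys[i] >= target (keys must be sorted).

    Each round probes evenly spaced pivots (at most p of them) and keeps
    only the sub-range that can still contain the answer. With p = 2 this
    behaves much like ordinary binary search.
    """
    lo, hi = 0, len(keys)
    while lo < hi:
        n = hi - lo
        step = max(1, -(-n // p))  # ceil(n / p), at least 1
        new_lo, new_hi = lo, hi
        # Hypothetical GPU mapping: one thread per pivot in this loop.
        for idx in range(lo + step - 1, hi, step):
            if keys[idx] < target:
                new_lo = idx + 1   # answer lies to the right of this pivot
            else:
                new_hi = idx       # answer is at or left of this pivot
                break
        lo, hi = new_lo, new_hi
    return lo


def range_join(queries, references, p=4):
    """Return all (query, reference) pairs of overlapping intervals.

    Intervals are assumed to be half-open (start, end) tuples. Two
    intervals overlap when each one starts before the other ends.
    """
    refs = sorted(references)              # sort by start coordinate
    starts = [start for start, _ in refs]
    pairs = []
    for q_start, q_end in queries:
        # References starting at or after q_end cannot overlap the query.
        upper = p_ary_lower_bound(starts, q_end, p)
        # Simplification: filter the remaining candidates linearly.
        for r_start, r_end in refs[:upper]:
            if r_end > q_start:
                pairs.append(((q_start, q_end), (r_start, r_end)))
    return pairs


if __name__ == "__main__":
    features = [(1, 5), (3, 8), (10, 12), (11, 20)]
    reads = [(4, 6), (9, 11), (20, 25)]
    for pair in range_join(reads, features):
        print(pair)
```

Raising `p` reduces the number of search rounds at the cost of more probes per round, which is a favorable trade only when the probes in a round can run in parallel.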