This paper presents a two-stage path planning architecture that solves path planning problems hierarchically, using a general-purpose CPU as the global planner and a proposed path planning engine as the local planner. By exploiting heterogeneity, this hardware-software cooperative architecture combines the advantages of both approaches, achieving the generality of software together with hardware acceleration. The proposed path planning engine is based on the A* algorithm, and several hardware-friendly optimization techniques are applied or proposed to reduce the time complexity of the search. First, coefficient modification avoids the need for square-root hardware. Second, priority list reordering fully utilizes the memory bandwidth. Finally, the critical path is optimized to achieve an operating frequency of 333 MHz. The proposed system is built and verified on the Xilinx PYNQ-Z2 platform, using the ARM Cortex-A9 processor as the global planner and the FPGA as the local planner. Moreover, the proposed path planning engine is implemented in a TSMC 40 nm CMOS process, integrating 296.1k logic gates in a chip area of 0.28 mm². It achieves a latency of 9.6 μs/task and an energy consumption of 0.16 μJ/task in 2D, and a latency of 27.6 μs/task and an energy consumption of 0.48 μJ/task in 3D.
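
To illustrate what a square-root-free A* search can look like, the following is a minimal software sketch and not the engine described above. It assumes an 8-connected 2D occupancy grid and uses the common integer step costs 10 (axis-aligned) and 14 (diagonal) in place of 1 and √2. These coefficients, the function names `octile` and `astar`, and the grid encoding are illustrative assumptions; the abstract does not state the actual coefficients or data layout used in the hardware.

```python
import heapq

# Assumed integer step costs: 10 per axis-aligned move and 14 per diagonal
# move, approximating 1 and sqrt(2) without any square-root operation.
STRAIGHT, DIAGONAL = 10, 14


def octile(a, b):
    """Integer octile-distance heuristic between cells a and b."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return STRAIGHT * max(dx, dy) + (DIAGONAL - STRAIGHT) * min(dx, dy)


def astar(grid, start, goal):
    """Return the integer cost of a cheapest 8-connected path, or None.

    grid is a list of equal-length rows; 0 marks a free cell, 1 an obstacle.
    Diagonal moves between two obstacles are allowed in this sketch.
    """
    rows, cols = len(grid), len(grid[0])
    best = {start: 0}                               # best known g per cell
    open_list = [(octile(start, goal), 0, start)]   # entries are (f, g, cell)
    while open_list:
        _, g, cell = heapq.heappop(open_list)
        if cell == goal:
            return g
        if g > best[cell]:
            continue  # stale entry superseded by a cheaper one
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nxt = (cell[0] + dr, cell[1] + dc)
                if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                    continue
                if grid[nxt[0]][nxt[1]]:
                    continue  # obstacle
                step = DIAGONAL if dr and dc else STRAIGHT
                ng = g + step
                if ng < best.get(nxt, float("inf")):
                    best[nxt] = ng
                    heapq.heappush(open_list, (ng + octile(nxt, goal), ng, nxt))
    return None
```

In this sketch, path costs and heuristic values are computed with integer additions, comparisons, and constant multiplications only, which is the property that makes such a formulation attractive for hardware. The binary heap stands in for the priority list and does not model the reordering scheme or memory organization of the proposed engine.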