Hadoop is a widely adopted distributed processing framework that assumes each computing node is a CPU-based system with local memory. This design cannot take full advantage of an embedded heterogeneous many-core platform because of the mismatch in data collection and management paradigms between the Hadoop environment and embedded heterogeneous systems. This paper proposes a Hadoop-based design of Principal Component Analysis (PCA) that efficiently leverages distributed embedded heterogeneous many-core systems. While keeping the same data layout as conventional Hadoop applications, the proposed design introduces efficient methods to collect and manage fine-grained data chunks. Experiments on a Tegra K1 achieve a 5.9× performance improvement.