To support high-performance and low-power for multimedia applications and for hand-held devices, embedded VLIW DSP processors are of research focus. With the tight resource constraints, distributed register files, variable-length encodings for instructions, and special data paths are frequently adopted. This creates challenges to deploy software toolkits for new embedded DSP processors. This article presents our methods and experiences to develop software and toolkit flows for PAC (Parallel Architecture Core) VLIW DSP processors. Our toolkits include compilers, assemblers, debugger, and DSP micro-kernels. We first retarget Open Research Compiler (ORC) and toolkit chains for PAC VLIW DSP processor and address the issues to support distributed register files and ping-pong data paths for embedded VLIW DSP processors. Second, the linker and assmeber are able to support variable length encoding schemes for DSP instructions. In addition, the debugger and DSP micro-kernel were designed to handle dualcore environments. The footprint of micro-kernel is also around 10K to address the code-size issues for embedded devices. We also present the experimental result in the compiler framework by incorporating software pipeline (SWP) policies for distributed register files in PAC architecture. Results indicated that our compiler framework gains performance improvement around 2.5 times against the code generated without our proposed optimizations.