Embedded processors developed within the past few years have employed novel hardware designs to reduce the ever-growing complexity, power dissipation, and die area. Although using a distributed register file architecture is considered to have less read/write ports than using traditional unified register file structures, it presents challenges in compilation techniques to generate efficient codes for such architectures. This paper presents a novel scheme for register allocation that includes global and local components on a VLIW DSP processor with distributed register files whose port access is highly restricted. In the scheme, an optimization phase performed prior to conventional global/local register allocation, named global/local register file assignment (RFA), is used to minimize various register file communication costs. A heuristic algorithm is proposed for global RFA to make suitable decisions based on local RFA. Experiments were performed by incorporating our schemes on a novel VLIW DSP processor with non-uniform register files. The results indicate that the compilation based on our proposed approach delivers significant performance improvements, compared with the solution without using our proposed global register allocation scheme.