General-purpose computing on graphics processing units (GPGPU) makes it possible to parallelize Static Timing Analysis (STA) of VLSI designs. However, memory access and synchronization across a massive number of cores are obstacles to parallelizing STA. In this work, we developed a fast CUDA-Accelerated STA engine (named CASTA) that incorporates four novel techniques to enable high parallelism: Table-Index Remapping (TIR), Texture-Accelerated Rendering (TAR), Cell Levelization & Type Sorting (CLTS), and Timing-Table Restructuring (TTR). CLTS levelizes cells and sorts them by type so that cells using the same timing library can be accessed efficiently. TTR modifies the data structure for the timing signals of cells to increase memory throughput. TIR remaps the axes of timing tables to retrieve data more efficiently, while TAR expands look-up tables (LUTs) to avoid extrapolation and stores the LUTs in texture memory for speed. Experimental results indicate that CASTA achieves high parallelism and outperforms a commercial tool by three orders of magnitude on average over several benchmark circuits.
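The abstract does not give the details of CLTS, so the following is only a minimal sketch of the general idea, written in Python rather than CUDA for readability. It assumes the netlist is a directed acyclic graph of cells, assigns each cell a level equal to one more than its deepest driver, and then orders the cells within each level by library cell type. The function name `levelize_and_sort`, the cell names, and the cell types are all hypothetical and are not taken from CASTA.

```python
from collections import defaultdict, deque


def levelize_and_sort(cell_types, edges):
    """Group cells by topological level, then order each level by cell type.

    cell_types: dict mapping cell name -> library cell type (hypothetical names)
    edges: iterable of (driver, load) pairs between cells
    Returns a list of levels; each level is a list of cell names.
    """
    fanout = defaultdict(list)
    indegree = {cell: 0 for cell in cell_types}
    for driver, load in edges:
        fanout[driver].append(load)
        indegree[load] += 1

    # Cells with no fan-in sit at level 0.
    level = {cell: 0 for cell, deg in indegree.items() if deg == 0}
    queue = deque(level)
    processed = 0
    while queue:
        cell = queue.popleft()
        processed += 1
        for load in fanout[cell]:
            # A cell's level is one more than the level of its deepest driver.
            level[load] = max(level.get(load, 0), level[cell] + 1)
            indegree[load] -= 1
            if indegree[load] == 0:
                queue.append(load)

    if processed != len(cell_types):
        raise ValueError("netlist contains a combinational loop")

    levels = defaultdict(list)
    for cell, lv in level.items():
        levels[lv].append(cell)

    # Within a level, place cells of the same type next to each other.
    return [
        sorted(levels[lv], key=lambda c: (cell_types[c], c))
        for lv in sorted(levels)
    ]


if __name__ == "__main__":
    types = {"u1": "NAND2", "u2": "INV", "u3": "NAND2", "u4": "INV", "u5": "NOR2"}
    wires = [("u1", "u4"), ("u2", "u4"), ("u3", "u5"), ("u4", "u5")]
    print(levelize_and_sort(types, wires))
    # → [['u2', 'u1', 'u3'], ['u4'], ['u5']]
```

Cells within one level have no timing dependencies on each other, so a GPU implementation could plausibly process a whole level in parallel, and keeping same-type cells adjacent would let neighboring threads read the same library tables. How CASTA actually maps levels and types onto threads is not stated in the abstract.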
|Number of pages||9|
|Journal||Proceedings of the International Conference on Parallel Processing|
|State||Published - 13 Nov 2014|
|Event||43rd International Conference on Parallel Processing, ICPP 2014 - Minneapolis, United States (9 Sep 2014 → 12 Sep 2014)|
- Parallel Computing