CASTA: CUDA-accelerated static timing analysis for VLSI designs

Hunta H.W. Wang, Louis Y.Z. Lin, Ryan H.M. Huang, Charles H.P. Wen

Research output: Contribution to journal › Conference article › peer-review

3 Scopus citations


General-purpose computing on graphics processing units (GPGPU) makes parallel Static Timing Analysis (STA) of VLSI designs possible. However, memory access and synchronization across a massive number of cores make parallelizing STA challenging. In this work, we developed a fast CUDA-Accelerated STA engine, named CASTA, that incorporates four novel techniques to enable high parallelism: Table-Index Remapping (TIR), Texture-Accelerated Rendering (TAR), Cell Levelization & Type Sorting (CLTS), and Timing-Table Restructuring (TTR). CLTS levelizes cells and sorts them by type so that they can efficiently access the same timing library. TTR modifies the data structure for the timing signals of cells to increase memory throughput. TIR re-maps the axes of timing tables to retrieve data more efficiently, while TAR expands look-up tables (LUTs) to avoid extrapolation and stores the LUTs in texture memory for speed. Our experimental results indicate that CASTA successfully enables high parallelism and outperforms a commercial tool by a speedup of three orders of magnitude on average over several benchmark circuits.
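The idea behind CLTS can be illustrated with a small CPU-side sketch. This is not the authors' implementation: the function name `levelize_and_sort`, the dictionary-based netlist, and the cell names are all hypothetical, and the abstract does not describe the CUDA kernel that would consume these levels. The sketch assigns each cell a topological level (one more than the deepest cell driving it), then orders the cells inside each level by cell type.

```python
from collections import defaultdict, deque


def levelize_and_sort(cells, fanin):
    """Group cells into topological levels, sorted by cell type within a level.

    cells: dict mapping cell name -> cell type (hypothetical netlist format)
    fanin: dict mapping cell name -> list of driver names; drivers that are
           not in `cells` are treated as primary inputs and ignored
    """
    fanout = defaultdict(list)
    pending = {name: 0 for name in cells}  # unresolved driver count per cell
    for sink in cells:
        for driver in fanin.get(sink, []):
            if driver in cells:
                fanout[driver].append(sink)
                pending[sink] += 1

    level = {name: 0 for name in cells}
    queue = deque(name for name, count in pending.items() if count == 0)
    visited = 0
    while queue:
        name = queue.popleft()
        visited += 1
        for sink in fanout[name]:
            # A cell sits one level below its deepest driver.
            level[sink] = max(level[sink], level[name] + 1)
            pending[sink] -= 1
            if pending[sink] == 0:
                queue.append(sink)
    if visited != len(cells):
        raise ValueError("netlist contains a combinational loop")

    buckets = defaultdict(list)
    for name, lvl in level.items():
        buckets[lvl].append(name)
    # Sort by (type, name) so same-type cells are adjacent within a level.
    return [sorted(buckets[lvl], key=lambda n: (cells[n], n))
            for lvl in sorted(buckets)]


# Illustrative five-cell netlist (made-up names and types).
cells = {"u1": "NAND2", "u2": "INV", "u3": "NAND2", "u4": "INV", "u5": "NOR2"}
fanin = {"u3": ["u1", "u2"], "u4": ["u1"], "u5": ["u3", "u4"]}
levels = levelize_and_sort(cells, fanin)
```

Cells in the same level do not depend on one another, so a parallel engine could evaluate one level at a time. Placing same-type cells next to each other means neighboring workers would read the same library tables, which is the access pattern the abstract attributes to CLTS.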

Original language: English
Article number: 6957228
Pages (from-to): 192-200
Number of pages: 9
Journal: Proceedings of the International Conference on Parallel Processing
Issue number: November
State: Published - 13 Nov 2014
Event: 43rd International Conference on Parallel Processing, ICPP 2014 - Minneapolis, United States
Duration: 9 Sep 2014 → 12 Sep 2014


  • CUDA
  • GPU
  • Parallel Computing
  • STA

