TY - JOUR
T1 - CIFER
T2 - A Cache-Coherent 12-nm 16-mm² SoC With Four 64-Bit RISC-V Application Cores, 18 32-Bit RISC-V Compute Cores, and a 1541 LUT6/mm² Synthesizable eFPGA
AU - Li, Ang
AU - Chang, Ting Jung
AU - Gao, Fei
AU - Ta, Tuan
AU - Tziantzioulis, Georgios
AU - Ou, Yanghui
AU - Wang, Moyang
AU - Tu, Jinzheng
AU - Xu, Kaifeng
AU - Jackson, Paul
AU - Ning, August
AU - Chirkov, Grigory
AU - Orenes-Vera, Marcelo
AU - Agwa, Shady
AU - Yan, Xiaoyu
AU - Tang, Eric
AU - Balkind, Jonathan
AU - Batten, Christopher
AU - Wentzlaff, David
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2023
Y1 - 2023
N2 - This letter presents CIFER, the world's first open-source, fully cache-coherent, heterogeneous many-core, CPU-FPGA system-on-chip (SoC). The 12-nm, 16-mm² chip integrates four 64-bit, OS-capable, RISC-V application cores; three TinyCore clusters that each contain six 32-bit, RISC-V compute cores (18 in total); and an electronic design automation-synthesized, standard-cell-based eFPGA. CIFER enables the decomposition of real-world applications and tailored execution (parallelization or specialization) per decomposed task. Our evaluation shows that: 1) the TinyCore clusters increase the throughput and energy efficiency of data- and thread-parallel tasks by up to 7.95× and 7.75× over one 64-bit core, respectively; 2) the eFPGA increases the throughput and energy efficiency of hardware-accelerable tasks by up to 9.29× and 10.62×, respectively; and 3) using coherent caches for data transfer between the processors and the eFPGA increases the throughput and energy efficiency by up to 11.1× and 10.5×, respectively.
AB - This letter presents CIFER, the world's first open-source, fully cache-coherent, heterogeneous many-core, CPU-FPGA system-on-chip (SoC). The 12-nm, 16-mm² chip integrates four 64-bit, OS-capable, RISC-V application cores; three TinyCore clusters that each contain six 32-bit, RISC-V compute cores (18 in total); and an electronic design automation-synthesized, standard-cell-based eFPGA. CIFER enables the decomposition of real-world applications and tailored execution (parallelization or specialization) per decomposed task. Our evaluation shows that: 1) the TinyCore clusters increase the throughput and energy efficiency of data- and thread-parallel tasks by up to 7.95× and 7.75× over one 64-bit core, respectively; 2) the eFPGA increases the throughput and energy efficiency of hardware-accelerable tasks by up to 9.29× and 10.62×, respectively; and 3) using coherent caches for data transfer between the processors and the eFPGA increases the throughput and energy efficiency by up to 11.1× and 10.5×, respectively.
KW - Cache memory
KW - computer architecture
KW - parallel architectures
KW - programmable logic arrays
KW - reconfigurable architectures
KW - system-on-chip (SoC)
UR - http://www.scopus.com/inward/record.url?scp=85167840940&partnerID=8YFLogxK
U2 - 10.1109/LSSC.2023.3303111
DO - 10.1109/LSSC.2023.3303111
M3 - Article
AN - SCOPUS:85167840940
SN - 2573-9603
VL - 6
SP - 229
EP - 232
JO - IEEE Solid-State Circuits Letters
JF - IEEE Solid-State Circuits Letters
ER -