Bayesian asymmetric quantized neural networks

Jen-Tzung Chien*, Su-Ting Chang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This paper develops robust model compression for neural networks via parameter quantization. Traditionally, quantized neural networks (QNN) have been constructed with binary or ternary weights, where the weights are deterministic. This paper generalizes QNN in two directions. First, an M-ary QNN is developed to adjust the balance between memory storage and model capacity. The representation values and the quantization partitions in M-ary quantization are mutually estimated to enhance the resolution of gradients in neural network training, and a flexible quantization with asymmetric partitions is formulated. Second, variational inference is incorporated to implement the Bayesian asymmetric QNN. The uncertainty of the weights is faithfully represented to enhance the robustness of the trained model in the presence of heterogeneous data. Importantly, a multiple spike-and-slab prior is proposed to represent the quantization levels in Bayesian asymmetric learning. M-ary quantization is then optimized by maximizing the evidence lower bound of the classification network. An adaptive parameter space is built to implement Bayesian quantization and neural representation. Experiments on various image recognition tasks show that the M-ary QNN achieves performance similar to that of the full-precision neural network (FPNN), while the memory cost and test time are significantly reduced relative to FPNN. The merit of the Bayesian M-ary QNN with the multiple spike-and-slab prior is also investigated.
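
As a concrete illustration of the two ingredients described in the abstract, the following minimal sketch (not the authors' code; the names quantize_mary, log_multi_spike_slab, thresholds, and levels are illustrative assumptions) shows how asymmetric M-ary quantization maps continuous weights to M representation values, and how a multiple spike-and-slab style prior could place narrow Gaussian spikes at those values alongside a broad slab.

    import numpy as np

    def quantize_mary(w, thresholds, levels):
        # Assign each continuous weight to one of M partitions defined by
        # M-1 sorted (possibly asymmetric) thresholds, then replace it with
        # the representation value of that partition.
        idx = np.searchsorted(thresholds, w)
        return levels[idx]

    def log_multi_spike_slab(w, levels, pi_spike=0.9,
                             sigma_spike=0.05, sigma_slab=1.0):
        # Log-density of a mixture with narrow Gaussian "spikes" centred at
        # the quantization levels and one broad Gaussian "slab"; a stand-in
        # for the multiple spike-and-slab prior named in the abstract.
        M = len(levels)
        comps = [np.log(pi_spike / M)
                 - 0.5 * ((w - c) / sigma_spike) ** 2
                 - np.log(sigma_spike * np.sqrt(2 * np.pi)) for c in levels]
        comps.append(np.log(1.0 - pi_spike)
                     - 0.5 * (w / sigma_slab) ** 2
                     - np.log(sigma_slab * np.sqrt(2 * np.pi)))
        comps = np.stack(comps)                    # shape (M + 1, ...)
        m = comps.max(axis=0)
        return m + np.log(np.exp(comps - m).sum(axis=0))  # stable log-sum-exp

    # Example with M = 3: asymmetric thresholds and representation values.
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.5, size=8)
    thresholds = np.array([-0.2, 0.35])
    levels = np.array([-0.6, 0.0, 0.9])
    print(quantize_mary(w, thresholds, levels))
    print(log_multi_spike_slab(w, levels))

In this reading, the asymmetry comes from the freedom to place both the cut points and the representation values off-centre; how the thresholds, levels, and variational posterior are actually estimated is detailed in the paper itself.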

Original language: English
Article number: 109463
Journal: Pattern Recognition
Volume: 139
DOIs
State: Published - Jul 2023

Keywords

  • Bayesian asymmetric quantization
  • Binary neural network
  • Model compression
  • Quantized neural network
