Teacher-Free Knowledge Distillation Based on Non-Progressive Meta-Learned Multi Ranking Selection

Bing Ru Jiang, Pin Hsuan Ho, Syuan Ting He, Albert S. Lin*

*Corresponding author for this work

Research output: Article › peer-review

Abstract

Knowledge Distillation (KD) is the procedure of algorithmically extracting useful information from a previously trained model. Successful distillation raises the accuracy of the distilled model. In the context of model compression, a teacher model provides softened labels that facilitate distillation. While some effort has been devoted to progressive training, and most work in the literature addresses teacher-student distillation configurations or methodologies, less attention has been paid to non-progressive training, meta-learning-controlled distillation, and progressive training with an algorithm-controlled target distribution. In this paper, we propose a Teacher-Free Knowledge Distillation (TFKD) framework based on a non-progressive, meta-learned Reinforcement Learning (RL) method. The student model learns from a free-form distribution that changes during training. In this scenario, the target distribution varies across training epochs, and the variation does not necessarily converge continuously toward the true distribution. Because the variation of the target distribution during KD is algorithm-controlled, meta-learning KD is established. We also design a Multi-Ranking Selection (MRS) procedure to identify a more promising model for continued training. We conduct experiments using VGG-8 on the CIFAR-100, CIFAR-10, and SVHN datasets. Our method improves accuracy by 1.62% without MRS and 1.97% with MRS over the baseline model on CIFAR-100. Compared with State-Of-The-Art (SOTA) techniques, our approach achieves the highest accuracy of 72.41%.
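To make the teacher-free setup concrete, the sketch below shows one common way such a loss can be assembled: a cross-entropy term against the true labels combined with a KL-divergence term toward a free-form target distribution that an external controller (for example, an RL policy) may update each epoch. This is a minimal illustration only; the function name, the weighting scheme (alpha, temperature), and the exact loss composition are assumptions and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def tfkd_loss(student_logits, labels, target_dist, alpha=0.5, temperature=4.0):
    """Illustrative teacher-free KD loss (hypothetical, not the paper's exact formulation).

    student_logits : raw outputs of the student model, shape (batch, num_classes)
    labels         : ground-truth class indices, shape (batch,)
    target_dist    : free-form target probability distribution, shape (batch, num_classes),
                     assumed to be supplied and updated by an external controller each epoch
    """
    # Standard supervised term against the hard labels.
    ce = F.cross_entropy(student_logits, labels)

    # Distillation term: KL divergence from the algorithm-controlled target distribution
    # to the temperature-softened student predictions.
    log_p = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(log_p, target_dist, reduction="batchmean") * temperature ** 2

    # Blend the two terms; alpha controls the strength of the distillation signal.
    return (1 - alpha) * ce + alpha * kd
```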

Original language: English
Pages (from-to): 79685-79698
Number of pages: 14
Journal: IEEE Access
Volume: 12
DOIs
Publication status: Published - 2024

