TY - JOUR
T1 - Teacher-Free Knowledge Distillation Based on Non-Progressive Meta-Learned Multi Ranking Selection
AU - Jiang, Bing Ru
AU - Ho, Pin Hsuan
AU - He, Syuan Ting
AU - Lin, Albert S.
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2024
Y1 - 2024
N2 - Knowledge Distillation (KD) is the procedure of extracting useful information from a previously trained model by means of an algorithm; successful distillation raises the accuracy of the distilled model. In the context of model compression, a teacher model provides softened labels that facilitate distillation. While some effort has been devoted to progressive training, and most of the literature concerns teacher-student distillation configurations or methodologies, less attention has been paid to non-progressive training, meta-learning-controlled distillation, and progressive training with an algorithm-controlled target distribution. In this paper, we propose a Teacher-Free Knowledge Distillation (TFKD) framework based on a non-progressive, meta-learned Reinforcement Learning (RL) method. The student model learns from a free-form target distribution that changes during training: the distribution varies across training epochs, and the variation does not necessarily converge monotonically toward the true distribution. Because the variation of the target distribution during KD is algorithm-controlled, the procedure constitutes meta-learned KD. We also design a Multi-Ranking Selection (MRS) procedure that identifies a more promising model for continued training. We conduct experiments with VGG-8 on the CIFAR-100, CIFAR-10, and SVHN datasets. On CIFAR-100, our method improves accuracy over the baseline model by 1.62% without MRS and 1.97% with MRS. Compared with State-Of-The-Art (SOTA) techniques, our approach achieves the highest accuracy, 72.41%.
AB - Knowledge Distillation (KD) is the procedure of extracting useful information from a previously trained model by means of an algorithm; successful distillation raises the accuracy of the distilled model. In the context of model compression, a teacher model provides softened labels that facilitate distillation. While some effort has been devoted to progressive training, and most of the literature concerns teacher-student distillation configurations or methodologies, less attention has been paid to non-progressive training, meta-learning-controlled distillation, and progressive training with an algorithm-controlled target distribution. In this paper, we propose a Teacher-Free Knowledge Distillation (TFKD) framework based on a non-progressive, meta-learned Reinforcement Learning (RL) method. The student model learns from a free-form target distribution that changes during training: the distribution varies across training epochs, and the variation does not necessarily converge monotonically toward the true distribution. Because the variation of the target distribution during KD is algorithm-controlled, the procedure constitutes meta-learned KD. We also design a Multi-Ranking Selection (MRS) procedure that identifies a more promising model for continued training. We conduct experiments with VGG-8 on the CIFAR-100, CIFAR-10, and SVHN datasets. On CIFAR-100, our method improves accuracy over the baseline model by 1.62% without MRS and 1.97% with MRS. Compared with State-Of-The-Art (SOTA) techniques, our approach achieves the highest accuracy, 72.41%.
KW - Teacher-free knowledge distillation
KW - knowledge distillation
KW - meta-learning
KW - model compression
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85195426467&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2024.3409177
DO - 10.1109/ACCESS.2024.3409177
M3 - Article
AN - SCOPUS:85195426467
SN - 2169-3536
VL - 12
SP - 79685
EP - 79698
JO - IEEE Access
JF - IEEE Access
ER -