TY - GEN
T1 - Residual Knowledge Retention for Edge Devices
AU - Liou, Cheng Fu
AU - Kuo, Paul
AU - Guo, Jiun In
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/6/20
Y1 - 2021/6/20
N2 - This paper proposes an approach for continual learning, Knowledge Retention (KR), that learns new information without accessing data from previous tasks. A KR unit is based on an embedding layer that identifies the important kernels in a convolution layer, preserving key parameters and allowing the weights to be reused across tasks. To achieve higher-order generalization, we design a Residual Knowledge Retention (RKR) architecture that allows the network to stack deeper layers. Additionally, we rethink the benefits of different residual blocks after employing depthwise convolutions. A surprising observation is that the basic block with depthwise convolutions achieves higher representational power and yields a more lightweight model than its bottleneck counterpart. On the Alternating CIFAR10/100 benchmark, we empirically show that the KR unit can be integrated into diverse networks and effectively prevents catastrophic forgetting. Finally, we demonstrate that RKR significantly outperforms existing state-of-the-art continual learning methods with at least 6 times lower model complexity in two different continual learning scenarios, showing that the proposed approach is more competitive for resource-limited edge devices.
AB - This paper proposes an approach for continual learning, Knowledge Retention (KR), that learns new information without accessing data from previous tasks. A KR unit is based on an embedding layer that identifies the important kernels in a convolution layer, preserving key parameters and allowing the weights to be reused across tasks. To achieve higher-order generalization, we design a Residual Knowledge Retention (RKR) architecture that allows the network to stack deeper layers. Additionally, we rethink the benefits of different residual blocks after employing depthwise convolutions. A surprising observation is that the basic block with depthwise convolutions achieves higher representational power and yields a more lightweight model than its bottleneck counterpart. On the Alternating CIFAR10/100 benchmark, we empirically show that the KR unit can be integrated into diverse networks and effectively prevents catastrophic forgetting. Finally, we demonstrate that RKR significantly outperforms existing state-of-the-art continual learning methods with at least 6 times lower model complexity in two different continual learning scenarios, showing that the proposed approach is more competitive for resource-limited edge devices.
KW - continual learning
KW - embedding layer
KW - catastrophic forgetting
KW - depthwise convolution
KW - edge device
KW - residual block
UR - http://www.scopus.com/inward/record.url?scp=85118792700&partnerID=8YFLogxK
U2 - 10.1109/ISIE45552.2021.9576374
DO - 10.1109/ISIE45552.2021.9576374
M3 - Conference contribution
AN - SCOPUS:85118792700
T3 - IEEE International Symposium on Industrial Electronics
BT - Proceedings of 2021 IEEE 30th International Symposium on Industrial Electronics, ISIE 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 30th IEEE International Symposium on Industrial Electronics, ISIE 2021
Y2 - 20 June 2021 through 23 June 2021
ER -