Counterclockwise block-by-block knowledge distillation for neural network compression
Abstract: Model compression is a technique for transforming large neural network models into smaller ones. Knowledge distillation (KD) is a crucial model compression technique that transfers knowledge from a large teacher model to a lightweight student model. Existing knowledge distillation methods typically carry out the knowledge transfer from teacher to student in one or two stages.
This paper introduces a novel approach called counterclockwise block-wise knowledge distillation (CBKD) to optimize the knowledge distillation process. The core idea of CBKD is to mitigate the generation gap between teacher and student models and to facilitate the transfer of intermediate-layer knowledge from the teacher model. CBKD divides both the teacher and student models into multiple sub-network blocks, and in each stage of knowledge distillation, only the knowledge from one teacher sub-block is transferred to the student sub-block at the corresponding position.
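To make the block-wise scheme concrete, the following PyTorch sketch trains a single student sub-block to mimic the matching teacher sub-block with a feature-matching loss. The MSE loss, the optimizer, and the assumption that the block's input features are available from a data loader are illustrative choices for this sketch, not the paper's exact CBKD implementation.

```python
# Minimal sketch of one stage of block-wise feature distillation (assumptions noted above).
import torch
import torch.nn as nn

def distill_one_block(teacher_block: nn.Module,
                      student_block: nn.Module,
                      loader,
                      epochs: int = 1,
                      lr: float = 1e-3,
                      device: str = "cpu"):
    """Train one student sub-block to mimic the corresponding teacher sub-block.

    `loader` is assumed to yield the feature maps that feed this block,
    e.g. activations cached from the blocks distilled in earlier stages.
    """
    teacher_block.eval().to(device)
    student_block.train().to(device)
    criterion = nn.MSELoss()  # feature-matching loss (an assumption for this sketch)
    optimizer = torch.optim.Adam(student_block.parameters(), lr=lr)

    for _ in range(epochs):
        for features in loader:
            features = features.to(device)
            with torch.no_grad():
                target = teacher_block(features)   # teacher's intermediate output
            output = student_block(features)       # student's attempt at the same mapping
            loss = criterion(output, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student_block
```

In each distillation stage, only one such teacher-student block pair would be active, with the other student sub-blocks left untouched until their own stage.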
Additionally, in the CBKD process, deeper teacher sub-network blocks are assigned higher compression rates. Extensive experiments on Tiny-ImageNet-200 and CIFAR-10 demonstrate that the proposed CBKD method can enhance the distillation performance of various mainstream knowledge distillation approaches.
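The depth-dependent compression can be pictured as a simple schedule that grows with block depth. The linear ramp, the rate bounds, and the channel scaling below are assumptions made for illustration, not the rates used in the paper.

```python
# Illustrative only: deeper sub-blocks receive higher compression rates.
def compression_rates(num_blocks: int,
                      min_rate: float = 0.25,
                      max_rate: float = 0.75) -> list[float]:
    """Return one compression rate per block, increasing linearly with depth."""
    if num_blocks == 1:
        return [max_rate]
    step = (max_rate - min_rate) / (num_blocks - 1)
    return [min_rate + i * step for i in range(num_blocks)]

def student_channels(teacher_channels: list[int],
                     rates: list[float]) -> list[int]:
    """Shrink each block's channel count according to its compression rate."""
    return [max(1, round(c * (1.0 - r)))
            for c, r in zip(teacher_channels, rates)]

# Example: a hypothetical 4-block teacher with 64/128/256/512 channels per block.
rates = compression_rates(4)                       # [0.25, ~0.42, ~0.58, 0.75]
print(student_channels([64, 128, 256, 512], rates))
```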