TY - GEN
T1 - Ternary feature masks: zero-forgetting for task-incremental learning
AU - Masana, M.
AU - Tuytelaars, T.
AU - Van De Weijer, J.
PY - 2021/6
Y1 - 2021/6
N2 - We propose an approach to continual learning without any forgetting for the task-aware regime, where the task label is known at inference. By using ternary masks we can upgrade a model to new tasks, reusing knowledge from previous tasks while not forgetting anything about them. Using masks prevents both catastrophic forgetting and backward transfer. We argue, and show experimentally, that avoiding the former largely compensates for the lack of the latter, which is rarely observed in practice. In contrast to earlier works, our masks are applied to the features (activations) of each layer instead of the weights. This considerably reduces the number of mask parameters for each new task, by more than three orders of magnitude for most networks. Encoding the ternary masks in two bits per feature adds very little overhead to the network, avoiding scalability issues. To allow already learned features to adapt to the current task without changing their behavior for previous tasks, we introduce task-specific feature normalization. Extensive experiments on several fine-grained datasets and ImageNet show that our method outperforms the current state-of-the-art while reducing memory overhead compared to weight-based approaches.
AB - We propose an approach to continual learning without any forgetting for the task-aware regime, where the task label is known at inference. By using ternary masks we can upgrade a model to new tasks, reusing knowledge from previous tasks while not forgetting anything about them. Using masks prevents both catastrophic forgetting and backward transfer. We argue, and show experimentally, that avoiding the former largely compensates for the lack of the latter, which is rarely observed in practice. In contrast to earlier works, our masks are applied to the features (activations) of each layer instead of the weights. This considerably reduces the number of mask parameters for each new task, by more than three orders of magnitude for most networks. Encoding the ternary masks in two bits per feature adds very little overhead to the network, avoiding scalability issues. To allow already learned features to adapt to the current task without changing their behavior for previous tasks, we introduce task-specific feature normalization. Extensive experiments on several fine-grained datasets and ImageNet show that our method outperforms the current state-of-the-art while reducing memory overhead compared to weight-based approaches.
UR - http://www.scopus.com/inward/record.url?scp=85113504296&partnerID=8YFLogxK
U2 - 10.1109/CVPRW53098.2021.00396
DO - 10.1109/CVPRW53098.2021.00396
M3 - Conference contribution
T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
SP - 3565
EP - 3574
BT - Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2021
T2 - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops
Y2 - 19 June 2021 through 25 June 2021
ER -