Video Test-Time Adaptation for Action Recognition

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

Although action recognition systems can achieve top performance when evaluated on in-distribution test points, they are vulnerable to unanticipated distribution shifts in test data. However, test-time adaptation of video action recognition models against common distribution shifts has so far not been demonstrated. We propose to address this problem with an approach tailored to spatio-temporal models that is capable of adaptation on a single video sample at each step. It consists of a feature distribution alignment technique that aligns online estimates of test set statistics towards the training statistics. We further enforce prediction consistency over temporally augmented views of the same test video sample. Evaluations on three benchmark action recognition datasets show that our proposed technique is architecture-agnostic and able to significantly boost performance on both the state-of-the-art convolutional architecture TANet and the Video Swin Transformer. Our proposed method demonstrates a substantial performance gain over existing test-time adaptation approaches, both in the evaluation of a single distribution shift and in the challenging case of random distribution shifts. Code will be available at https://github.com/wlin-at/ViTTA.
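The abstract outlines two ingredients: aligning online estimates of test-set feature statistics to the training statistics, and enforcing prediction consistency over temporally augmented views. The PyTorch-style sketch below is a rough illustration of those ideas only, not the authors' released implementation (see the linked repository for that); the function names, the exponential-moving-average momentum, and the equal loss weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def alignment_loss(test_stats, train_stats):
    """L1 distance between online estimates of test-set feature statistics
    and pre-computed training statistics, summed over the aligned layers.
    Each entry is a (mean, variance) pair of 1-D tensors."""
    loss = 0.0
    for (m_test, v_test), (m_train, v_train) in zip(test_stats, train_stats):
        loss = loss + (m_test - m_train).abs().mean() + (v_test - v_train).abs().mean()
    return loss


def ema_update(running_stats, current_stats, momentum=0.1):
    """Exponential moving average of per-layer (mean, variance) statistics,
    so the test-set estimate remains usable when adapting on a single video
    per step. Gradients flow only through the current sample's contribution."""
    if running_stats is None:
        return current_stats
    return [((1 - momentum) * m_r.detach() + momentum * m_c,
             (1 - momentum) * v_r.detach() + momentum * v_c)
            for (m_r, v_r), (m_c, v_c) in zip(running_stats, current_stats)]


def consistency_loss(logits_a, logits_b):
    """Penalise disagreement between predictions on two temporally augmented
    views (e.g. different temporal samplings) of the same test video."""
    p_a = F.softmax(logits_a, dim=-1)
    p_b = F.softmax(logits_b, dim=-1)
    return (p_a - p_b).abs().sum(dim=-1).mean()


if __name__ == "__main__":
    # Toy usage with random tensors standing in for real per-layer feature
    # statistics and for the classifier logits of two temporal views.
    train_stats = [(torch.zeros(64), torch.ones(64))]
    current_stats = [(torch.randn(64, requires_grad=True),
                      torch.rand(64, requires_grad=True) + 0.5)]
    running_stats = ema_update(None, current_stats)
    logits_a, logits_b = torch.randn(2, 400), torch.randn(2, 400)
    total = alignment_loss(running_stats, train_stats) + consistency_loss(logits_a, logits_b)
    total.backward()  # in adaptation, this gradient would update the model
    print(float(total))
```

In practice such an objective would be minimised with a small number of gradient steps per incoming video, as is common in test-time adaptation, often updating only a subset of the model's parameters; the exact design choices are given in the paper and the official code.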
Original language: English
Title of host publication: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Pages: 22952-22961
DOIs
Publication status: Published - 2023
Event: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition: CVPR 2023 - Vancouver, Canada
Duration: 17 Jun 2023 - 24 Jun 2023

Conference

Conference: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Abbreviated title: CVPR 2023
Country/Territory: Canada
City: Vancouver
Period: 17/06/23 - 24/06/23
