CycDA: Unsupervised Cycle Domain Adaptation to Learn from Image to Video

Wei Lin*, Anna Kukleva, Kunyang Sun, Horst Possegger, Hilde Kuehne, Horst Bischof

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference paper, peer-review


Although action recognition has achieved impressive results in recent years, both the collection and the annotation of video training data remain time-consuming and cost-intensive. Image-to-video adaptation has therefore been proposed to exploit label-free web images as a source for adapting to unlabeled target videos. This poses two major challenges: (1) the spatial domain shift between web images and video frames; (2) the modality gap between image and video data. To address these challenges, we propose Cycle Domain Adaptation (CycDA), a cycle-based approach for unsupervised image-to-video domain adaptation. On the one hand, CycDA leverages the joint spatial information in images and videos; on the other hand, it trains an independent spatio-temporal model to bridge the modality gap. We alternate between spatial and spatio-temporal learning, with knowledge transfer between the two in each cycle. We evaluate our approach on benchmark datasets for image-to-video as well as mixed-source domain adaptation, achieving state-of-the-art results and demonstrating the benefits of our cyclic adaptation.
Original language: English
Title of host publication: Computer Vision – ECCV 2022
Subtitle of host publication: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part III
Place of Publication: Cham
Number of pages: 17
ISBN (Electronic): 978-3-031-20062-5
ISBN (Print): 978-3-031-20061-8
Publication status: Published - 2022
Event: 2022 European Conference on Computer Vision: ECCV 2022 - Hybrid Event, Tel Aviv, Israel
Duration: 23 Oct 2022 to 27 Oct 2022

Publication series

Name: Lecture Notes in Computer Science


Conference: 2022 European Conference on Computer Vision
Abbreviated title: ECCV 2022
City: Hybrid Event, Tel Aviv

