Is Appearance Free Action Recognition Possible?

Filip Ilic*, Thomas Pock, Richard P. Wildes

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

Intuition might suggest that motion and dynamic information are key to video-based action recognition. In contrast, there is evidence that state-of-the-art deep-learning video understanding architectures are biased toward static information available in single frames. Presently, a methodology and corresponding dataset to isolate the effects of dynamic information in video are missing. Their absence makes it difficult to understand how well contemporary architectures capitalize on dynamic vs. static information. We respond with a novel Appearance Free Dataset (AFD) for action recognition. AFD is devoid of static information relevant to action recognition in a single frame. Modeling of the dynamics is necessary for solving the task, as the action is only apparent through consideration of the temporal dimension. We evaluated 11 contemporary action recognition architectures on AFD as well as its related RGB video. Our results show a notable decrease in performance for all architectures on AFD compared to RGB. We also conducted a complementary study with humans that shows their recognition accuracy on AFD and RGB is very similar and much better than the evaluated architectures on AFD. Our results motivate a novel architecture that revives explicit recovery of optical flow within a contemporary design for best performance on AFD and RGB.

Original language: English
Title of host publication: Computer Vision – ECCV 2022
Subtitle of host publication: 17th European Conference, 2022, Proceedings
Editors: Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 156-173
Volume: 4
ISBN (Print): 9783031197710
DOIs
Publication status: Published - 2022
Event: 2022 European Conference on Computer Vision: ECCV 2022 - Hybrid Event, Tel Aviv, Israel
Duration: 23 Oct 2022 to 27 Oct 2022

Publication series

Name: Lecture Notes in Computer Science
Volume: 13664
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2022 European Conference on Computer Vision
Abbreviated title: ECCV 2022
Country/Territory: Israel
City: Hybrid Event, Tel Aviv
Period: 23/10/22 to 27/10/22

Keywords

  • Action recognition
  • Action recognition dataset
  • Deep learning
  • Human motion perception
  • Static and dynamic video representation

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
