Is Appearance Free Action Recognition Possible?

Filip Ilic; Thomas Pock; Richard P. Wildes

doi:10.1007/978-3-031-19772-7_10

Filip Ilic^*, Thomas Pock, Richard P. Wildes

^*Corresponding author for this work

Institute of Computer Graphics and Vision (7100)

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

Intuition might suggest that motion and dynamic information are key to video-based action recognition. In contrast, there is evidence that state-of-the-art deep-learning video understanding architectures are biased toward static information available in single frames. Presently, a methodology and corresponding dataset to isolate the effects of dynamic information in video are missing. Their absence makes it difficult to understand how well contemporary architectures capitalize on dynamic vs. static information. We respond with a novel Appearance Free Dataset (AFD) for action recognition. AFD is devoid of static information relevant to action recognition in a single frame. Modeling of the dynamics is necessary for solving the task, as the action is only apparent through consideration of the temporal dimension. We evaluated 11 contemporary action recognition architectures on AFD as well as its related RGB video. Our results show a notable decrease in performance for all architectures on AFD compared to RGB. We also conducted a complementary study with humans that shows their recognition accuracy on AFD and RGB is very similar and much better than the evaluated architectures on AFD. Our results motivate a novel architecture that revives explicit recovery of optical flow, within a contemporary design for best performance on AFD and RGB.

Original language	English
Title of host publication	Computer Vision – ECCV 2022
Subtitle of host publication	17th European Conference, 2022, Proceedings
Editors	Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	156-173
Volume	4
ISBN (Print)	9783031197710
DOIs	https://doi.org/10.1007/978-3-031-19772-7_10
Publication status	Published - 2022
Event	2022 European Conference on Computer Vision: ECCV 2022 - Hybrid Event, Tel Aviv, Israel; Duration: 23 Oct 2022 → 27 Oct 2022

Publication series

Name	Lecture Notes in Computer Science
Volume	13664
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	2022 European Conference on Computer Vision
Abbreviated title	ECCV 2022
Country/Territory	Israel
City	Hybrid Event, Tel Aviv
Period	23/10/22 → 27/10/22

Keywords

Action recognition
Action recognition dataset
Deep learning
Human motion perception
Static and dynamic video representation

ASJC Scopus subject areas

Theoretical Computer Science
Computer Science (all)

Access to Document

10.1007/978-3-031-19772-7_10

Cite this

Ilic, F., Pock, T., & Wildes, R. P. (2022). Is Appearance Free Action Recognition Possible? In S. Avidan, G. Brostow, M. Cissé, G. M. Farinella, & T. Hassner (Eds.), Computer Vision – ECCV 2022: 17th European Conference, 2022, Proceedings (Vol. 4, pp. 156-173). (Lecture Notes in Computer Science; Vol. 13664). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-19772-7_10

Is Appearance Free Action Recognition Possible? / Ilic, Filip; Pock, Thomas; Wildes, Richard P.
Computer Vision – ECCV 2022: 17th European Conference, 2022, Proceedings. ed. / Shai Avidan; Gabriel Brostow; Moustapha Cissé; Giovanni Maria Farinella; Tal Hassner. Vol. 4 Springer Science and Business Media Deutschland GmbH, 2022. p. 156-173 (Lecture Notes in Computer Science; Vol. 13664).

Ilic, F, Pock, T & Wildes, RP 2022, Is Appearance Free Action Recognition Possible? in S Avidan, G Brostow, M Cissé, GM Farinella & T Hassner (eds), Computer Vision – ECCV 2022: 17th European Conference, 2022, Proceedings. vol. 4, Lecture Notes in Computer Science, vol. 13664, Springer Science and Business Media Deutschland GmbH, pp. 156-173, 2022 European Conference on Computer Vision, Hybrid Event, Tel Aviv, Israel, 23/10/22. https://doi.org/10.1007/978-3-031-19772-7_10

Ilic F, Pock T, Wildes RP. Is Appearance Free Action Recognition Possible? In Avidan S, Brostow G, Cissé M, Farinella GM, Hassner T, editors, Computer Vision – ECCV 2022: 17th European Conference, 2022, Proceedings. Vol. 4. Springer Science and Business Media Deutschland GmbH. 2022. p. 156-173. (Lecture Notes in Computer Science). doi: 10.1007/978-3-031-19772-7_10

Ilic, Filip; Pock, Thomas; Wildes, Richard P. / Is Appearance Free Action Recognition Possible? Computer Vision – ECCV 2022: 17th European Conference, 2022, Proceedings. editor / Shai Avidan; Gabriel Brostow; Moustapha Cissé; Giovanni Maria Farinella; Tal Hassner. Vol. 4 Springer Science and Business Media Deutschland GmbH, 2022. pp. 156-173 (Lecture Notes in Computer Science).

@inproceedings{9b16b4a5091d41208d85cf5a3c4dae3b,

title = "Is Appearance Free Action Recognition Possible?",

abstract = "Intuition might suggest that motion and dynamic information are key to video-based action recognition. In contrast, there is evidence that state-of-the-art deep-learning video understanding architectures are biased toward static information available in single frames. Presently, a methodology and corresponding dataset to isolate the effects of dynamic information in video are missing. Their absence makes it difficult to understand how well contemporary architectures capitalize on dynamic vs. static information. We respond with a novel Appearance Free Dataset (AFD) for action recognition. AFD is devoid of static information relevant to action recognition in a single frame. Modeling of the dynamics is necessary for solving the task, as the action is only apparent through consideration of the temporal dimension. We evaluated 11 contemporary action recognition architectures on AFD as well as its related RGB video. Our results show a notable decrease in performance for all architectures on AFD compared to RGB. We also conducted a complementary study with humans that shows their recognition accuracy on AFD and RGB is very similar and much better than the evaluated architectures on AFD. Our results motivate a novel architecture that revives explicit recovery of optical flow, within a contemporary design for best performance on AFD and RGB.",

keywords = "Action recognition, Action recognition dataset, Deep learning, Human motion perception, Static and dynamic video representation",

author = "Filip Ilic and Thomas Pock and Wildes, {Richard P.}",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 2022 European Conference on Computer Vision, ECCV 2022; Conference date: 23-10-2022 through 27-10-2022",

year = "2022",

doi = "10.1007/978-3-031-19772-7_10",

language = "English",

isbn = "9783031197710",

volume = "4",

series = "Lecture Notes in Computer Science",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "156--173",

editor = "Shai Avidan and Gabriel Brostow and Moustapha Ciss{\'e} and Farinella, {Giovanni Maria} and Tal Hassner",