How well do contrastively trained models transfer?

M. Moein Shariatnia, Rahim Entezari, Mitchell Wortsman, Olga Saukh, Ludwig Schmidt

Research output: Contribution to conference › Paper › peer-review

Abstract

There are two prevailing methods for pre-training on large datasets to learn transferable representations: 1) supervised pre-training on large but weakly-labeled datasets; 2) contrastive training on image-only data and on image-text pairs. While supervised pre-training learns good representations that can be transferred to a wide range of tasks, contrastively trained models such as CLIP have demonstrated unprecedented zero-shot transfer. In this work, we compare the transferability of these two families of methods to multiple downstream tasks. The pre-training distributions we consider include YFCC, Conceptual Captions, and ImageNet-21K, while pre-training objectives range from supervised to SimCLR, CLIP, and SLIP. We observe that different pre-training methods with the same training source transfer similarly given their ImageNet accuracy.
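To make the notion of zero-shot transfer concrete, the sketch below shows how a contrastively trained image-text model such as CLIP can classify an image without any fine-tuning, by comparing the image embedding against text embeddings of prompts for each candidate class. It assumes the open-source `clip` package from OpenAI; the prompt template, class names, and image path are illustrative placeholders, not the evaluation setup used in the paper.

```python
# Minimal sketch of CLIP-style zero-shot classification.
# Assumes the `clip` package from https://github.com/openai/CLIP;
# class names, prompt template, and image path are hypothetical.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical downstream label set; a real evaluation would use the
# target dataset's classes.
class_names = ["airplane", "bird", "cat", "dog"]
text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between the image embedding and each class prompt
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    logits = 100.0 * image_features @ text_features.T
    probs = logits.softmax(dim=-1)

print(class_names[probs.argmax().item()])
```

Linear-probe or fine-tuning transfer, by contrast, would freeze (or adapt) the image encoder and fit a classifier on its features for each downstream dataset, rather than scoring text prompts.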
Original language: English
Number of pages: 8
Publication status: Published - 23 Jul 2022
Event: ICML 2022 Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward - Baltimore, United States
Duration: 23 Jul 2022 – 23 Jul 2022
https://pretraining.github.io/

Workshop

Workshop: ICML 2022 Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward
Country/Territory: United States
City: Baltimore
Period: 23/07/22 – 23/07/22
Internet address: https://pretraining.github.io/
