The Role of Pre-training Data in Transfer Learning

Rahim Entezari^*, Mitchell Wortsman, Olga Saukh, M. Moein Shariatnia, Hanie Sedghi, Ludwig Schmidt

^*Corresponding author for this work

Institute of Technical Informatics (4480)

Research output: Contribution to conference › Paper

Abstract

The transfer learning paradigm of model pre-training and subsequent fine-tuning produces high-accuracy models. While most studies recommend scaling the pre-training size to benefit most from transfer learning, a question remains: what data and method should be used for pre-training? We investigate the impact of pre-training data distribution on the few-shot and full fine-tuning performance using 3 pre-training methods (supervised, contrastive language-image and image-image), 7 pre-training datasets, and 9 downstream datasets. Through extensive controlled experiments, we find that the choice of the pre-training data source is essential for the few-shot transfer, but its role decreases as more data is made available for fine-tuning. Additionally, we explore the role of data curation and examine the trade-offs between label noise and the size of the pre-training dataset. We find that using 2000× more pre-training data from LAION can match the performance of supervised ImageNet pre-training. Furthermore, we investigate the effect of pre-training methods, comparing language-image contrastive vs. image-image contrastive, and find that the latter leads to better downstream accuracy.

Original language	English
Number of pages	38
Publication status	Published - 27 Feb 2023
Event	1st Multimodal Representation Learning Workshop: ICLR 2023 - Virtual, Rwanda; Duration: 1 May 2023 → 5 May 2023; https://iclr.cc/virtual/2023/workshop/12836

Workshop

Workshop	1st Multimodal Representation Learning Workshop
Abbreviated title	ICLR 2023
Country/Territory	Rwanda
City	Virtual
Period	1/05/23 → 5/05/23
Internet address	https://iclr.cc/virtual/2023/workshop/12836

Fields of Expertise

Information, Communication & Computing

Access to Document

https://arxiv.org/abs/2302.13602

Licence: CC BY-NC-ND 4.0

Cite this

@conference{7488ae8fa1944c72ab3353bd81c2973b,
title = "The Role of Pre-training Data in Transfer Learning",
abstract = "The transfer learning paradigm of model pre-training and subsequent fine-tuning produces high-accuracy models. While most studies recommend scaling the pre-training size to benefit most from transfer learning, a question remains: what data and method should be used for pre-training? We investigate the impact of pre-training data distribution on the few-shot and full fine-tuning performance using 3 pre-training methods (supervised, contrastive language-image and image-image), 7 pre-training datasets, and 9 downstream datasets. Through extensive controlled experiments, we find that the choice of the pre-training data source is essential for the few-shot transfer, but its role decreases as more data is made available for fine-tuning. Additionally, we explore the role of data curation and examine the trade-offs between label noise and the size of the pre-training dataset. We find that using 2000× more pre-training data from LAION can match the performance of supervised ImageNet pre-training. Furthermore, we investigate the effect of pre-training methods, comparing language-image contrastive vs. image-image contrastive, and find that the latter leads to better downstream accuracy.",
author = "Rahim Entezari and Mitchell Wortsman and Olga Saukh and Shariatnia, {M. Moein} and Hanie Sedghi and Ludwig Schmidt",
year = "2023",
month = feb,
day = "27",
language = "English",
note = "1st Multimodal Representation Learning Workshop : ICLR 2023, ICLR 2023 ; Conference date: 01-05-2023 Through 05-05-2023",
url = "https://iclr.cc/virtual/2023/workshop/12836",
}

TY - CONF
T1 - The Role of Pre-training Data in Transfer Learning
AU - Entezari, Rahim
AU - Wortsman, Mitchell
AU - Saukh, Olga
AU - Shariatnia, M. Moein
AU - Sedghi, Hanie
AU - Schmidt, Ludwig
PY - 2023/2/27
Y1 - 2023/2/27
AB - The transfer learning paradigm of model pre-training and subsequent fine-tuning produces high-accuracy models. While most studies recommend scaling the pre-training size to benefit most from transfer learning, a question remains: what data and method should be used for pre-training? We investigate the impact of pre-training data distribution on the few-shot and full fine-tuning performance using 3 pre-training methods (supervised, contrastive language-image and image-image), 7 pre-training datasets, and 9 downstream datasets. Through extensive controlled experiments, we find that the choice of the pre-training data source is essential for the few-shot transfer, but its role decreases as more data is made available for fine-tuning. Additionally, we explore the role of data curation and examine the trade-offs between label noise and the size of the pre-training dataset. We find that using 2000× more pre-training data from LAION can match the performance of supervised ImageNet pre-training. Furthermore, we investigate the effect of pre-training methods, comparing language-image contrastive vs. image-image contrastive, and find that the latter leads to better downstream accuracy.
M3 - Paper
T2 - 1st Multimodal Representation Learning Workshop
Y2 - 1 May 2023 through 5 May 2023
ER -
