Does CLIP's generalization performance mainly stem from high train-test similarity?

Prasanna Mayilvahanan*
University of Tübingen, MPI-IS, Tübingen AI Center
Thaddäus Wiedemer*
MPI-IS, University of Tübingen, Tübingen AI Center
Evgenia Rusak
University of Tübingen, MPI-IS, Tübingen AI Center
Matthias Bethge
University of Tübingen, Tübingen AI Center
Wieland Brendel
MPI-IS, ELLIS Institute Tübingen, Tübingen AI Center

tl;dr: CLIP's ability to generalize to standard OOD benchmarks does not mainly stem from highly similar images in its training dataset.
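
The paper quantifies train-test similarity as perceptual similarity in CLIP's image embedding space: each test image is scored by its nearest neighbor in the training set. Below is a minimal sketch of that score, assuming precomputed embedding matrices; the function name max_train_similarity and its parameters are our own illustration, not taken from the paper's released code.

import torch

# Hypothetical inputs (not from the paper's codebase):
#   train_emb: (N_train, d) CLIP image embeddings of the training set (e.g. LAION)
#   test_emb:  (N_test, d)  CLIP image embeddings of an OOD test set
def max_train_similarity(train_emb: torch.Tensor,
                         test_emb: torch.Tensor,
                         chunk_size: int = 4096) -> torch.Tensor:
    """For each test image, return its highest cosine similarity to any
    training image (nearest-neighbor perceptual similarity in CLIP space)."""
    train_emb = torch.nn.functional.normalize(train_emb, dim=-1)
    test_emb = torch.nn.functional.normalize(test_emb, dim=-1)
    best = torch.full((test_emb.shape[0],), -1.0,
                      device=test_emb.device, dtype=test_emb.dtype)
    # Iterate over the (typically huge) training set in chunks to bound memory.
    for start in range(0, train_emb.shape[0], chunk_size):
        sims = test_emb @ train_emb[start:start + chunk_size].T
        best = torch.maximum(best, sims.max(dim=1).values)
    return best

Pruning the training images most similar to the test sets under this score and then retraining is, roughly, how the paper probes whether high similarity drives OOD performance.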

News

Feb '24 Our paper was accepted at ICLR 2024!
Feb '24 An earlier version of our paper was accepted at the NeurIPS 2023 DistShift workshop!
Oct '23 The pre-print is now available on arXiv.

Abstract

Acknowledgements & Funding

BibTeX

If you find our study helpful, please cite our paper:

@inproceedings{mayilvahanan2024does,
  title={Does CLIP's Generalization Performance Mainly Stem from High Train-Test Similarity?},
  author={
    Prasanna Mayilvahanan and Thadd{\"a}us Wiedemer and Evgenia Rusak and Matthias Bethge and Wieland Brendel
  },
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=tnBaiidobu}
}
Webpage designed using Bootstrap 4.5. Layout courtesy of Roland Zimmermann.