What is WikiDO?
WikiDO (drawn from the Wikipedia Diversity Observatory) is a new cross-modal retrieval benchmark for assessing the out-of-distribution (OOD) generalization capabilities of pretrained vision-language models (VLMs). It consists of 380K image-text pairs from Wikipedia with domain labels, along with carefully curated, human-verified in-distribution (ID) and OOD test sets, each of size 3K. The image-text pairs cover a wide range of topics.
Why WikiDO?
Cross-modal (image-to-text and text-to-image) retrieval is an established task used in evaluation benchmarks to test the performance of vision-language models (VLMs). Several state-of-the-art VLMs (e.g., CLIP, BLIP-2) have achieved near-perfect performance on widely used image-text retrieval benchmarks such as MSCOCO-Test-5K and Flickr30K-Test-1K. As a measure of out-of-distribution (OOD) generalization, prior work relies on the zero-shot performance, on one dataset (Flickr30K), of a VLM finetuned on another (MSCOCO). We argue that such comparisons are insufficient to assess the OOD generalization capability of models, because the evaluation and finetuning datasets are highly similar both visually and linguistically. WikiDO offers a challenging cross-modal retrieval benchmark for current VLMs, especially for evaluating OOD generalization.
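To make the retrieval task concrete, here is a minimal sketch of how image-to-text and text-to-image retrieval are commonly scored with Recall@K, the metric typically reported on MSCOCO and Flickr30K. It assumes the image and text embeddings have already been produced by a VLM such as CLIP, and that the i-th image is paired with the i-th caption. The function names and toy embeddings are hypothetical and are not taken from the WikiDO evaluation script, which may compute its metrics differently.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(images: Sequence[Sequence[float]],
                      texts: Sequence[Sequence[float]]) -> List[List[float]]:
    """Cosine similarity of every image (rows) against every text (columns)."""
    imgs = [normalize(v) for v in images]
    txts = [normalize(v) for v in texts]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def transpose(m: List[List[float]]) -> List[List[float]]:
    """Swap rows and columns, turning image-to-text scores into text-to-image scores."""
    return [list(col) for col in zip(*m)]


def recall_at_k(sim: List[List[float]], k: int) -> float:
    """Fraction of queries (rows) whose ground-truth match (same index) ranks in the top k."""
    hits = 0
    for q, row in enumerate(sim):
        # Rank candidates by descending similarity (stable sort keeps index order on ties).
        ranked = sorted(range(len(row)), key=lambda c: -row[c])
        if q in ranked[:k]:
            hits += 1
    return hits / len(sim)


# Hypothetical toy embeddings: image i is assumed to be paired with text i.
image_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
text_emb = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.0, 0.8, 0.3]]

sim = similarity_matrix(image_emb, text_emb)
print("image-to-text R@1:", recall_at_k(sim, 1))             # 1/3
print("image-to-text R@2:", recall_at_k(sim, 2))             # 1.0
print("text-to-image R@1:", recall_at_k(transpose(sim), 1))  # 1/3
```

In this toy setup, texts 1 and 2 are closer to each other's images than to their own, so only one of three queries is retrieved correctly at K=1, while every query is recovered at K=2.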
WikiDO Paper
Getting Started
The data is split into training, dev, and test sets. Download the dataset here (distributed under the CC BY-NC 4.0 license):
WikiDO dataset
Details of the baseline models and the evaluation script can be found on our GitHub page:
WikiDO GitHub Page
We will update the models and results on the leaderboard based on publicly available papers. If you want to submit your results, feel free to contact Pavan Kalyan.
How is WikiDO constructed?
WikiDO consists of image-text data derived from the Wikipedia Diversity Observatory, a diverse source of Wikipedia articles spanning several diversity axes, including geography, gender, ethnicity, and domains/topics. We focus on the domains axis, which is the most diverse in terms of coverage and spans topics such as food, books, fashion, and sports (as determined by the topic labels assigned to each article).
Have Questions or Want to Contribute?
Feel free to contact Pavan Kalyan and Piyush Pasi. We would greatly appreciate any suggestions you may have for this project.