"au:"Dhruba Ghosh"" — arXiv2 Search
Showing 1–9 of 9 results
Feb 19, 2026
Understanding the Fine-Grained Knowledge Capabilities of Vision-Language Models
Oct 17, 2023
GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment
Dec 8, 2021
The Effect of Model Size on Worst-Group Generalization
Apr 1, 2024
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Apr 27, 2023
DataComp: In search of the next generation of multimodal datasets
Oct 13, 2025
Data or Language Supervision: What Makes CLIP Better than DINO?
May 13, 2021
Are Larger Pretrained Language Models Uniformly Better? Comparing Performance at the Instance Level
Jun 17, 2024
DataComp-LM: In search of the next generation of training sets for language models
May 28, 2024
Why are Visually-Grounded Language Models Bad at Image Classification?