Dec 11, 2025 - Mull-Tokens: Modality-Agnostic Latent Thinking
Mar 17, 2025 - Web Artifact Attacks Disrupt Vision Language Models
Dec 10, 2024 - SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models
Dec 3, 2023 - Learning to Compose SuperWeights for Neural Parameter Allocation Search
Aug 8, 2023 - From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition
Mar 28, 2023 - Language-Guided Audio-Visual Source Separation via Trimodal Consistency
Jul 26, 2022 - NewsStories: Illustrating articles with visual summaries
Feb 10, 2022 - The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning
Apr 17, 2021 - Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments
Sep 8, 2019 - MULE: Multimodal Universal Language Embedding
Aug 17, 2019 - Language Features Matter: Effective Language Representations for Vision-Language Tasks
May 26, 2019 - Why do These Match? Explaining the Behavior of Image Similarity Models
Nov 17, 2018 - Revisiting Image-Language Networks for Open-ended Phrase Detection