Date / Name
Nov 21, 2022 / Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Jul 6, 2021 / VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer
Nov 30, 2023 / CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
Sep 28, 2022 / TVLT: Textless Vision-Language Transformer
Oct 4, 2024 / Grounding Language in Multi-Perspective Referential Communication
May 13, 2020 / Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA
Jun 4, 2025 / Images are Worth Variable Length of Representations
Mar 19, 2025 / TULIP: Towards Unified Language-Image Pretraining
May 19, 2023 / Any-to-Any Generation via Composable Diffusion
Dec 5, 2022 / Unifying Vision, Text, and Layout for Universal Document Processing
Dec 9, 2024 / Evaluating Model Perception of Color Illusions in Photorealistic Scenes
Jul 23, 2024 / AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game
May 18, 2023 / Paxion: Patching Action Knowledge in Video-Language Foundation Models