Date | Name
Nov 25, 2023 | Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding
Mar 30, 2024 | ST-LLM: Large Language Models Are Effective Temporal Learners
Sep 27, 2023 | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
Jan 26, 2023 | Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring
Nov 4, 2024 | PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
Nov 17, 2025 | Video Spatial Reasoning with Object-Centric 3D Rollout
May 29, 2024 | RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
Jan 23, 2026 | Order from Chaos: Physical World Understanding from Glitchy Gameplay Videos
Dec 2, 2024 | PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos
Aug 20, 2024 | MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval
Oct 7, 2025 | Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow