Date / Name
May 3, 2022 / Cross Domain Object Detection by Target-Perceived Dual Branch Distillation
Mar 28, 2023 / Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Mar 14, 2023 / Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis
Dec 20, 2022 / MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency
Nov 17, 2022 / UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Aug 20, 2024 / MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
Mar 10, 2025 / TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision
Dec 11, 2024 / Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
Mar 2, 2025 / Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
Apr 9, 2025 / VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
Oct 27, 2025 / VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations
Feb 29, 2024 / Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition
Mar 13, 2025 / LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
Aug 7, 2025 / G-UBS: Towards Robust Understanding of Implicit Feedback via Group-Aware User Behavior Simulation
Jun 12, 2025 / VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
Jan 30, 2026 / Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning
Feb 11, 2026 / MotionWeaver: Holistic 4D-Anchored Framework for Multi-Humanoid Image Animation
Nov 24, 2025 / VideoChat-M1: Collaborative Policy Planning for Video Understanding via Multi-Agent Reinforcement Learning
Oct 12, 2025 / UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
Jun 6, 2025 / VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning