"au:"Yali Wang"" — arXiv2 Search

Showing 21–40 of 93 results

May 3, 2022 - Cross Domain Object Detection by Target-Perceived Dual Branch Distillation
Mar 28, 2023 - Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Mar 14, 2023 - Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis
Dec 20, 2022 - MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency
Nov 17, 2022 - UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Aug 20, 2024 - MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
Mar 10, 2025 - TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision
Dec 11, 2024 - Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
Mar 2, 2025 - Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
Apr 9, 2025 - VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
Oct 27, 2025 - VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations
Feb 29, 2024 - Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition
Mar 13, 2025 - LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
Aug 7, 2025 - G-UBS: Towards Robust Understanding of Implicit Feedback via Group-Aware User Behavior Simulation
Jun 12, 2025 - VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
Jan 30, 2026 - Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning
Feb 11, 2026 - MotionWeaver: Holistic 4D-Anchored Framework for Multi-Humanoid Image Animation
Nov 24, 2025 - VideoChat-M1: Collaborative Policy Planning for Video Understanding via Multi-Agent Reinforcement Learning
Oct 12, 2025 - UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
Jun 6, 2025 - VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning
