"au:"Yali Wang"" — arXiv2 Search

Showing 41–60 of 95 results

Date / Name

Oct 30, 2023  Harvest Video Foundation Models via Efficient Post-Pretraining
May 12, 2026  BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion
Jan 20, 2022  CP-Net: Contour-Perturbed Reconstruction Network for Self-Supervised Point Cloud Learning
Oct 31, 2023  SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
Dec 19, 2023  M-BEV: Masked BEV Perception for Robust Autonomous Driving
Jan 17, 2024  Vlogger: Make Your Dream A Vlog
Feb 12, 2020  Progressive Object Transfer Detection
Sep 15, 2021  Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos
Sep 1, 2016  Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition
Mar 5, 2018  LSTD: A Low-Shot Transfer Detector for Object Detection
Jan 8, 2025  H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
Jan 21, 2025  InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
May 18, 2025  Video-GPT via Next Clip Diffusion
Dec 16, 2024  CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
Mar 6, 2025  An Egocentric Vision-Language Model based Portable Real-time Smart Assistant
Jul 3, 2025  VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning
Dec 1, 2025  InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision
Jan 26, 2024  From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Apr 24, 2024  MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Jul 13, 2023  InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation