"au:"Boshen Xu"" — arXiv2 Search

Showing 1–11 of 11 results

Mar 9, 2024 SPAFormer: Sequential 3D Part Assembly with Transformers
Mar 19, 2025 EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining
Mar 17, 2025 Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding
Nov 20, 2025 TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding
Mar 9, 2024 POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World
May 28, 2024 Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions?
Dec 19, 2025 Xiaomi MiMo-VL-Miloco Technical Report
May 17, 2021 Real-Time Video Super-Resolution on Smartphones with Deep Learning, Mobile AI 2021 Challenge: Report
Aug 25, 2024 Unveiling Visual Biases in Audio-Visual Localization Benchmarks
Nov 17, 2025 REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding
Feb 3, 2026 Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation