arXiv search for au:"Jifeng Dai"
Showing 1–9 of 9 results, sorted by date
1. Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning (Oct 13, 2025)
2. InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency (Aug 25, 2025)
3. Docopilot: Improving Multimodal Models for Document-Level Understanding (Jul 19, 2025)
4. Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces (May 30, 2025)
5. OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text (Jun 12, 2024)
6. VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks (May 18, 2023)
7. Frozen CLIP Models are Efficient Video Learners (Aug 6, 2022)
8. ConvMAE: Masked Convolution Meets Masked Autoencoders (May 8, 2022)
9. 1st Place Solution of LVIS Challenge 2020: A Good Box is not a Guarantee of a Good Mask (Sep 3, 2020)