arXiv search for au:"Jifeng Dai"
Showing 1–9 of 9 results, sorted by date
1. Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning (Oct 13, 2025)
2. InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency (Aug 25, 2025)
3. Docopilot: Improving Multimodal Models for Document-Level Understanding (Jul 19, 2025)
4. Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces (May 30, 2025)
5. OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text (Jun 12, 2024)
6. VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks (May 18, 2023)
7. Frozen CLIP Models are Efficient Video Learners (Aug 6, 2022)
8. ConvMAE: Masked Convolution Meets Masked Autoencoders (May 8, 2022)
9. 1st Place Solution of LVIS Challenge 2020: A Good Box is not a Guarantee of a Good Mask (Sep 3, 2020)