Date          Name
Mar 26, 2026  Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale
Mar 21, 2026  ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework
Mar 10, 2026  InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing
Oct 27, 2025  EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT
Oct 14, 2025  MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites
Oct 13, 2025  Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
Sep 29, 2025  Learning Goal-Oriented Vision-and-Language Navigation with Self-Improving Demonstrations at Scale
Sep 26, 2025  MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Aug 25, 2025  InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
May 30, 2025  Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
Jan 14, 2025  Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
Jul 10, 2024  VEnhancer: Generative Space-Time Enhancement for Video Generation
Jun 26, 2024  EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation
Jun 12, 2024  OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Mar 26, 2024  InternLM2 Technical Report
Mar 22, 2024  InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
Mar 11, 2024  VideoMamba: State Space Model for Efficient Video Understanding
Feb 29, 2024  WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset
Feb 8, 2024   SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Nov 13, 2023  SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models