"au:"Yuqi Huo"" — arXiv2 Search

/ Date/ Name

Showing 1–19 of 19 results

May 22, 2023 - VDT: General-purpose Video Diffusion Transformers via Mask Modeling
Oct 17, 2024 - Exploring the Design Space of Visual Context Representation in Video MLLMs
Apr 15, 2022 - COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval
Sep 23, 2022 - LGDN: Language-Guided Denoising Network for Video-Language Modeling
Mar 17, 2025 - Efficient Motion-Aware Video MLLM
Oct 11, 2024 - Baichuan-Omni Technical Report
Aug 27, 2019 - Mobile Video Action Recognition
Dec 10, 2019 - Learning Depth-Guided Convolutions for Monocular 3D Object Detection
Mar 24, 2021 - Learning Versatile Neural Architectures by Propagating Network Codes
Jun 14, 2021 - Pre-Trained Models: Past, Present and Future
Mar 11, 2021 - WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training
Oct 27, 2021 - Towards artificial general intelligence via a multimodal foundation model
Feb 13, 2023 - UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling
Jun 20, 2024 - Towards Event-oriented Long Video Understanding
Oct 21, 2024 - Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining
Jun 13, 2024 - Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs
Feb 18, 2025 - Baichuan-M1: Pushing the Medical Capability of Large Language Models
Jan 26, 2025 - Baichuan-Omni-1.5 Technical Report
Jan 3, 2025 - Virgo: A Preliminary Exploration on Reproducing o1-like MLLM