"au:"Haoyu Lu"" — arXiv2 Search

Showing 1–20 of 34 results

Aug 17, 2022 - Multimodal foundation models are better simulators of the human brain
Apr 15, 2022 - COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval
Sep 23, 2022 - LGDN: Language-Guided Denoising Network for Video-Language Modeling
Mar 8, 2024 - DeepSeek-VL: Towards Real-World Vision-Language Understanding
May 29, 2023 - speech and noise dual-stream spectrogram refine network with speech distortion loss for robust speech recognition
Mar 17, 2025 - Efficient Motion-Aware Video MLLM
Dec 16, 2025 - HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices
Nov 20, 2024 - Functional normalizing flow for statistical inverse problems of partial differential equations
Apr 10, 2025 - Kimi-VL Technical Report
Mar 24, 2021 - Learning Versatile Neural Architectures by Propagating Network Codes
Mar 11, 2021 - WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training
Oct 27, 2021 - Towards artificial general intelligence via a multimodal foundation model
Jan 25, 2022 - Image Fragile Watermarking Algorithm Based on Deneighborhood Mapping
Nov 2, 2022 - Monolingual Recognizers Fusion for Code-switching Speech Recognition
Feb 13, 2023 - UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling
Jun 20, 2024 - Towards Event-oriented Long Video Understanding
Aug 6, 2024 - Characterizing the current systems in the Martian ionosphere
Oct 21, 2024 - Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining
Jun 13, 2024 - Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs
Jan 10, 2026 - BabyVision: Visual Reasoning Beyond Language