"au:"Haiwen Diao"" — arXiv2 Search

Showing 1–20 of 22 results

Feb 10, 2025 - KARST: Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission for Visual Classification
Jun 17, 2024 - Unveiling Encoder-Free Vision-Language Models
Mar 23, 2023 - Plug-and-Play Regulators for Image-Text Matching
Oct 20, 2024 - GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning
Jul 10, 2024 - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
Aug 28, 2023 - UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
Feb 10, 2025 - EVEv2: Improved Baselines for Encoder-Free Vision-Language Models
Jan 5, 2021 - Similarity Reasoning and Filtration for Image-Text Matching
Apr 28, 2024 - Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching
Jul 28, 2025 - Regularizing Subspace Redundancy of Low-Rank Adaptation
Oct 16, 2025 - From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Jul 11, 2024 - DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
May 15, 2025 - End-to-End Vision Tokenizer Tuning
Oct 24, 2024 - Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
Mar 26, 2024 - Exploring Dynamic Transformer for Efficient Object Tracking
Feb 4, 2026 - VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?
Dec 22, 2025 - The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
Sep 29, 2025 - Visual Jigsaw Post-Training Improves MLLMs
Dec 2, 2024 - MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models
Oct 26, 2024 - LLMs Can Evolve Continually on Modality for X-Modal Reasoning