Showing 1–20 of 22 results
Date          Name
Feb 10, 2025  KARST: Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission for Visual Classification
Jun 17, 2024  Unveiling Encoder-Free Vision-Language Models
Mar 23, 2023  Plug-and-Play Regulators for Image-Text Matching
Oct 20, 2024  GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning
Jul 10, 2024  SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
Aug 28, 2023  UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
Feb 10, 2025  EVEv2: Improved Baselines for Encoder-Free Vision-Language Models
Jan 5, 2021   Similarity Reasoning and Filtration for Image-Text Matching
Apr 28, 2024  Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching
Jul 28, 2025  Regularizing Subspace Redundancy of Low-Rank Adaptation
Oct 16, 2025  From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Jul 11, 2024  DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
May 15, 2025  End-to-End Vision Tokenizer Tuning
Oct 24, 2024  Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
Mar 26, 2024  Exploring Dynamic Transformer for Efficient Object Tracking
Feb 4, 2026   VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?
Dec 22, 2025  The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
Sep 29, 2025  Visual Jigsaw Post-Training Improves MLLMs
Dec 2, 2024   MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models
Oct 26, 2024  LLMs Can Evolve Continually on Modality for X-Modal Reasoning