Showing 1–12 of 12 results
Date          Name
Nov 29, 2021  On the Integration of Self-Attention and Convolution
May 24, 2024  ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
Feb 14, 2022  Domain Adaptation via Prompt Learning
Jan 9, 2025   OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
Nov 26, 2025  Qwen3-VL Technical Report
Jun 15, 2025  Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models
Jun 6, 2024   Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment
Nov 17, 2022  Cross-Modal Adapter for Vision-Language Retrieval
Nov 22, 2023  Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
May 26, 2024  Demystify Mamba in Vision: A Linear Attention Perspective
Mar 18, 2024  LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Nov 29, 2024  Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation