Date / Name
Oct 15, 2025 / InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
Oct 15, 2025 / NTIRE 2025 Challenge on Low Light Image Enhancement: Methods and Results
Oct 15, 2025 / Beyond Pixels: A Differentiable Pipeline for Probing Neuronal Selectivity in 3D
Oct 14, 2025 / Scene Coordinate Reconstruction Priors
Oct 14, 2025 / MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites
Oct 13, 2025 / EEMS: Edge-Prompt Enhanced Medical Image Segmentation Based on Learnable Gating Mechanism
Oct 13, 2025 / DTEA: Dynamic Topology Weaving and Instability-Driven Entropic Attenuation for Medical Image Segmentation
Oct 13, 2025 / Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
Oct 13, 2025 / FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model
Oct 11, 2025 / Are Video Models Emerging as Zero-Shot Learners and Reasoners in Medical Imaging?
Oct 9, 2025 / SkipSR: Faster Super Resolution with Token Skipping
Oct 9, 2025 / SViM3D: Stable Video Material Diffusion for Single Image 3D Generation
Oct 9, 2025 / InstructUDrag: Joint Text Instructions and Object Dragging for Interactive Image Editing
Oct 9, 2025 / Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models
Oct 7, 2025 / Shaken or Stirred? An Analysis of MetaFormer's Token Mixing for Medical Imaging
Oct 3, 2025 / Memory Forcing: Spatio-Temporal Memory for Consistent Scene Generation on Minecraft
Sep 29, 2025 / GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs
Sep 29, 2025 / TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models
Sep 29, 2025 / Score-based Membership Inference on Diffusion Models
Sep 29, 2025 / Learning Goal-Oriented Vision-and-Language Navigation with Self-Improving Demonstrations at Scale