Showing 21–40 of 45 results
Nov 18, 2024 | Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models
Mar 24, 2025 | Predicting the Road Ahead: A Knowledge Graph based Foundation Model for Scene Understanding in Autonomous Driving
Apr 18, 2025 | Visual Intention Grounding for Egocentric Assistants
Dec 22, 2025 | VA-π: Variational Policy Alignment for Pixel-Aware Autoregressive Generation
Sep 4, 2023 | Can I Trust Your Answer? Visually Grounded Video Question Answering
Aug 17, 2024 | MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality
Jan 27, 2025 | Understanding Long Videos via LLM-Powered Entity Relation Graphs
May 15, 2025 | MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning
Jun 6, 2022 | Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning
Mar 2, 2022 | Video Question Answering: Datasets, Algorithms and Challenges
Oct 26, 2024 | Personality Analysis from Online Short Video Platforms with Multi-domain Adaptation
Aug 8, 2024 | VideoQA in the Era of LLMs: An Empirical Study
Oct 6, 2025 | REAR: Rethinking Visual Autoregressive Models via Generator-Tokenizer Consistency Regularization
Jan 5, 2021 | Temporal Meta-path Guided Explainable Recommendation
Aug 18, 2025 | EgoTwin: Dreaming Body and View in First Person
Jan 20, 2026 | Interp3D: Correspondence-aware Interpolation for Generative Textured 3D Morphing
Oct 31, 2025 | Towards 1000-fold Electron Microscopy Image Compression for Connectomics via VQ-VAE with Transformer Prior
Mar 9, 2026 | MAPLE: Elevating Medical Reasoning from Statistical Consensus to Process-Led Alignment
Jun 9, 2024 | Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Nov 12, 2025 | AnchorDS: Anchoring Dynamic Sources for Semantically Consistent Text-to-3D Generation