"au:"Yicong Li"" — arXiv2 Search

Showing 21–40 of 45 results

Nov 18, 2024 - Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models
Mar 24, 2025 - Predicting the Road Ahead: A Knowledge Graph based Foundation Model for Scene Understanding in Autonomous Driving
Apr 18, 2025 - Visual Intention Grounding for Egocentric Assistants
Dec 22, 2025 - VA-$π$: Variational Policy Alignment for Pixel-Aware Autoregressive Generation
Sep 4, 2023 - Can I Trust Your Answer? Visually Grounded Video Question Answering
Aug 17, 2024 - MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality
Jan 27, 2025 - Understanding Long Videos via LLM-Powered Entity Relation Graphs
May 15, 2025 - MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning
Jun 6, 2022 - Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning
Mar 2, 2022 - Video Question Answering: Datasets, Algorithms and Challenges
Oct 26, 2024 - Personality Analysis from Online Short Video Platforms with Multi-domain Adaptation
Aug 8, 2024 - VideoQA in the Era of LLMs: An Empirical Study
Oct 6, 2025 - REAR: Rethinking Visual Autoregressive Models via Generator-Tokenizer Consistency Regularization
Jan 5, 2021 - Temporal Meta-path Guided Explainable Recommendation
Aug 18, 2025 - EgoTwin: Dreaming Body and View in First Person
Jan 20, 2026 - Interp3D: Correspondence-aware Interpolation for Generative Textured 3D Morphing
Oct 31, 2025 - Towards 1000-fold Electron Microscopy Image Compression for Connectomics via VQ-VAE with Transformer Prior
Mar 9, 2026 - MAPLE: Elevating Medical Reasoning from Statistical Consensus to Process-Led Alignment
Jun 9, 2024 - Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Nov 12, 2025 - AnchorDS: Anchoring Dynamic Sources for Semantically Consistent Text-to-3D Generation