Showing 1–20 of 22 results
Date / Name
Jun 16, 2025 / Fake it till You Make it: Reward Modeling as Discriminative Prediction
Nov 11, 2021 / The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos
Apr 11, 2024 / Latent Guard: a Safety Framework for Text-to-image Generation
Dec 23, 2025 / LongVideoAgent: Multi-Agent Reasoning with Long Videos
Dec 13, 2024 / AlignGuard: Scalable Safety Alignment for Text-to-Image Generation
Dec 18, 2024 / VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
Mar 7, 2017 / Using Deep Learning Method for Classification: A Proposed Algorithm for the ISIC 2017 Skin Lesion Classification Challenge
Aug 7, 2025 / Follow-Your-Instruction: A Comprehensive MLLM Agent for World Data Synthesis
Apr 6, 2026 / AvatarPointillist: AutoRegressive 4D Gaussian Avatarization
Jan 3, 2019 / CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions
Sep 18, 2019 / Unsupervised Sketch-to-Photo Synthesis
Jun 16, 2025 / VL-GenRM: Enhancing Vision-Language Verification via Vision Experts and Iterative Training
Mar 13, 2024 / Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
Oct 24, 2024 / Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code
Dec 19, 2025 / Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding
Nov 11, 2024 / UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts
Sep 19, 2025 / Pointing to a Llama and Call it a Camel: On the Sycophancy of Multimodal Large Language Models
Jun 17, 2020 / 3D Shape Reconstruction from Free-Hand Sketches
May 29, 2024 / LLMs Meet Multimodal Generation and Editing: A Survey
Feb 12, 2025 / I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models