Date / Name
Apr 16, 2024 / LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?
Dec 23, 2024 / VidTwin: Video VAE with Decoupled Structure and Dynamics
May 24, 2024 / InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
May 28, 2025 / RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
Apr 7, 2026 / MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control
Feb 20, 2024 / UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing
Mar 10, 2025 / TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation
Oct 3, 2023 / Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond
Jan 22, 2025 / Multiple Queries with Multiple Keys: A Precise Prompt Matching Paradigm for Prompt-based Continual Learning
Sep 20, 2025 / Reinforcement Learning Meets Large Language Models: A Survey of Advancements and Applications Across the LLM Lifecycle
Nov 12, 2025 / Human or LLM as Standardized Patients? A Comparative Study for Medical Education
Nov 26, 2023 / GAIA: Zero-shot Talking Avatar Generation
Jun 12, 2024 / Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement
Feb 21, 2024 / PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain
Sep 22, 2024 / Rethinking Semantic Parsing for Large Language Models: Enhancing LLM Performance with Semantic Hints
Oct 14, 2025 / SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model
Aug 7, 2025 / PrinciplismQA: A Philosophy-Grounded Approach to Assessing LLM-Human Clinical Medical Ethics Alignment
Apr 10, 2026 / GeRM: A Generative Rendering Model From Physically Realistic to Photorealistic