"au:"Yuchi Wang"" — arXiv2 Search

Showing 1–18 of 18 results

Apr 16, 2024 - LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?
Dec 23, 2024 - VidTwin: Video VAE with Decoupled Structure and Dynamics
May 24, 2024 - InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
May 28, 2025 - RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
Apr 7, 2026 - MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control
Feb 20, 2024 - UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing
Mar 10, 2025 - TIDE : Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation
Oct 3, 2023 - Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond
Jan 22, 2025 - Multiple Queries with Multiple Keys: A Precise Prompt Matching Paradigm for Prompt-based Continual Learning
Sep 20, 2025 - Reinforcement Learning Meets Large Language Models: A Survey of Advancements and Applications Across the LLM Lifecycle
Nov 12, 2025 - Human or LLM as Standardized Patients? A Comparative Study for Medical Education
Nov 26, 2023 - GAIA: Zero-shot Talking Avatar Generation
Jun 12, 2024 - Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement
Feb 21, 2024 - PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain
Sep 22, 2024 - Rethinking Semantic Parsing for Large Language Models: Enhancing LLM Performance with Semantic Hints
Oct 14, 2025 - SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model
Aug 7, 2025 - PrinciplismQA: A Philosophy-Grounded Approach to Assessing LLM-Human Clinical Medical Ethics Alignment
Apr 10, 2026 - GeRM: A Generative Rendering Model From Physically Realistic to Photorealistic