"au:"Shunian Chen"" — arXiv2 Search

Showing 1–20 of 21 results

Aug 19, 2025  TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis
Jun 1, 2025  FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion
Feb 18, 2024  ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models
Jul 8, 2025  MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos
Apr 29, 2024  MileBench: Benchmarking MLLMs in Long Context
Jun 27, 2024  HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
Feb 16, 2024  Humans or LLMs as the Judge? A Study on Judgement Biases
Nov 16, 2023  HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs
Dec 16, 2024  BlenderLLM: Training Large Language Models for Computer-Aided Design with Self-improvement
Aug 20, 2024  Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Nov 6, 2024  Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM
Apr 1, 2026  Do Phone-Use Agents Respect Your Privacy?
Dec 17, 2023  Silkie: Preference Distillation for Large Visual Language Models
Sep 17, 2024  Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs
Sep 4, 2024  LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via a Hybrid Architecture
Oct 12, 2024  VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment
Nov 23, 2023  MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria
Jun 22, 2025  ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
Aug 22, 2025  MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols
Feb 20, 2026  From Lossy to Verified: A Provenance-Aware Tiered Memory for Agents