"au:"Renrui Zhang"" — arXiv2 Search

Showing 1–14 of 14 results

Mar 20, 2026 - MME-CoF-Pro: Evaluating Reasoning Coherence in Video Generative Models with Text and Visual Hints
Nov 20, 2025 - Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation
Oct 30, 2025 - Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark
Jun 5, 2025 - MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
May 11, 2025 - Seed1.5-VL Technical Report
Feb 13, 2025 - MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
May 23, 2024 - TerDiT: Ternary Diffusion Models with Transformers
Feb 8, 2024 - SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Nov 13, 2023 - SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Jun 15, 2023 - Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models
Apr 28, 2023 - LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
Mar 9, 2023 - Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking
Aug 6, 2022 - Frozen CLIP Models are Efficient Video Learners
Nov 18, 2020 - End-to-End Object Detection with Adaptive Clustering Transformer