"au:"Yuzhong Zhao"" — arXiv2 Search

Showing 1–20 of 24 results

Jan 31, 2024 | ControlCap: Controllable Region-level Captioning
Jul 28, 2025 | Geometric-Mean Policy Optimization
Feb 1, 2026 | Balancing Understanding and Generation in Discrete Diffusion Models
Sep 20, 2023 | KOSMOS-2.5: A Multimodal Literate Model
Jan 18, 2024 | VMamba: Visual State Space Model
Oct 13, 2025 | DocReward: A Document Reward Model for Structuring and Stylizing
Jul 4, 2022 | Explore Faster Localization Learning For Scene Text Detection
May 5, 2023 | FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation
Jul 1, 2024 | Evaluation of Text-to-Video Generation Models: A Dynamics Perspective
Oct 19, 2023 | PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining
Mar 21, 2023 | DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models
Dec 27, 2024 | Finger in Camera Speaks Everything: Unconstrained Air-Writing for Real-World
May 25, 2024 | DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
Dec 11, 2024 | CC-Diff: Enhancing Contextual Coherence in Remote Sensing Image Synthesis
Mar 27, 2025 | Model as a Game: On Numerical and Spatial Consistency for Generative Games
Nov 29, 2023 | Continual Learning for Image Segmentation with Dynamic Query
Jul 19, 2023 | Generative Prompt Model for Weakly Supervised Object Localization
May 5, 2023 | A Large Cross-Modal Video Retrieval Dataset with Reading Comprehension
Apr 10, 2023 | ICDAR 2023 Video Text Reading Competition for Dense and Small Text
Apr 2, 2025 | From Easy to Hard: Building a Shortcut for Differentially Private Image Synthesis