Date | Name
Jan 31, 2024 | ControlCap: Controllable Region-level Captioning
Jul 28, 2025 | Geometric-Mean Policy Optimization
Feb 1, 2026 | Balancing Understanding and Generation in Discrete Diffusion Models
Sep 20, 2023 | KOSMOS-2.5: A Multimodal Literate Model
Jan 18, 2024 | VMamba: Visual State Space Model
Oct 13, 2025 | DocReward: A Document Reward Model for Structuring and Stylizing
Jul 4, 2022 | Explore Faster Localization Learning For Scene Text Detection
May 5, 2023 | FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation
Jul 1, 2024 | Evaluation of Text-to-Video Generation Models: A Dynamics Perspective
Oct 19, 2023 | PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining
Mar 21, 2023 | DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models
Dec 27, 2024 | Finger in Camera Speaks Everything: Unconstrained Air-Writing for Real-World
May 25, 2024 | DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
Dec 11, 2024 | CC-Diff: Enhancing Contextual Coherence in Remote Sensing Image Synthesis
Mar 27, 2025 | Model as a Game: On Numerical and Spatial Consistency for Generative Games
Nov 29, 2023 | Continual Learning for Image Segmentation with Dynamic Query
Jul 19, 2023 | Generative Prompt Model for Weakly Supervised Object Localization
May 5, 2023 | A Large Cross-Modal Video Retrieval Dataset with Reading Comprehension
Apr 10, 2023 | ICDAR 2023 Video Text Reading Competition for Dense and Small Text
Apr 2, 2025 | From Easy to Hard: Building a Shortcut for Differentially Private Image Synthesis