Date / Name
Jul 8, 2025 / PaddleOCR 3.0 Technical Report
Jul 7, 2025 / Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration
Jul 4, 2025 / StreamDiT: Real-Time Streaming Text-to-Video Generation
Jul 3, 2025 / Fair Deepfake Detectors Can Generalize
Jul 3, 2025 / MAC-Lookup: Multi-Axis Conditional Lookup Model for Underwater Image Enhancement
Jul 2, 2025 / FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
Jul 2, 2025 / Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning
Jul 2, 2025 / Multi Source COVID-19 Detection via Kernel-Density-based Slice Sampling
Jul 1, 2025 / Not All Attention Heads Are What You Need: Refining CLIP's Image Representation with Attention Ablation
Jun 30, 2025 / A Survey on Vision-Language-Action Models for Autonomous Driving
Jun 28, 2025 / Deterministic Object Pose Confidence Region Estimation
Jun 27, 2025 / GenEscape: Hierarchical Multi-Agent Generation of Escape Room Puzzles
Jun 25, 2025 / AdvMIM: Adversarial Masked Image Modeling for Semi-Supervised Medical Image Segmentation
Jun 25, 2025 / Visual-Semantic Knowledge Conflicts in Operating Rooms: Synthetic Data Curation for Surgical Risk Perception in Multimodal Large Language Models
Jun 24, 2025 / AirV2X: Unified Air-Ground Vehicle-to-Everything Collaboration
Jun 23, 2025 / Phantom-Data : Towards a General Subject-Consistent Video Generation Dataset
Jun 23, 2025 / AViLA: Asynchronous Vision-Language Agent for Streaming Multimodal Data Interaction
Jun 22, 2025 / RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation
Jun 22, 2025 / PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding
Jun 22, 2025 / Auto-Regressive Surface Cutting