Date | Name
Jun 5, 2025 | Degradation-Aware Image Enhancement via Vision-Language Classification
Jun 4, 2025 | WorldPrediction: A Benchmark for High-level World Modeling and Long-horizon Procedural Planning
Jun 4, 2025 | Object-centric 3D Motion Field for Robot Learning from Human Videos
Jun 4, 2025 | Seeing in the Dark: Benchmarking Egocentric 3D Vision with the Oxford Day-and-Night Dataset
Jun 3, 2025 | SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios
Jun 2, 2025 | EarthMind: Leveraging Cross-Sensor Data for Advanced Earth Observation Interpretation with a Unified Multimodal LLM
Jun 2, 2025 | Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models
Jun 1, 2025 | Towards Predicting Any Human Trajectory In Context
May 30, 2025 | Applying Vision Transformers on Spectral Analysis of Astronomical Objects
May 30, 2025 | Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
May 30, 2025 | Reading Recognition in the Wild
May 30, 2025 | DisTime: Distribution-based Time Representation for Video Large Language Models
May 29, 2025 | ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding
May 29, 2025 | Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
May 29, 2025 | OpenUni: A Simple Baseline for Unified Multimodal Understanding and Generation
May 29, 2025 | Qwen Look Again: Guiding Vision-Language Reasoning Models to Re-attention Visual Information
May 29, 2025 | Quality assessment of 3D human animation: Subjective and objective evaluation
May 29, 2025 | iHDR: Iterative HDR Imaging with Arbitrary Number of Exposures
May 28, 2025 | Test-time augmentation improves efficiency in conformal prediction
May 28, 2025 | VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models