Date          Name
Mar 13, 2025  VMBench: A Benchmark for Perception-Aligned Video Motion Generation
Dec 27, 2024  Finger in Camera Speaks Everything: Unconstrained Air-Writing for Real-World
Aug 7, 2025   VS-LLM: Visual-Semantic Depression Assessment based on LLM for Drawing Projection Test
Mar 23, 2026  Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models
Oct 3, 2024   DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM
Apr 14, 2026  Calibration-Aware Policy Optimization for Reasoning LLMs
May 20, 2024  DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM
Nov 23, 2024  How Texts Help? A Fine-grained Evaluation to Reveal the Role of Language in Vision-Language Tracking
Oct 16, 2025  ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints
Sep 13, 2024  Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark
Dec 30, 2025  Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation
Jan 28, 2026  Latent Temporal Discrepancy as Motion Prior: A Loss-Weighting Strategy for Dynamic Fidelity in T2V
Jan 28, 2026  Artifact-Aware Evaluation for High-Quality Video Generation
Aug 18, 2025  Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models
May 28, 2024  The Binary Quantized Neural Network for Dense Prediction via Specially Designed Upsampling and Attention
Sep 19, 2025  UNIV: Unified Foundation Model for Infrared and Visible Modalities
Aug 11, 2025  Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation
Dec 30, 2025  Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning