Showing 1–20 of 31 results
Date | Name
Dec 12, 2023 | MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
Mar 20, 2025 | RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints
Sep 13, 2023 | SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection
Oct 23, 2024 | WorldSimBench: Towards Video Generation Models as World Simulators
Feb 19, 2025 | NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants
Mar 18, 2024 | MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control
Feb 7, 2024 | Toward Accurate Camera-based 3D Object Detection via Cascade Depth Estimation and Calibration
Mar 21, 2025 | Position: Interactive Generative Video as Next-Generation Game Engine
Jan 14, 2025 | GameFactory: Creating New Games with Generative Interactive Videos
Mar 9, 2026 | Reading $\neq$ Seeing: Diagnosing and Closing the Typography Gap in Vision-Language Models
Jan 28, 2026 | TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance
Jan 22, 2025 | T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
Oct 9, 2025 | BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities
Apr 13, 2026 | ComSim: Building Scalable Real-World Robot Data Generation via Compositional Simulation
Nov 3, 2025 | LiveSearchBench: An Automatically Constructed Benchmark for Retrieval and Reasoning over Dynamic Knowledge
Nov 2, 2025 | GauDP: Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies
Mar 4, 2025 | ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks
Apr 7, 2026 | CoEnv: Driving Embodied Multi-Agent Collaboration via Compositional Environment
Aug 21, 2024 | Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models
Jan 20, 2026 | Toward Efficient Agents: Memory, Tool learning, and Planning