Jul 30, 2024 - MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Oct 20, 2024 - EVA: An Embodied World Model for Future Video Anticipation
Jun 23, 2025 - MinD: Learning A Dual-System World Model for Real-Time Planning and Implicit Risk Analysis
Nov 29, 2023 - M$^{2}$Chat: Empowering VLM for Multimodal LLM Interleaved Text-Image Generation
Jan 26, 2026 - TC-IDM: Grounding Video Generation for Executable Zero-shot Robot Motion
Sep 26, 2025 - WoW: Towards a World omniscient World model Through Embodied Interaction
Nov 30, 2022 - BEVUDA: Multi-geometric Space Alignments for Domain Adaptive BEV 3D Object Detection
Feb 29, 2024 - DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments
Dec 2, 2022 - BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks
Mar 27, 2023 - Unimodal Training-Multimodal Prediction: Cross-modal Federated Learning with Hierarchical Aggregation
Dec 23, 2024 - Large Motion Video Autoencoding with Cross-modal Video VAE
Mar 26, 2025 - MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation
Jun 26, 2025 - SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents
Aug 29, 2025 - ManipDreamer3D: Synthesizing Plausible Robotic Manipulation Video with Occupancy-aware 3D Trajectory
Mar 30, 2026 - Key-Embedded Privacy for Decentralized AI in Biomedical Omics
Apr 28, 2026 - ProDrive: Proactive Planning for Autonomous Driving via Ego-Environment Co-Evolution
May 27, 2021 - Towards Efficient Full 8-bit Integer DNN Online Training on Resource-limited Devices without Batch Normalization
Feb 25, 2024 - ChatMusician: Understanding and Generating Music Intrinsically with LLM
Apr 23, 2025 - ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance
Jan 22, 2026 - PhysicsMind: Sim and Real Mechanics Benchmarking for Physical Reasoning and Prediction in Foundational VLMs and World Models