Date / Name
Mar 25, 2024 / DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion
Jun 3, 2025 / IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
Nov 28, 2023 / Text-Driven Image Editing via Learnable Regions
Nov 21, 2022 / SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training
Aug 19, 2021 / Self-Supervised Video Representation Learning with Meta-Contrastive Network
Jun 2, 2022 / REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
Jul 5, 2024 / Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge
Dec 12, 2024 / Olympus: A Universal Task Router for Computer Vision Tasks
Nov 10, 2025 / SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards
Mar 16, 2022 / Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Dec 28, 2021 / AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition