Date / Name
Dec 23, 2021 / SeMask: Semantically Masked Transformers for Semantic Segmentation
Dec 15, 2025 / SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
Nov 10, 2022 / OneFormer: One Transformer to Rule Universal Image Segmentation
Aug 5, 2022 / Keys to Better Image Inpainting: Structure and Texture Go Hand in Hand
Dec 21, 2023 / VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Oct 17, 2025 / AUGUSTUS: An LLM-Driven Multimodal Agent System with Contextualized User Memory
Dec 12, 2024 / Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation
Mar 27, 2024 / Benchmarking Object Detectors with COCO: A New Path Forward
May 9, 2024 / CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
Apr 2, 2025 / Slow-Fast Architecture for Video Multi-Modal Large Language Models
Sep 19, 2020 / DEAP Cache: Deep Eviction Admission and Prefetching for Cache
Jun 8, 2023 / Matting Anything
May 7, 2025 / Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait
Jan 15, 2026 / Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding