"au:"Jitesh Jain"" — arXiv2 Search

Showing 1–14 of 14 results

Dec 23, 2021 | SeMask: Semantically Masked Transformers for Semantic Segmentation
Dec 15, 2025 | SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
Nov 10, 2022 | OneFormer: One Transformer to Rule Universal Image Segmentation
Aug 5, 2022 | Keys to Better Image Inpainting: Structure and Texture Go Hand in Hand
Dec 21, 2023 | VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Oct 17, 2025 | AUGUSTUS: An LLM-Driven Multimodal Agent System with Contextualized User Memory
Dec 12, 2024 | Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation
Mar 27, 2024 | Benchmarking Object Detectors with COCO: A New Path Forward
May 9, 2024 | CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
Apr 2, 2025 | Slow-Fast Architecture for Video Multi-Modal Large Language Models
Sep 19, 2020 | DEAP Cache: Deep Eviction Admission and Prefetching for Cache
Jun 8, 2023 | Matting Anything
May 7, 2025 | Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait
Jan 15, 2026 | Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding