Date / Name
May 25, 2023 / HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
May 9, 2022 / Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
Nov 16, 2018 / Data-Efficient Graph Embedding Learning for PCB Component Detection
Nov 20, 2022 / Structure-Encoding Auxiliary Tasks for Improved Visual Representation in Vision-and-Language Navigation
Jul 16, 2020 / FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning
Jun 12, 2019 / Manifold Graph with Learned Prototypes for Semi-Supervised Image Classification
Feb 18, 2021 / Unbiased Teacher for Semi-Supervised Object Detection
May 9, 2024 / CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
Jun 15, 2024 / Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model
May 17, 2023 / CLIP-GCD: Simple Language Guided Generalized Category Discovery
Nov 24, 2025 / Vidi2.5: Large Multimodal Models for Video Understanding and Creation
Feb 4, 2025 / D-Attn: Decomposed Attention for Large Vision-and-Language Models
Mar 21, 2020 / Who2com: Collaborative Perception via Learnable Handshake Communication
Apr 22, 2025 / Vidi: Large Multimodal Models for Video Understanding and Editing
Mar 18, 2025 / Where do Large Vision-Language Models Look at when Answering Questions?