Date / Name
Aug 9, 2021 / Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
Dec 19, 2021 / LocFormer: Enabling Transformers to Perform Temporal Moment Localization on Long Untrimmed Videos With a Feature Sampling Approach
Nov 29, 2023 / Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines
May 27, 2024 / Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
Apr 6, 2020 / Sub-Instruction Aware Vision-and-Language Navigation
Dec 22, 2023 / Unveiling Backbone Effects in CLIP: Exploring Representational Synergies and Variances
Nov 26, 2020 / A Recurrent Vision-and-Language BERT for Navigation
Aug 20, 2019 / Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention
Jul 3, 2024 / Knowledge Composition using Task Vectors with Learned Anisotropic Scaling
Oct 13, 2020 / DORi: Discovering Object Relationship for Moment Localization of a Natural-Language Query in Video
Jul 1, 2020 / The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose
Oct 19, 2020 / Language and Visual Entity Relationship Graph for Agent Navigation
May 27, 2020 / A Multi-modal Approach to Fine-grained Opinion Mining on Video Reviews
Feb 3, 2025 / RandLoRA: Full-rank parameter-efficient fine-tuning of large models
Dec 27, 2025 / The Quest for Winning Tickets in Low-Rank Adapters
Oct 19, 2025 / An empirical study of the effect of video encoders on Temporal Video Grounding
Mar 17, 2025 / Frame-wise Conditioning Adaptation for Fine-Tuning Diffusion Models in Text-to-Video Prediction
Aug 1, 2018 / Action Anticipation By Predicting Future Dynamic Images
Oct 24, 2019 / Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison
Jul 16, 2024 / Temporally Grounding Instructional Diagrams in Unconstrained Videos