Showing 1–20 of 28 results
Date | Name
Mar 29, 2023 | AutoAD: Movie Description in Context
Oct 10, 2023 | AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description
Oct 12, 2022 | Prompt Generation Networks for Input-Space Adaptation of Frozen Vision Transformers
Sep 19, 2017 | Human Action Forecasting by Learning Task Grammars
Jul 17, 2023 | Learning to Count without Annotations
Apr 2, 2025 | Learning from Streaming Video with Orthogonal Gradients
Dec 3, 2025 | Unique Lives, Shared World: Learning from Single-Life Videos
Oct 10, 2022 | Turbo Training with Token Dropout
Apr 1, 2024 | Stale Diffusion: Hyper-realistic 5D Movie Generation Using Old-school Methods
Aug 3, 2020 | Memory-augmented Dense Predictive Coding for Video Representation Learning
Nov 4, 2025 | Dynamic Reflections: Probing Video Representations with Text Alignment
Apr 22, 2024 | AutoAD III: The Prequel -- Back to the Pixels
Sep 10, 2019 | Video Representation Learning by Dense Predictive Coding
Oct 19, 2020 | Self-supervised Co-training for Video Representation Learning
Apr 6, 2022 | Temporal Alignment Networks for Long-term Video
Jul 24, 2017 | Human Pose Forecasting via Deep Markov Models
Apr 29, 2022 | Flamingo: a Visual Language Model for Few-Shot Learning
Dec 21, 2023 | Multi-Sentence Grounding for Long-term Instructional Video
Jul 22, 2024 | AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description
Jul 5, 2024 | CountGD: Multi-Modal Open-World Counting