"au:"Xudong Lin"" — arXiv2 Search

Showing 1–20 of 38 results

Jun 5, 2022 - Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval
Jan 28, 2021 - VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
Jan 26, 2022 - Learning To Recognize Procedural Activities with Distant Supervision
Oct 12, 2019 - Context-Gated Convolution
Oct 24, 2019 - Towards Train-Test Consistency for Semi-supervised Temporal Action Localization
Jan 6, 2023 - In Defense of Structural Symbolic Representation for Video Event-Relation Prediction
Mar 4, 2019 - Unsupervised Rank-Preserving Hashing for Large-Scale Image Retrieval
Jan 11, 2019 - DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition
Dec 2, 2021 - Video-Text Pre-training with Learned Regions
Dec 10, 2019 - Flow-Distilled IP Two-Stream Networks for Compressed Video Action Recognition
Oct 22, 2022 - Weakly-Supervised Temporal Article Grounding
Sep 22, 2024 - Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
Jan 24, 2025 - PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction
Mar 25, 2023 - Supervised Masked Knowledge Distillation for Few-Shot Transformers
Feb 17, 2025 - Progress of the TianQin project
Jan 24, 2025 - ENTER: Event Based Interpretable Reasoning for VideoQA
May 27, 2023 - Non-Sequential Graph Script Induction via Multimedia Grounding
Jun 19, 2024 - Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Jan 10, 2024 - Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Apr 7, 2023 - Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering