Showing 1–20 of 88 results
Date          Name
Jun 2, 2019   Unsupervised Bilingual Lexicon Induction from Mono-lingual Multimodal Data
Aug 11, 2020  Robust Long-Term Object Tracking via Improved Discriminative Model Prediction
Aug 31, 2017  Video Captioning with Guidance of Multimodal Latent Topics
Oct 22, 2024  Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios
Aug 18, 2024  Combo: Co-speech holistic 3D human motion generation and efficient customizable adaptation in harmony
Jul 30, 2020  From A Glance to "Gotcha": Interactive Facial Image Retrieval with Progressive Relevance Feedback
Jun 5, 2018   Focal Visual-Text Attention for Visual Question Answering
Dec 4, 2020   Spatial-Temporal Alignment Network for Action Recognition and Detection
Oct 6, 2020   Support-set bottlenecks for video-text representation learning
Sep 16, 2019  Learning Spatial Awareness to Improve Crowd Counting
Nov 29, 2018  Perceiving Physical Equation by Observing Visual Scenarios
Jul 16, 2016  Exploiting Multi-modal Curriculum in Noisy Web Data for Large-scale Concept Learning
Jun 17, 2024  Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Jul 17, 2024  Multimodal Reranking for Knowledge-Intensive Visual Question Answering
Jun 25, 2025  Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings
Apr 4, 2020   SimAug: Learning Robust Representations from Simulation for Trajectory Prediction
Nov 2, 2020   Event-Related Bias Removal for Real-time Disaster Events
Dec 13, 2019  The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction
May 26, 2019  Technical Report of the Video Event Reconstruction and Analysis (VERA) System -- Shooter Localization, Models, Interface, and Beyond
Nov 29, 2018  Traffic Danger Recognition With Surveillance Cameras Without Training Data