Showing 1–20 of 96 results
Date          Name
Dec 7, 2018   TDAN: Temporally Deformable Alignment Network for Video Super-Resolution
Dec 21, 2019  Deep Audio Prior
Apr 5, 2021   Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
Jul 21, 2020  Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing
Dec 7, 2018   An Attempt towards Interpretable Audio-Visual Video Captioning
Apr 5, 2021   Can audio-visual integration strengthen robustness under multimodal attacks?
Mar 23, 2018  Audio-Visual Event Localization in Unconstrained Videos
Nov 10, 2021  Space-Time Memory Network for Sounding Object Localization in Videos
Feb 4, 2023   AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
Aug 26, 2023  DiffI2I: Efficient Diffusion Model for Image-to-Image Translation
May 31, 2023  Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA
Mar 29, 2023  Audio-Visual Grouping Network for Sound Localization from Mixtures
May 3, 2023   AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation
Mar 22, 2024  Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition
Jul 5, 2023   Dual Arbitrary Scale Super-Resolution for Multi-Contrast MRI
Nov 7, 2024   SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering
Nov 5, 2024   Continual Audio-Visual Sound Separation
Feb 1, 2025   Do Audio-Visual Segmentation Models Truly Segment Sounding Objects?
Jul 15, 2025  AROMA: Mixed-Initiative AI Assistance for Non-Visual Cooking by Grounding Multi-modal Information Between Reality and Videos
Feb 11, 2025  PRVQL: Progressive Knowledge-guided Refinement for Robust Egocentric Visual Query Localization