Date / Name
Nov 2, 2020 / CVC: Contrastive Learning for Non-parallel Voice Conversion
May 19, 2020 / Atss-Net: Target Speaker Separation via Attention-based Neural Network
Jun 4, 2025 / Sounding that Object: Interactive Object-Aware Image to Audio Generation
May 10, 2022 / Learning Visual Styles from Audio-Visual Associations
Sep 12, 2019 / Sams-Net: A Sliced Attention-based Neural Network for Music Source Separation
Sep 22, 2024 / Self-Supervised Audio-Visual Soundscape Stylization
Jul 3, 2025 / The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio
Jul 5, 2023 / Deep Speech Synthesis from MRI-Based Articulatory Representations
Feb 11, 2026 / Conversational Behavior Modeling Foundation Model With Multi-Level Perception
Oct 15, 2021 / Neural Dubber: Dubbing for Videos According to Scripts
Jan 21, 2025 / Audio Texture Manipulation by Exemplar-Based Analogy
Oct 8, 2025 / AV-EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Omni-modal LLMS with Audio-visual Cues
Dec 20, 2023 / Unconstrained Dysfluency Modeling for Dysfluent Speech Transcription and Detection
May 2, 2023 / On Uni-Modal Feature Learning in Supervised Multi-Modal Learning
Jun 22, 2022 / Radio2Speech: High Quality Speech Recovery from Radio Frequency Signals
Jun 21, 2021 / Improving Multi-Modal Learning with Uni-Modal Teachers
Dec 25, 2025 / Enabling Conversational Behavior Reasoning Capabilities in Full-Duplex Speech
Dec 14, 2025 / Schrodinger Audio-Visual Editor: Object-Level Audiovisual Removal
Mar 6, 2025 / Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities
Aug 25, 2025 / EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Spoken Dialogue Systems