Date          Name
Jan 26, 2026  VIBEVOICE-ASR Technical Report
Jun 1, 2025   CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
May 19, 2025  MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Jan 29, 2024  Continuous Target Speech Extraction: Enhancing Personalized Diarization and Extraction on Complex Recordings
Dec 16, 2023  SECap: Speech Emotion Captioning with Large Language Model
Sep 25, 2023  AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
Aug 14, 2023  The Sound Demixing Challenge 2023 – Cinematic Demixing Track
Dec 1, 2022   High Fidelity Speech Enhancement with Band-split RNN
Jul 20, 2022  Diffsound: Discrete Diffusion Model for Text-to-sound Generation
Mar 29, 2022  Integrating Lattice-Free MMI into End-to-End Speech Recognition
Mar 28, 2022  On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition
Jan 6, 2022   Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model
Nov 29, 2021  Mixed Precision DNN Quantization for Overlapped Speech Separation and Recognition
Aug 30, 2021  ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language Understanding
Nov 16, 2020  Audio-visual Multi-channel Integration and Recognition of Overlapped Speech
May 18, 2020  Audio-visual Multi-channel Recognition of Overlapped Speech
Jan 6, 2020   Audio-visual Recognition of Overlapped speech for the LRS2 dataset
Nov 8, 2019   Adversarial Attacks on GMM i-vector based Speaker Verification Systems