Showing 21–40 of 235 results
Date          Name
Sep 15, 2025  Fun-ASR Technical Report
Sep 9, 2025   VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions
Sep 5, 2025   Layer-wise Analysis for Quality of Multilingual Synthesized Speech
Aug 10, 2025  Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models
Aug 5, 2025   When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs
Aug 1, 2025   AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation
Aug 1, 2025   Cued-Agent: A Collaborative Multi-Agent System for Automatic Cued Speech Recognition
Jul 23, 2025  Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice
Jul 20, 2025  DMOSpeech 2: Reinforcement Learning for Duration Prediction in Metric-Optimized Speech Synthesis
Jul 17, 2025  Voxtral
Jun 24, 2025  Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation
Jun 23, 2025  Infant Cry Emotion Recognition Using Improved ECAPA-TDNN with Multiscale Feature Fusion and Attention Enhancement
Jun 1, 2025   CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
May 23, 2025  Source Separation of Small Classical Ensembles: Challenges and Opportunities
May 20, 2025  FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
May 19, 2025  Cross-modal Knowledge Transfer Learning as Graph Matching Based on Optimal Transport for ASR
May 19, 2025  MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Mar 17, 2025  Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
Feb 21, 2025  Retrieval-Augmented Speech Recognition Approach for Domain Challenges
Nov 27, 2024  Music2Fail: Transfer Music to Failed Recorder Style