| Date | Name |
| --- | --- |
| Jun 13, 2024 | Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time |
| Jun 12, 2024 | LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation |
| Jun 9, 2024 | Zero-Shot End-To-End Spoken Question Answering In Medical Domain |
| Jun 7, 2024 | Neural Codec-based Adversarial Sample Detection for Speaker Verification |
| May 23, 2024 | Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation |
| Mar 31, 2024 | WavLLM: Towards Robust and Adaptive Speech Large Language Model |
| Mar 18, 2024 | QEAN: Quaternion-Enhanced Attention Network for Visual Dance Generation |
| Feb 20, 2024 | EMO-SUPERB: An In-depth Look at Speech Emotion Recognition |
| Feb 15, 2024 | MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music |
| Jan 29, 2024 | Continuous Target Speech Extraction: Enhancing Personalized Diarization and Extraction on Complex Recordings |
| Dec 30, 2023 | Boosting Large Language Model for Speech Synthesis: An Empirical Study |
| Dec 20, 2023 | FusDom: Combining In-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning |
| Dec 20, 2023 | Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition |
| Dec 16, 2023 | SECap: Speech Emotion Captioning with Large Language Model |
| Dec 8, 2023 | Seamless: Multilingual Expressive and Streaming Speech Translation |
| Nov 24, 2023 | Overview of the 2023 ICASSP SP Clarity Challenge: Speech Enhancement for Hearing Aids |
| Nov 21, 2023 | Adapting pretrained speech model for Mandarin lyrics transcription and alignment |
| Oct 28, 2023 | Audio-Visual Instance Segmentation |
| Oct 12, 2023 | CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models |
| Sep 29, 2023 | Low-Resource Self-Supervised Learning with SSL-Enhanced TTS |