Date / Name
Oct 14, 2021 / SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Mar 19, 2025 / Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context
Sep 19, 2023 / USED: Universal Speaker Extraction and Diarization
Oct 11, 2021 / Multi-View Self-Attention Based Transformer for Speaker Recognition
Oct 8, 2022 / CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning
Mar 29, 2022 / LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT
Mar 31, 2022 / Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data
Jun 19, 2024 / SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Dec 26, 2023 / The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge
Feb 6, 2026 / Scaling Speech Tokenizers with Diffusion Autoencoders
Jun 12, 2022 / The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task
Jul 3, 2024 / SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech
Nov 3, 2025 / Leveraging Language Information for Target Language Extraction
Oct 7, 2022 / SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
Oct 30, 2022 / token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text
Jul 19, 2023 / Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder
Feb 24, 2024 / Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks
Jan 26, 2025 / Overview of the Amphion Toolkit (v0.2)
Oct 26, 2025 / EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models
Sep 10, 2025 / Audio Deepfake Verification