eess.AS — arXiv2

Oct 9, 2023: The First Cadenza Signal Processing Challenge: Improving Music for Those With a Hearing Loss

Oct 5, 2023: The ICASSP SP Cadenza Challenge: Music Demixing/Remixing for Hearing Aids

Oct 4, 2023: Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model

Sep 29, 2023: Low-Resource Self-Supervised Learning with SSL-Enhanced TTS

Sep 28, 2023: Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-based ASR

Sep 25, 2023: AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data

Sep 19, 2023: MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation

Sep 18, 2023: RECAP: Retrieval-Augmented Audio Captioning

Aug 14, 2023: The Sound Demixing Challenge 2023 – Cinematic Demixing Track

Aug 5, 2023: Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

Jul 18, 2023: SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs

Jun 27, 2023: 3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement

Jun 23, 2023: Implementing contextual biasing in GPU decoder for online ASR

Jun 18, 2023: MARBLE: Music Audio Representation Benchmark for Universal Evaluation

Jun 13, 2023: StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

May 26, 2023: Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model

May 24, 2023: ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

May 21, 2023: i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

May 19, 2023: Language-universal phonetic encoder for low-resource speech recognition

May 19, 2023: Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition