Showing 81–100 of 223 results
Sep 28, 2023: Hierarchical Cross-Modality Knowledge Transfer with Sinkhorn Attention for CTC-based ASR
Sep 25, 2023: AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
Sep 19, 2023: MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation
Sep 18, 2023: RECAP: Retrieval-Augmented Audio Captioning
Aug 14, 2023: The Sound Demixing Challenge 2023 – Cinematic Demixing Track
Aug 5, 2023: Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision
Jul 18, 2023: SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs
Jun 27, 2023: 3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement
Jun 18, 2023: MARBLE: Music Audio Representation Benchmark for Universal Evaluation
Jun 13, 2023: StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
May 26, 2023: Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model
May 24, 2023: ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation
May 19, 2023: Language-universal phonetic encoder for low-resource speech recognition
May 19, 2023: Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition
May 14, 2023: REMAST: Real-time Emotion-based Music Arrangement with Soft Transition
Apr 20, 2023: Using Mobile Data and Deep Models to Assess Auditory Verbal Hallucinations
Mar 14, 2023: CAT: Causal Audio Transformer for Audio Classification
Feb 16, 2023: Personalized Audio Quality Preference Prediction
Jan 20, 2023: Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
Dec 29, 2022: StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models