Date / Name
Aug 28, 2023 / TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models
Feb 14, 2024 / MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
Dec 18, 2024 / Speech Watermarking with Discrete Intermediate Representations
Nov 15, 2024 / WavChat: A Survey of Spoken Dialogue Models
Feb 19, 2024 / Language-Codec: Bridging Discrete Codec Representations and Speech Language Models
May 14, 2025 / WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
Aug 29, 2024 / WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Jun 3, 2024 / ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
Aug 11, 2025 / TAP: Parameter-efficient Task-Aware Prompting for Adverse Weather Removal
Apr 16, 2026 / Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models
Jan 2, 2025 / OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios
Jul 14, 2023 / Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
Feb 20, 2025 / WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models
Mar 8, 2024 / Enhancing Multimodal Unified Representations for Cross Modal Generalization
May 15, 2025 / T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
Aug 30, 2025 / Entropy-based Coarse and Compressed Semantic Speech Representation Learning
Apr 16, 2026 / WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training
Oct 16, 2024 / MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
Feb 26, 2025 / MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
Jul 20, 2025 / Open-set Cross Modal Generalization via Multimodal Unified Representation