Showing 1–20 of 30 results
Date  Name
Jun 13, 2023  UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
Jan 25, 2024  VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
Feb 1, 2021  Rich Prosody Diversity Modelling with Phone-level Mixture Density Network
May 27, 2021  Phone-Level Prosody Modelling with GMM-Based MDN for Diverse and Controllable Speech Synthesis
Mar 30, 2023  DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder
Apr 25, 2023  Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge
Nov 4, 2020  Data Augmentation for End-to-end Code-switching Speech Recognition
Apr 2, 2022  VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
Jun 25, 2023  DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
Oct 23, 2023  Acoustic BPE for Speech Generation with Discrete Tokens
Sep 10, 2023  VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
May 31, 2025  MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation
Oct 14, 2025  DiSTAR: Diffusion over a Scalable Token Autoregressive Representation for Speech Generation
Feb 15, 2022  Unsupervised word-level prosody tagging for controllable speech synthesis
Nov 17, 2022  EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance
May 6, 2024  AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding
Sep 14, 2023  Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
May 25, 2025  Towards Reliable Large Audio Language Model
Sep 3, 2024  vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
Feb 6, 2025  DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation