"au:"Chenpeng Du"" — arXiv2 Search

Showing 1–20 of 30 results

Jun 13, 2023 UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
Jan 25, 2024 VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
Feb 1, 2021 Rich Prosody Diversity Modelling with Phone-level Mixture Density Network
May 27, 2021 Phone-Level Prosody Modelling with GMM-Based MDN for Diverse and Controllable Speech Synthesis
Mar 30, 2023 DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder
Apr 25, 2023 Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge
Nov 4, 2020 Data Augmentation for End-to-end Code-switching Speech Recognition
Apr 2, 2022 VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
Jun 25, 2023 DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
Oct 23, 2023 Acoustic BPE for Speech Generation with Discrete Tokens
Sep 10, 2023 VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
May 31, 2025 MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation
Oct 14, 2025 DiSTAR: Diffusion over a Scalable Token Autoregressive Representation for Speech Generation
Feb 15, 2022 Unsupervised word-level prosody tagging for controllable speech synthesis
Nov 17, 2022 EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance
May 6, 2024 AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding
Sep 14, 2023 Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
May 25, 2025 Towards Reliable Large Audio Language Model
Sep 3, 2024 vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
Feb 6, 2025 DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation