Date | Name
Apr 12, 2026 | CodaRAG: Connecting the Dots with Associativity Inspired by Complementary Learning
Apr 6, 2026 | Joint Fullband-Subband Modeling for High-Resolution SingFake Detection
Nov 11, 2024 | Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
Sep 13, 2024 | DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
Jun 7, 2024 | Neural Codec-based Adversarial Sample Detection for Speaker Verification
Feb 20, 2024 | EMO-SUPERB: An In-depth Look at Speech Emotion Recognition
Oct 4, 2023 | Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model
Sep 29, 2023 | Low-Resource Self-Supervised Learning with SSL-Enhanced TTS
Oct 27, 2022 | Multimodal Transformer Distillation for Audio-Visual Synchronization
Oct 3, 2022 | Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection
May 8, 2022 | Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information
Apr 25, 2022 | Parallel Synthesis for Autoregressive Speech Generation
Apr 1, 2022 | Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis
Mar 6, 2021 | Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech
May 15, 2020 | WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU
Dec 5, 2019 | Towards Robust Neural Vocoding for Speech Generation: A Survey
May 28, 2019 | Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion
Oct 30, 2018 | Almost-unsupervised Speech Recognition with Close-to-zero Resource Based on Phonetic Structures Learned from Very Small Unpaired Speech and Text Data
Jul 21, 2018 | Phonetic-and-Semantic Embedding of Spoken Words with Applications in Spoken Content Retrieval
Mar 29, 2018 | Towards Unsupervised Automatic Speech Recognition Trained by Unaligned Speech and Text only