Oct 23, 2019: Relation Modeling with Graph Convolutional Networks for Facial Action Unit Detection
Nov 2, 2022: Monolingual Recognizers Fusion for Code-switching Speech Recognition
Jan 5, 2025: Reducing the Gap Between Pretrained Speech Enhancement and Recognition Models Using a Real Speech-Trained Bridging Module
Aug 11, 2024: VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
Sep 28, 2025: LORT: Locally Refined Convolution and Taylor Transformer for Monaural Speech Enhancement
Nov 19, 2020: Multi-stage Speaker Extraction with Utterance and Frame-Level Reference Signals
Apr 17, 2021: Exploring Deep Learning for Joint Audio-Visual Lip Biometrics
Feb 21, 2022: L-SpEx: Localized Target Speaker Extraction
Apr 30, 2022: Heterogeneous Graph Neural Networks using Self-supervised Reciprocally Contrastive Learning
Jul 15, 2022: MIMO-DoAnet: Multi-channel Input and Multiple Outputs DoA Network with Unknown Number of Sound Sources
Oct 9, 2022: VCSE: Time-Domain Visual-Contextual Speaker Extraction Network
Aug 31, 2024: Progressive Residual Extraction based Pre-training for Speech Representation Learning
Sep 29, 2025: Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis
Aug 4, 2025: SecoustiCodec: Cross-Modal Aligned Streaming Single-Codecbook Speech Codec
Jan 24, 2025: Efficient Emotion and Speaker Adaptation in LLM-Based TTS via Characteristic-Specific Partial Fine-Tuning
Sep 27, 2024: Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS
Feb 16, 2026: Breaking Data Efficiency Dilemma: A Federated and Augmented Learning Framework For Alzheimer's Disease Detection via Speech
Dec 18, 2023: A Refining Underlying Information Framework for Monaural Speech Enhancement
Dec 7, 2022: MIMO-DBnet: Multi-channel Input and Multiple Outputs DOA-aware Beamforming Network for Speech Separation
May 18, 2023: Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation