eess.AS — arXiv2

Mar 31, 2022 PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations

Mar 30, 2022 Span Classification with Structured Information for Disfluency Detection in Spoken Utterances

Mar 30, 2022 Multiple Narrow-band signals Direction Finding with TMLA by Nonuniform Period Modulation

Mar 29, 2022 Integrating Lattice-Free MMI into End-to-End Speech Recognition

Mar 28, 2022 On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition

Mar 25, 2022 DeLoRes: Decorrelating Latent Spaces for Low-Resource Audio Representation Learning

Mar 25, 2022 Automatic Song Translation for Tonal Languages

Mar 13, 2022 CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification

Mar 7, 2022 Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language

Feb 15, 2022 General-purpose, long-context autoregressive modeling with Perceiver AR

Feb 8, 2022 Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge

Jan 6, 2022 Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model

Nov 29, 2021 Mixed Precision DNN Quantization for Overlapped Speech Separation and Recognition

Nov 28, 2021 Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information

Oct 27, 2021 LSTM-RPA: A Simple but Effective Long Sequence Prediction Algorithm for Music Popularity Prediction

Oct 18, 2021 Personalized Speech Enhancement: New Models and Comprehensive Evaluation

Oct 17, 2021 VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis

Oct 14, 2021 DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances

Oct 14, 2021 Exploring Timbre Disentanglement in Non-Autoregressive Cross-Lingual Text-to-Speech

Oct 14, 2021 Revisiting IPA-based Cross-lingual Text-to-speech