cs.SD — arXiv2

Oct 27, 2021: LSTM-RPA: A Simple but Effective Long Sequence Prediction Algorithm for Music Popularity Prediction

Oct 18, 2021: Personalized Speech Enhancement: New Models and Comprehensive Evaluation

Oct 17, 2021: VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis

Oct 14, 2021: DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances

Oct 14, 2021: Exploring Timbre Disentanglement in Non-Autoregressive Cross-Lingual Text-to-Speech

Oct 14, 2021: Revisiting IPA-based Cross-lingual Text-to-speech

Oct 13, 2021: Singer separation for karaoke content generation

Oct 7, 2021: Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0

Oct 1, 2021: Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning

Sep 20, 2021: TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method

Sep 18, 2021: SpeechNAS: Towards Better Trade-off between Latency and Accuracy for Large-Scale Speaker Verification

Sep 16, 2021: PDAugment: Data Augmentation by Pitch and Duration Adjustments for Automatic Lyrics Transcription

Sep 9, 2021: BeamTransformer: Microphone Array-based Overlapping Speech Detection

Aug 30, 2021: ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language Understanding

Jul 27, 2021: The CORSMAL benchmark for the prediction of the properties of containers

Jul 21, 2021: StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

Jul 20, 2021: A Real-time Speaker Diarization System Based on Spatial Spectrum

Jul 14, 2021: FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task

Jul 14, 2021: Multi-Task Audio Source Separation

Apr 26, 2021: Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction