eess.AS — arXiv2

Oct 13, 2021 Singer separation for karaoke content generation

Oct 7, 2021 Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0

Oct 1, 2021 Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning

Sep 20, 2021 TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method

Sep 18, 2021 SpeechNAS: Towards Better Trade-off between Latency and Accuracy for Large-Scale Speaker Verification

Sep 16, 2021 PDAugment: Data Augmentation by Pitch and Duration Adjustments for Automatic Lyrics Transcription

Sep 9, 2021 BeamTransformer: Microphone Array-based Overlapping Speech Detection

Aug 30, 2021 ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language Understanding

Jul 27, 2021 The CORSMAL benchmark for the prediction of the properties of containers

Jul 21, 2021 StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

Jul 20, 2021 A Real-time Speaker Diarization System Based on Spatial Spectrum

Jul 14, 2021 FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task

Jul 14, 2021 Multi-Task Audio Source Separation

Apr 26, 2021 Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction

Apr 6, 2021 LT-LM: a novel non-autoregressive language model for single-shot lattice rescoring

Apr 2, 2021 Unsupervised Acoustic Unit Discovery by Leveraging a Language-Independent Subword Discriminative Feature Representation

Mar 28, 2021 Quantifying Bias in Automatic Speech Recognition

Mar 6, 2021 Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech

Mar 3, 2021 Multi-view Audio and Music Classification

Jan 19, 2021 UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data