cs.SD — arXiv

Apr 5, 2021 | AST: Audio Spectrogram Transformer

Apr 2, 2021 | Unsupervised Acoustic Unit Discovery by Leveraging a Language-Independent Subword Discriminative Feature Representation

Mar 28, 2021 | Quantifying Bias in Automatic Speech Recognition

Mar 6, 2021 | Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech

Mar 3, 2021 | Multi-view Audio and Music Classification

Jan 19, 2021 | UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

Jan 9, 2021 | Coupling a generative model with a discriminative learning framework for speaker verification

Dec 24, 2020 | Unsupervised neural adaptation model based on optimal transport for spoken language identification

Dec 17, 2020 | The effectiveness of unsupervised subword modeling with autoregressive and cross-lingual phone-aware networks

Nov 3, 2020 | Unsupervised Pattern Discovery from Thematic Speech Archives Based on Multilingual Bottleneck Features

Oct 22, 2020 | Similarity Analysis of Self-Supervised Speech Representations

Oct 18, 2020 | Self-Attention Generative Adversarial Network for Speech Enhancement

Aug 22, 2020 | A Efficient Multimodal Framework for Large Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

Jul 29, 2020 | Exploiting Cross-Lingual Knowledge in Unsupervised Acoustic Modeling for Low-Resource Languages

Jul 25, 2020 | Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling

Jul 25, 2020 | Adaptive music: Automated music composition and distribution

May 18, 2020 | Audio-visual Multi-channel Recognition of Overlapped Speech

May 15, 2020 | WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU

May 14, 2020 | Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario

Apr 30, 2020 | Jukebox: A Generative Model for Music