Date / Name
Oct 27, 2022 / Multimodal Transformer Distillation for Audio-Visual Synchronization
Oct 18, 2022 / Simple and Effective Unsupervised Speech Translation
Oct 3, 2022 / Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection
Aug 28, 2022 / Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks
Aug 16, 2022 / Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition
Jul 29, 2022 / Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition
Jul 20, 2022 / Diffsound: Discrete Diffusion Model for Text-to-sound Generation
Jun 7, 2022 / LegoNN: Building Modular Encoder-Decoder Models
Jun 3, 2022 / Constraining Gaussian processes for physics-informed acoustic emission mapping
May 30, 2022 / StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis
May 16, 2022 / PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification
May 8, 2022 / Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information
May 6, 2022 / Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition
May 3, 2022 / i-Code: An Integrative and Composable Multimodal Learning Framework
Apr 26, 2022 / Reformulating Speaker Diarization as Community Detection With Emphasis On Topological Structure
Apr 25, 2022 / Parallel Synthesis for Autoregressive Speech Generation
Apr 25, 2022 / Graph Convolutional Network Based Semi-Supervised Learning on Multi-Speaker Meeting Data
Apr 22, 2022 / Speaking-Rate-Controllable HiFi-GAN Using Feature Interpolation
Apr 8, 2022 / Transducer-based language embedding for spoken language identification
Apr 1, 2022 / Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis