Date          Name
Apr 10, 2024  PEAVS: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores
Jun 17, 2021  Multi-mode Transformer Transducer with Stochastic Future Context
Dec 29, 2017  The CAPIO 2017 Conversational Speech Recognition System
May 14, 2024  SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models
Apr 25, 2026  Robust Audio-Text Retrieval via Cross-Modal Attention and Hybrid Loss
Apr 13, 2020  Speaker Diarization with Lexical Information
Oct 1, 2019   State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions
Nov 19, 2021  SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech
Jan 24, 2021  A Review of Speaker Diarization: Recent Advances with Deep Learning
Dec 14, 2021  On the Use of External Data for Spoken Named Entity Recognition
Sep 30, 2022  E-Branchformer: Branchformer with Enhanced merging for speech recognition
Feb 12, 2025  Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation
May 14, 2024  SpeechVerse: A Large-scale Generalizable Audio Language Model
May 21, 2020  Multistream CNN for Robust Acoustic Modeling
Jun 11, 2021  Leveraging Pre-trained Language Model for Speech Sentiment Analysis
Dec 24, 2024  Zero-resource Speech Translation and Recognition with LLMs
Mar 5, 2020   Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap
May 21, 2020  ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition