Showing 1–20 of 26 results
Date / Name

May 16, 2020 / Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory
Nov 3, 2020 / Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition
Nov 2, 2023 / FLAP: Fast Language-Audio Pre-training
Dec 14, 2022 / Efficient Speech Representation Learning with Low-Bit Quantization
May 18, 2020 / Weak-Attention Suppression For Transformer Based Speech Recognition
Oct 28, 2019 / Transformer-Transducer: End-to-End Speech Recognition with Self-Attention
Oct 21, 2020 / Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition
May 16, 2024 / Chameleon: Mixed-Modal Early-Fusion Foundation Models
Nov 9, 2020 / Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR
May 14, 2017 / Novel CMOS RFIC Layout Generation with Concurrent Device Placement and Fixed-Length Microstrip Routing
Oct 22, 2024 / Altogether: Image Captioning via Re-aligning Alt-text
Jun 7, 2018 / Training Augmentation with Adversarial Examples for Robust Speech Recognition
Nov 5, 2019 / RNN-T For Latency Controlled ASR With Improved Beam Search
Nov 5, 2023 / Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency
Nov 27, 2019 / AIPNet: Generative Adversarial Pre-training of Accent-invariant Networks for End-to-end Speech Recognition
Oct 16, 2022 / SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Jul 29, 2025 / Meta CLIP 2: A Worldwide Scaling Recipe
Sep 29, 2025 / DepthLM: Metric Depth From Vision Language Models
Dec 2, 2022 / Continual Learning for On-Device Speech Recognition using Disentangled Conformers
Jun 7, 2018 / Domain Adversarial Training for Accented Speech Recognition