Date / Name
Apr 2, 2024 / Effective internal language model training and fusion for factorized transducer model
Jul 11, 2025 / SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment
Oct 8, 2025 / Can Speech LLMs Think while Listening?
Oct 27, 2024 / Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation
Oct 7, 2021 / Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution
Dec 21, 2024 / Transducer-Llama: Integrating LLMs into Streamable Transducer-based Speech Recognition
Feb 6, 2026 / Scaling Speech Tokenizers with Diffusion Autoencoders
Jul 21, 2023 / Prompting Large Language Models with Speech Recognition Abilities
Nov 12, 2023 / AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs
Oct 25, 2022 / Dynamic Speech Endpoint Detection with Regression Targets
May 21, 2023 / Multi-Head State Space Model for Speech Recognition
Sep 5, 2023 / TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models
Oct 7, 2021 / Transferring Voice Knowledge for Acoustic Event Detection: An Empirical Study
Oct 21, 2020 / Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition
Jul 9, 2021 / On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models