Date          Name
Oct 23, 2020  Improving Noise Robustness of an End-to-End Neural Model for Automatic Speech Recognition
Apr 5, 2021   Citrinet: Closing the Gap between Non-Autoregressive and Autoregressive End-to-End Models for Automatic Speech Recognition
Jul 22, 2021  CarneliNet: Neural Mixture Model for Automatic Speech Recognition
Oct 27, 2022  A Compact End-to-End Model with Local and Global Context for Spoken Language Identification
Jul 22, 2024  Schrödinger Bridge for Generative Speech Enhancement
Oct 23, 2024  VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Jul 13, 2023  Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling
Jun 28, 2024  Less is More: Accurate Speech Recognition & Translation without Web-Scale Data
Aug 23, 2024  NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
Jul 29, 2024  Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models
Sep 18, 2023  Investigating End-to-End ASR Architectures for Long Form Audio Transcription
Apr 5, 2021   SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition
May 19, 2025  Granary: Speech Recognition and Translation Dataset in 25 European Languages
Oct 18, 2023  The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System
Oct 18, 2023  Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation
Mar 14, 2024  Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
Dec 27, 2023  Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition
May 21, 2025  SALM-Duplex: Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model
Sep 17, 2025  Canary-1B-v2 & Parakeet-TDT-0.6B-v3: Efficient and High-Performance Models for Multilingual ASR and AST
Feb 27, 2026  Chunk-wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text