Showing 41–60 of 223 results
Date / Name
Mar 17, 2025 / Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
Feb 21, 2025 / Retrieval-Augmented Speech Recognition Approach for Domain Challenges
Nov 27, 2024 / Music2Fail: Transfer Music to Failed Recorder Style
Nov 11, 2024 / Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
Nov 6, 2024 / Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward
Nov 1, 2024 / Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Oct 25, 2024 / GPT-4o System Card
Oct 17, 2024 / STCON System for the CHiME-8 Challenge
Oct 7, 2024 / Demo of Zero-Shot Guitar Amplifier Modelling: Enhancing Modeling with Hyper Neural Networks
Sep 20, 2024 / Unifying Global and Near-Context Biasing in a Single Trie Pass
Sep 20, 2024 / Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper
Sep 16, 2024 / StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Sep 13, 2024 / DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
Sep 8, 2024 / The first Cadenza challenges: using machine learning competitions to improve music for listeners with a hearing loss
Jul 15, 2024 / Towards zero-shot amplifier modeling: One-to-many amplifier modeling via tone embedding control
Jul 5, 2024 / Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Jul 5, 2024 / TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR
Jul 3, 2024 / MuDiT & MuSiT: Alignment with Colloquial Expression in Description-to-Song Generation
Jun 30, 2024 / Improving Real-Time Music Accompaniment Separation with MMDenseNet
Jun 25, 2024 / Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet