Showing 1–20 of 223 results
Date | Name
Apr 24, 2026 | UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions
Apr 23, 2026 | Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition
Apr 22, 2026 | Materialistic RIR: Material Conditioned Realistic RIR Generation
Apr 22, 2026 | SpeechParaling-Bench: A Comprehensive Benchmark for Paralinguistic-Aware Speech Generation
Apr 22, 2026 | ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence
Apr 22, 2026 | From Image to Music Language: A Two-Stage Structure Decoding Approach for Complex Polyphonic OMR
Apr 21, 2026 | HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models
Apr 17, 2026 | Benign Fine-Tuning Breaks Safety Alignment in Audio LLMs
Apr 13, 2026 | ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing
Apr 13, 2026 | Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music
Apr 6, 2026 | Joint Fullband-Subband Modeling for High-Resolution SingFake Detection
Mar 30, 2026 | MOSS-VoiceGenerator: Create Realistic Voices with Natural Language Descriptions
Mar 20, 2026 | MOSS-TTSD: Text to Spoken Dialogue Generation
Mar 18, 2026 | MOSS-TTS Technical Report
Jan 29, 2026 | Qwen3-ASR Technical Report
Jan 26, 2026 | VIBEVOICE-ASR Technical Report
Jan 22, 2026 | Qwen3-TTS Technical Report
Jan 14, 2026 | Towards Realistic Synthetic Data for Automatic Drum Transcription
Jan 12, 2026 | Elastic overtones: an equal temperament 12 tone music system with "perfect" fifths
Jan 7, 2026 | Apollo: Unified Multi-Task Audio-Video Joint Generation