Date          Name
Sep 16, 2024  Self-supervised Speech Models for Word-Level Stuttered Speech Detection
Nov 11, 2025  Unifying Model and Layer Fusion for Speech Foundation Models
Oct 3, 2022   SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Nov 7, 2021   Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer
Oct 8, 2025   Can Speech LLMs Think while Listening?
Jun 18, 2024  Interface Design for Self-Supervised Speech Models
Feb 8, 2024   Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model
Sep 19, 2023  AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Nov 2, 2022   M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval
Feb 10, 2024  SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data
Sep 18, 2024  Measuring Sound Symbolism in Audio-visual Models
Nov 8, 2024   Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks