"au:"Yi-Jen Shih"" — arXiv2 Search

Showing 1–12 of 12 results

Date / Name

Sep 16, 2024 / Self-supervised Speech Models for Word-Level Stuttered Speech Detection
Nov 11, 2025 / Unifying Model and Layer Fusion for Speech Foundation Models
Oct 3, 2022 / SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Nov 7, 2021 / Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer
Oct 8, 2025 / Can Speech LLMs Think while Listening?
Jun 18, 2024 / Interface Design for Self-Supervised Speech Models
Feb 8, 2024 / Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model
Sep 19, 2023 / AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Nov 2, 2022 / M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval
Feb 10, 2024 / SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data
Sep 18, 2024 / Measuring Sound Symbolism in Audio-visual Models
Nov 8, 2024 / Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks