Showing 1–20 of 61 results
Date          Name
Sep 27, 2024  ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5
Jun 20, 2024  Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Feb 22, 2023  MADI: Inter-domain Matching and Intra-domain Discrimination for Cross-domain Speech Recognition
Sep 19, 2024  Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data
Jul 24, 2025  DIFFA: Large Language Diffusion Models Can Listen and Understand
Jan 30, 2026  DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding
Nov 28, 2023  Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
Jun 6, 2024   Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores
Jul 20, 2025  Omni-Thinker: Scaling Multi-Task RL in LLMs with Hybrid Reward and Task Scheduling
Dec 21, 2023  kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
Feb 26, 2025  CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition
Jan 22, 2024  ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition
Sep 18, 2024  M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper
May 21, 2025  Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization
Jul 12, 2024  Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework
Sep 9, 2024   Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge
Jul 15, 2024  Human-Centric Transformer for Domain Adaptive Action Recognition
Nov 19, 2024  GLOVER: Generalizable Open-Vocabulary Affordance Reasoning for Task-Oriented Grasping
Jan 8, 2026   CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models
Sep 18, 2025  Mind the Gap: Data Rewriting for Stable Off-Policy Supervised Fine-Tuning