Date / Name
Feb 14, 2022 / Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
Nov 26, 2022 / Lexicon-injected Semantic Parsing for Task-Oriented Dialog
Sep 13, 2021 / UniMS: A Unified Framework for Multimodal Summarization with Knowledge Distillation
Mar 15, 2024 / HawkEye: Training Video-Text LLMs for Grounding Text in Videos
Jan 23, 2025 / ReasVQA: Advancing VideoQA with Imperfect Reasoning Process
Dec 23, 2024 / Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding
Apr 10, 2025 / Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs
Jul 12, 2025 / ProactiveVideoQA: A Comprehensive Benchmark Evaluating Proactive Interactions in Video Large Language Models
Oct 23, 2025 / Why Did Apple Fall: Evaluating Curiosity in Large Language Models
Sep 19, 2020 / Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations
Mar 8, 2022 / HyperPELT: Unified Parameter-Efficient Language Model Tuning for Both Language and Vision-and-Language Tasks
Jul 21, 2024 / End-to-End Video Question Answering with Frame Scoring Mechanisms and Adaptive Sampling
Nov 27, 2024 / VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format
Dec 12, 2023 / Unsupervised Extractive Summarization with Learnable Length Control Strategies
Mar 14, 2022 / Sememe Prediction for BabelNet Synsets using Multilingual and Multimodal Information
Jun 12, 2024 / Prompt-Based Length Controlled Generation with Multiple Control Types
May 27, 2025 / Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity
May 7, 2025 / Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs
May 8, 2023 / Learning Summary-Worthy Visual Representation for Abstractive Summarization in Video
Nov 4, 2024 / Sparsing Law: Towards Large Language Models with Greater Activation Sparsity