Date          Name
Apr 14, 2026  KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance
May 16, 2024  LFED: A Literary Fiction Evaluation Dataset for Large Language Models
Aug 19, 2024  CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models
Jun 11, 2025  Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search
Mar 18, 2024  OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety
Oct 30, 2023  Evaluating Large Language Models: A Comprehensive Survey
Feb 22, 2024  Identifying Multiple Personalities in Large Language Models with External Evaluation
Oct 16, 2024  Self-Pluralising Culture Alignment for Large Language Models
Dec 23, 2024  Large Language Model Safety: A Holistic Survey
May 17, 2023  M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models
Apr 6, 2026   CLEAR: Unlocking Generative Potential for Degraded Image Understanding in Unified Multimodal Models
Feb 4, 2026   ERNIE 5.0 Technical Report
Feb 10, 2026  ATTNPO: Attention-Guided Process Supervision for Efficient Reasoning
May 26, 2025  TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos