Date / Name
Mar 27, 2026 / Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation
Jan 9, 2026 / The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
Nov 18, 2025 / First measurement of reactor neutrino oscillations at JUNO
Nov 18, 2025 / Initial performance results of the JUNO detector
Nov 14, 2025 / DiscoX: Benchmarking Discourse-Level Translation task in Expert Domains
Sep 30, 2025 / Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
Sep 4, 2025 / Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
May 29, 2025 / ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding
May 20, 2025 / KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation
May 11, 2025 / Seed1.5-VL Technical Report
Apr 10, 2025 / Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
Feb 20, 2025 / SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Jun 21, 2024 / GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models
May 28, 2024 / Potential to identify neutrino mass ordering with reactor antineutrinos at JUNO
May 28, 2024 / Prediction of Energy Resolution in the JUNO Experiment
May 28, 2024 / JUNO Sensitivity to Invisible Decay Modes of Neutrons
Oct 1, 2023 / TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks
Sep 13, 2023 / Real-time Monitoring for the Next Core-Collapse Supernova in JUNO
Mar 9, 2023 / The JUNO experiment Top Tracker
Dec 16, 2022 / JUNO Sensitivity on Proton Decay $p \to \bar{\nu} K^+$ Searches