Date          Name
Apr 23, 2023  Domain Mastery Benchmark: An Ever-Updating Benchmark for Evaluating Holistic Domain Knowledge of Large Language Model--A Preliminary Release
Jun 18, 2025  AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need
Mar 20, 2024  AgentGroupChat: An Interactive Group Chat Simulacra For Better Eliciting Emergent Behavior
Jun 15, 2024  VCEval: Rethinking What is a Good Educational Video and How to Automatically Evaluate It
Jun 9, 2023   Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
Jun 18, 2024  DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?
Jul 11, 2023  Piecing Together Clues: A Benchmark for Evaluating the Detective Skills of Large Language Models
Jul 21, 2018  Accurate Energy-Efficient Power Control for Uplink NOMA Systems under Delay Constraint
Mar 12, 2024  Efficiently Quantifying and Mitigating Ripple Effects in Model Editing
Apr 1, 2025   ToReMi: Topic-Aware Data Reweighting for Dynamic Pre-Training Data Selection