Showing 1–14 of 14 results
Date / Name
Mar 31, 2024 / Against The Achilles' Heel: A Survey on Red Teaming for Generative Models
Jul 20, 2025 / AgentFly: Extensible and Scalable Reinforcement Learning for LM Agents
Feb 18, 2024 / Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents
Nov 11, 2024 / Explore the Reasoning Capability of LLMs in the Chess Testbed
Dec 17, 2023 / Demystifying Instruction Mixing for Fine-tuning Large Language Models
Oct 4, 2024 / ToolGen: Unified Tool Retrieval and Calling via Generation
Feb 19, 2025 / SCALAR: Scientific Citation-based Live Assessment of Long-context Academic Reasoning
Dec 24, 2024 / Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability
Jan 26, 2026 / Neural Theorem Proving for Verification Conditions: A Real-World Benchmark
Oct 11, 2025 / Concise Reasoning in the Lens of Lagrangian Optimization
Dec 5, 2025 / K2-V2: A 360-Open, Reasoning-Enhanced LLM
Jan 13, 2025 / LLM360 K2: Building a 65B 360-Open-Source Large Language Model from Scratch
Jul 25, 2025 / A Minimalist Proof Language for Neural Theorem Proving over Isabelle/HOL
Feb 11, 2026 / SimuScene: Training and Benchmarking Code Generation to Simulate Physical Scenarios