Showing 341–360 of 1,726 results
Date / Name
Dec 2, 2025 / DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
Dec 2, 2025 / Process-Centric Analysis of Agentic Software Systems
Dec 1, 2025 / Beware of Reasoning Overconfidence: Pitfalls in the Reasoning Process for Multi-solution Tasks
Dec 1, 2025 / Learning the Boundary of Solvability: Aligning LLMs to Detect Unsolvable Problems
Nov 27, 2025 / DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Nov 25, 2025 / Soft Adaptive Policy Optimization
Nov 24, 2025 / DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
Nov 24, 2025 / How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
Nov 21, 2025 / Selective Rotary Position Embedding
Nov 21, 2025 / The PLLuM Instruction Corpus
Nov 21, 2025 / Closing the Performance Gap Between AI and Radiologists in Chest X-Ray Reporting
Nov 20, 2025 / Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation
Nov 19, 2025 / MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping
Nov 18, 2025 / ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning
Nov 17, 2025 / Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment
Nov 17, 2025 / Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance
Nov 16, 2025 / On the Brittleness of LLMs: A Journey around Set Membership
Nov 14, 2025 / DiscoX: Benchmarking Discourse-Level Translation task in Expert Domains
Nov 13, 2025 / Instella: Fully Open Language Models with Stellar Performance
Nov 13, 2025 / AdvancedIF: Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following