Date          Name
Mar 8, 2026   Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems
Mar 8, 2026   Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models
Mar 8, 2026   Generalization in Online Reinforcement Learning for Mobile Agents
Mar 7, 2026   Taiwan Safety Benchmark and Breeze Guard: Toward Trustworthy AI for Taiwanese Mandarin
Mar 6, 2026   Deep Research, Shallow Evaluation: A Case Study in Meta-Evaluation for Long-Form QA Benchmarks
Mar 6, 2026   Validation of a Small Language Model for DSM-5 Substance Category Classification in Child Welfare Records
Mar 6, 2026   COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics
Mar 5, 2026   Distilling Formal Logic into Neural Spaces: A Kernel Alignment Approach for Signal Temporal Logic
Mar 5, 2026   HACHIMI: Scalable and Controllable Student Persona Generation via Orchestrated Agents
Mar 4, 2026   Who Shapes Brazil's Vaccine Debate? Semi-Supervised Modeling of Stance and Polarization in YouTube's Media Ecosystem
Mar 3, 2026   Tokenization Tradeoffs in Structured EHR Foundation Models
Mar 3, 2026   Asymmetric Goal Drift in Coding Agents Under Value Conflict
Mar 3, 2026   APRES: An Agentic Paper Revision and Evaluation System
Mar 3, 2026   UniSkill: A Dataset for Matching University Curricula to Professional Competencies
Mar 3, 2026   Sensory-Aware Sequential Recommendation via Review-Distilled Representations
Mar 3, 2026   Evaluating Cross-Modal Reasoning Ability and Problem Characteristics with Multimodal Item Response Theory
Mar 3, 2026   How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities
Mar 2, 2026   URAG: A Benchmark for Uncertainty Quantification in Retrieval-Augmented Large Language Models
Mar 1, 2026   Unified Vision-Language Modeling via Concept Space Alignment
Feb 28, 2026  SkillCraft: Can LLM Agents Learn to Use Tools Skillfully?