Date | Name
Oct 9, 2025 | Everyone prefers human writers, including AI
Oct 9, 2025 | dInfer: An Efficient Inference Framework for Diffusion Language Models
Oct 9, 2025 | Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models
Oct 8, 2025 | All Claims Are Equal, but Some Claims Are More Equal Than Others: Importance-Sensitive Factuality Evaluation of LLM Generations
Oct 7, 2025 | Prompt reinforcing for long-term planning of large language models
Oct 5, 2025 | LH-Deception: Simulating and Understanding LLM Deceptive Behaviors in Long-Horizon Interactions
Oct 1, 2025 | Exposing the Cracks: Vulnerabilities of Retrieval-Augmented LLM-based Machine Translation
Oct 1, 2025 | MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance
Sep 30, 2025 | Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
Sep 29, 2025 | TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models
Sep 29, 2025 | Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
Sep 29, 2025 | How Training Data Shapes the Use of Parametric and In-Context Knowledge in Language Models
Sep 29, 2025 | Your thoughts tell who you are: Characterize the reasoning patterns of LRMs
Sep 28, 2025 | Clean First, Align Later: Benchmarking Preference Data Cleaning for Reliable LLM Alignment
Sep 27, 2025 | Cognition-of-Thought Elicits Social-Aligned Reasoning in Large Language Models
Sep 27, 2025 | General Exploratory Bonus for Optimistic Exploration in RLHF
Sep 26, 2025 | StateX: Enhancing RNN Recall via Post-training State Expansion
Sep 26, 2025 | MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Sep 26, 2025 | From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement
Sep 26, 2025 | SecureVibeBench: Benchmarking Secure Vibe Coding of AI Agents via Reconstructing Vulnerability-Introducing Scenarios