Date | Name
Jun 19, 2025 | Under the Shadow of Babel: How Language Shapes Reasoning in LLMs
Jun 18, 2025 | RE-IMAGINE: Symbolic Benchmark Synthesis for Reasoning Evaluation
Jun 18, 2025 | DeVisE: Behavioral Testing of Medical Large Language Models
Jun 16, 2025 | MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
Jun 13, 2025 | Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs
Jun 13, 2025 | Post Persona Alignment for Multi-Session Dialogue Generation
Jun 12, 2025 | Magistral
Jun 12, 2025 | CIIR@LiveRAG 2025: Optimizing Multi-Agent Retrieval Augmented Generation through Self-Training
Jun 12, 2025 | SDialog: A Python Toolkit for End-to-End Agent Building, User Simulation, Dialog Generation, and Evaluation
Jun 12, 2025 | TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning
Jun 12, 2025 | Provably Learning from Language Feedback
Jun 11, 2025 | Continuously Updating Digital Twins using Large Language Models
Jun 9, 2025 | ProtocolLLM: RTL Benchmark for SystemVerilog Generation of Communication Protocols
Jun 9, 2025 | Evaluating LLMs Robustness in Less Resourced Languages with Proxy Models
Jun 9, 2025 | Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque
Jun 6, 2025 | dots.llm1 Technical Report
Jun 5, 2025 | Toward Automated Robustness Evaluation of Mathematical Reasoning
Jun 4, 2025 | Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation
Jun 4, 2025 | Is linguistically-motivated data augmentation worth it?
Jun 3, 2025 | Quantitative LLM Judges