Date / Name
Feb 27, 2024 / Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models
Jun 17, 2024 / When Reasoning Meets Information Aggregation: A Case Study with Sports Narratives
Mar 13, 2024 / Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era
Oct 23, 2025 / Ask a Strong LLM Judge when Your Reward Model is Uncertain
Oct 28, 2022 / Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models
Jan 7, 2024 / InFoBench: Evaluating Instruction Following Ability in Large Language Models
Mar 19, 2022 / Learning-by-Narrating: Narrative Pre-Training for Zero-Shot Dialogue Comprehension
May 29, 2024 / MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions
Oct 2, 2024 / DeFine: Decision-Making with Analogical Reasoning over Factor Profiles
Oct 4, 2024 / DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
Jan 6, 2026 / A Versatile Multimodal Agent for Multimedia Content Generation
May 22, 2025 / WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
Mar 30, 2025 / Towards Trustworthy GUI Agents: A Survey
Apr 14, 2026 / WebXSkill: Skill Learning for Autonomous Web Agents
Nov 9, 2022 / Efficient Zero-shot Event Extraction with Context-Definition Alignment
Dec 6, 2022 / ZeroKBC: A Comprehensive Benchmark for Zero-Shot Knowledge Base Completion
Jul 8, 2023 / A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation
Nov 9, 2023 / TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs
Oct 12, 2017 / Using Context Events in Neural Network Models for Event Temporal Status Identification
Oct 17, 2025 / Soundness-Aware Level: A Microscopic Signature that Predicts LLM Reasoning Potential