Showing 21–40 of 45 results
Date / Name
Jan 23, 2020 / Graph Constrained Reinforcement Learning for Natural Language Action Spaces
Feb 19, 2020 / How To Avoid Being Eaten By a Grue: Exploration Strategies for Text-Adventure Agents
Oct 7, 2021 / Situated Dialogue Learning through Procedural Environment Generation
Sep 8, 2019 / Story Realization: Expanding Plot Events into Sentences
Oct 3, 2022 / Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Oct 13, 2022 / Behavior Cloned Transformers are Neurosymbolic Reasoners
Apr 6, 2026 / How Reasoning Evolves from Post-Training Data: An Empirical Study Using Chess
Apr 24, 2025 / Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning
Apr 19, 2025 / TALES: Text Adventure Learning Environment Suite
Nov 7, 2025 / Long Grounded Thoughts: Synthesizing Visual Problems and Reasoning Chains at Scale
Aug 21, 2024 / Critique-out-Loud Reward Models
Dec 20, 2022 / I Cast Detect Thoughts: Learning to Converse and Guide with Intents and Theory-of-Mind in Dungeons and Dragons
Nov 17, 2025 / Preference-Based Learning in Audio Applications: A Systematic Analysis
Jul 2, 2022 / INSCIT: Information-Seeking Conversations with Mixed-Initiative Interactions
Jun 2, 2023 / Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Aug 16, 2024 / CPS-TaskForge: Generating Collaborative Problem Solving Environments for Diverse Communication Tasks
Apr 9, 2025 / A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models
Oct 1, 2025 / Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards
Oct 17, 2023 / Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
May 24, 2023 / Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning