Showing 1–20 of 23 results
Date / Name
Dec 3, 2025 / DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
May 5, 2023 / Multi-View Graph Representation Learning for Answering Hybrid Numerical Reasoning Question
May 19, 2023 / S$^3$HQA: A Three-Stage Approach for Multi-hop Text-Table Hybrid Question Answering
Nov 12, 2024 / Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
Oct 23, 2023 / TableQAKit: A Comprehensive and Practical Toolkit for Table-based Question Answering
Sep 16, 2022 / Answering Numerical Reasoning Questions in Table-Text Hybrid Contents with Graph-based Encoder and Tree-based Decoder
Oct 23, 2023 / S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models
Nov 15, 2023 / Assessing Knowledge Editing in Language Models via Relation Perspective
Jul 15, 2024 / Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
Oct 8, 2023 / MenatQA: A New Dataset for Testing the Temporal Comprehension and Reasoning Abilities of Large Language Models
Dec 4, 2023 / Competition-Level Problems are Effective LLM Evaluators
Feb 20, 2025 / GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks
Aug 12, 2025 / OpenCUA: Open Foundations for Computer-Use Agents
Apr 28, 2026 / DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios
Sep 22, 2023 / HRoT: Hybrid prompt strategy and Retrieval of Thought for Table-Text Hybrid Question Answering
Apr 11, 2024 / OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Feb 21, 2024 / Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent
Jun 2, 2025 / Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning
Apr 20, 2026 / Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
Feb 20, 2024 / MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models