Date / Name
Mar 2, 2022 / Code Smells in Machine Learning Systems
Feb 26, 2024 / Beyond Self-learned Attention: Mitigating Attention Bias in Transformer-based Models Using Attention Guidance
Feb 18, 2025 / UXAgent: An LLM Agent-Based Usability Testing Framework for Web Design
Jul 28, 2025 / Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation
Sep 25, 2025 / LLM Agent Meets Agentic AI: Can LLM Agents Simulate Customers to Evaluate Agentic-AI-based Shopping Assistants?
Jun 5, 2025 / OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation
Mar 26, 2025 / Can LLM Agents Simulate Multi-Turn Human Behavior? Evidence from Real Online Customer Behavior Data
Jan 28, 2026 / Trajectory2Task: Training Robust Tool-Calling Agents with Synthesized Yet Verifiable Data for Complex User Intents
Oct 17, 2025 / WEBSERV: A Browser-Server Environment for Efficient Training of Reinforcement Learning-based Web Agents at Scale
Jul 23, 2025 / Shop-R1: Rewarding LLMs to Simulate Human Behavior in Online Shopping via Reinforcement Learning
Oct 22, 2025 / See, Think, Act: Online Shopper Behavior Simulation with VLM Agents
Aug 5, 2025 / Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
Sep 25, 2025 / SFT Doesn't Always Hurt General Capabilities: Revisiting Domain-Specific Fine-Tuning in LLMs
Mar 7, 2024 / Towards Robustness Analysis of E-Commerce Ranking System
Oct 3, 2024 / Does the Order of Fine-tuning Matter and Why?
Apr 13, 2025 / UXAgent: A System for Simulating Usability Testing of Web Design with LLM Agents