Date / Name
Apr 23, 2026 / CI-Work: Benchmarking Contextual Integrity in Enterprise LLM Agents
Feb 19, 2026 / Computer-Using World Model
Jan 19, 2026 / A Benchmark for Language Models in Real-World System Building
Nov 2, 2025 / Can Language Models Go Beyond Coding? Assessing the Capability of Language Models to Build Real-World Systems
Aug 1, 2024 / AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
Jul 15, 2024 / Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena
May 24, 2024 / Large Language Models can Deliver Accurate and Interpretable Time Series Anomaly Detection
Feb 8, 2024 / UFO: A UI-Focused Agent for Windows OS Interaction
Feb 5, 2024 / Revisiting VAE for Unsupervised Time Series Anomaly Detection: A Frequency Perspective
Dec 19, 2023 / Xpert: Empowering Incident Management with Query Recommendations via Large Language Models
Nov 29, 2023 / TaskWeaver: A Code-First Agent Framework
Nov 7, 2023 / Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation
Oct 28, 2023 / TraceDiag: Adaptive, Interpretable, and Efficient Root Cause Analysis on Large-Scale Microservice Systems
Aug 18, 2023 / WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Aug 1, 2023 / A Survey of Time Series Anomaly Detection Methods in the AIOps Domain
Jul 3, 2023 / ImDiffusion: Imputed Diffusion Models for Multivariate Time Series Anomaly Detection
May 29, 2023 / Assess and Summarize: Improve Outage Understanding with Large Language Models
May 25, 2023 / Automatic Root Cause Analysis via Large Language Models for Cloud Incidents
Feb 14, 2022 / UniParser: A Unified Log Parser for Heterogeneous Log Data