cs.SE — arXiv2

May 29, 2025 GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents

May 28, 2025 Jailbreak Distillation: Renewable Safety Benchmarking

Apr 27, 2025 Critical Considerations on Effort-aware Software Defect Prediction Metrics

Apr 24, 2025 Detection, Classification and Prevalence of Self-Admitted Aging Debt

Apr 12, 2025 SmartShift: A Secure and Efficient Approach to Smart Contract Migration

Apr 9, 2025 R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents

Mar 28, 2025 Challenges and Paths Towards AI for Software Engineering

Mar 24, 2025 What is Business Process Automation Anyway?

Feb 19, 2025 Where's the Bug? Attention Probing for Scalable Fault Localization

Feb 12, 2025 Flow-of-Action: SOP Enhanced LLM-Based Multi-Agent System for Root Cause Analysis

Jan 12, 2025 AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds

Dec 18, 2024 Syzygy: Dual Code-Test C to (safe) Rust Translation using LLMs and Dynamic Analysis

Nov 5, 2024 GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models

Sep 24, 2024 Preference-Guided Refactored Tuning for Retrieval Augmented Code Generation

Aug 15, 2024 API-guided Dataset Synthesis to Finetune Large Code Models

Jul 26, 2024 Optimizing Checkpoint-Restart Mechanisms for HPC with DMTCP in Containers at NERSC

Jul 16, 2024 Building AI Agents for Autonomous Clouds: Challenges and Design Principles

Jul 2, 2024 Mining Constraints from Reference Process Models for Detecting Best-Practice Violations in Event Logs

Jun 27, 2024 Failure Diagnosis in Microservice Systems: A Comprehensive Survey and Analysis

Jun 20, 2024 CREF: An LLM-based Conversational Software Repair Framework for Programming Tutors