cs.SE — arXiv2

Nov 24, 2025 SLMFix: Leveraging Small Language Models for Error Fixing with Reinforcement Learning

Nov 2, 2025 Can Language Models Go Beyond Coding? Assessing the Capability of Language Models to Build Real-World Systems

Oct 24, 2025 AgentBound: Securing Execution Boundaries of AI Agents

Oct 21, 2025 CUARewardBench: A Benchmark for Evaluating Reward Models on Computer-using Agent

Oct 20, 2025 What Makes AI Research Replicable? Executable Knowledge Graphs as Scientific Knowledge Representations

Oct 15, 2025 OpenDerisk: An Industrial Framework for AI-Driven SRE, with Design, Implementation, and Case Studies

Oct 6, 2025 FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration

Oct 6, 2025 DynamiQ: Unlocking the Potential of Dynamic Task Allocation in Parallel Fuzzing

Sep 30, 2025 CWM: An Open-Weights LLM for Research on Code Generation with World Models

Sep 26, 2025 SecureVibeBench: Benchmarking Secure Vibe Coding of AI Agents via Reconstructing Vulnerability-Introducing Scenarios

Sep 15, 2025 A Practical Adversarial Attack against Sequence-based Deep Learning Malware Classifiers

Sep 12, 2025 Enhancing LLM-based Specification Generation via Program Slicing and Logical Deletion

Aug 20, 2025 Trace-Based Reconstruction of Quantum Circuit Dataflow in Surface Codes

Aug 4, 2025 Flow Sensitivity without Control Flow Graph: An Efficient Andersen-Style Flow-Sensitive Pointer Analysis

Jul 30, 2025 From Articles to Code: On-Demand Generation of Core Algorithms from Scientific Publications

Jul 16, 2025 GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities

Jul 12, 2025 Enhancing Interpretability in Software Change Management with Chain-of-Thought Reasoning

Jun 24, 2025 Towards an Oracle for Binary Decomposition Under Compilation Variance

Jun 11, 2025 Reasoning as a Resource: Optimizing Fast and Slow Thinking in Code Generation Models

Jun 4, 2025 LogSage: An LLM-Based Framework for CI/CD Failure Detection and Remediation with Industrial Validation