Date | Name
Jul 11, 2024 | Foundation Model Engineering: Engineering Foundation Models Just as Engineering Software
Jan 13, 2025 | Data and System Perspectives of Sustainable Artificial Intelligence
Oct 9, 2025 | AppForge: From Assistant to Independent Developer -- Are GPTs Ready for Software Development?
Feb 1, 2023 | CoderEval: A Benchmark of Pragmatic Code Generation with Generative Pre-trained Models
Jan 6, 2025 | An Infrastructure Software Perspective Toward Computation Offloading between Executable Specifications and Foundation Models
Feb 11, 2026 | UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics
Mar 10, 2026 | An Empirical Study and Theoretical Explanation on Task-Level Model-Merging Collapse
Feb 15, 2026 | GUI-GENESIS: Automated Synthesis of Efficient Environments with Verifiable Rewards for GUI Agent Post-Training
Sep 20, 2024 | Skill-Adpative Imitation Learning for UI Test Reuse
Jan 6, 2025 | Beyond Pass or Fail: Multi-Dimensional Benchmarking of Foundation Models for Goal-based Mobile UI Navigation
Aug 26, 2024 | SWE-bench-java: A GitHub Issue Resolving Benchmark for Java
Nov 24, 2025 | KernelBand: Steering LLM-based Kernel Optimization via Hardware-Aware Multi-Armed Bandits
Jan 6, 2025 | DeCon: Detecting Incorrect Assertions via Postconditions Generated by a Large Language Model
Dec 15, 2025 | From User Interface to Agent Interface: Efficiency Optimization of UI Representations for LLM Agents
Feb 23, 2025 | Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation