Date / Name
Apr 4, 2026 / Measuring the Permission Gate: A Stress-Test Evaluation of Claude Code's Auto Mode
Mar 31, 2026 / SkillReducer: Optimizing LLM Agent Skills for Token Efficiency
Mar 22, 2026 / WARBENCH: A Comprehensive Benchmark for Evaluating LLMs in Military Decision-Making
Sep 6, 2025 / Red-Teaming Coding Agents from a Tool-Invocation Perspective: An Empirical Security Assessment
Jun 20, 2025 / Differentiation-Based Extraction of Proprietary Data from Fine-Tuned LLMs
Jun 11, 2025 / Reasoning as a Resource: Optimizing Fast and Slow Thinking in Code Generation Models
Mar 23, 2025 / STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models
Aug 15, 2024 / API-guided Dataset Synthesis to Finetune Large Code Models
Jun 8, 2024 / SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
May 8, 2024 / SPVR: syntax-to-prompt vulnerability repair based on large language models
Jan 27, 2024 / An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial Scenarios
Dec 7, 2023 / VRPTEST: Evaluating Visual Referring Prompting in Large Multimodal Models
Oct 10, 2023 / Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach
Oct 10, 2023 / Refining Decompiled C Code with Large Language Models
Sep 29, 2023 / Split and Merge: Aligning Position Biases in LLM-based Evaluators
May 4, 2023 / "Oops, Did I Just Say That?" Testing and Repairing Unethical Suggestions of Large Language Models with Suggest-Critique-Reflect Process
Mar 6, 2023 / On Extracting Specialized Code Abilities from Large Language Models: A Feasibility Study
Aug 17, 2022 / CCTEST: Testing and Repairing Code Completion Systems
Apr 20, 2022 / Unleashing the Power of Compiler Intermediate Representation to Enhance Neural Program Embeddings
Dec 2, 2020 / CRaDLe: Deep Code Retrieval Based on Semantic Dependency Learning