Date / Name
Mar 31, 2026 / SkillReducer: Optimizing LLM Agent Skills for Token Efficiency
Mar 22, 2026 / WARBENCH: A Comprehensive Benchmark for Evaluating LLMs in Military Decision-Making
Mar 23, 2025 / STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models
Jun 8, 2024 / SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
Jan 27, 2024 / An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial Scenarios
Dec 7, 2023 / VRPTEST: Evaluating Visual Referring Prompting in Large Multimodal Models
Oct 10, 2023 / Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach
Sep 29, 2023 / Split and Merge: Aligning Position Biases in LLM-based Evaluators
May 4, 2023 / "Oops, Did I Just Say That?" Testing and Repairing Unethical Suggestions of Large Language Models with Suggest-Critique-Reflect Process
Mar 6, 2023 / On Extracting Specialized Code Abilities from Large Language Models: A Feasibility Study
Apr 20, 2022 / Unleashing the Power of Compiler Intermediate Representation to Enhance Neural Program Embeddings