Date           Name
Apr 28, 2025   Prompt Injection Attack to Tool Selection in LLM Agents
Feb 21, 2023   BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT
Mar 26, 2024   Optimization-based Prompt Injection Attack to LLM-as-a-Judge
Jul 1, 2024    Self-Cognition in Large Language Models: An Exploratory Study
Mar 20, 2025   BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models
Oct 4, 2023    MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use
Jun 6, 2024    AutoJailbreak: Exploring Jailbreak Attacks and Defenses through a Dependency Lens
Apr 10, 2026   BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning
Feb 20, 2025   On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
Mar 8, 2025    Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation
May 29, 2025   Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models
May 23, 2025   SafeAgent: Safeguarding LLM Agents via an Automated Risk Simulator