Showing 1–20 of 32 results
Date           Name
Jun 16, 2024   NBA: defensive distillation for backdoor removal via neural behavior alignment
Mar 19, 2025   Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings
Apr 27, 2026   AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization
Oct 24, 2024   SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models
Oct 11, 2025   SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents
Mar 13, 2026   Uncovering Security Threats and Architecting Defenses in Autonomous Agents: A Case Study of OpenClaw
Apr 7, 2026    Reading Between the Pixels: An Inscriptive Jailbreak Attack on Text-to-Image Models
Oct 16, 2025   Sequential Comics for Jailbreaking Multimodal Large Language Models via Structured Visual Storytelling
Mar 6, 2026    Evolving Deception: When Agents Evolve, Deception Wins
Nov 17, 2025   SPARK: Jailbreaking T2V Models by Synergistically Prompting Auditory and Recontextualized Knowledge
Jun 10, 2024   Unveiling the Safety of GPT-4o: An Empirical Study using Jailbreak Attacks
Jun 18, 2024   DLP: towards active defense against backdoor attacks with decoupled learning process
Jun 17, 2025   AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions
Mar 10, 2026   Reasoning-Oriented Programming: Chaining Semantic Gadgets to Jailbreak Large Vision Language Models
May 3, 2026    TrajShield: Trajectory-Level Safety Mediation for Defending Text-to-Video Models Against Jailbreak Attacks
Nov 13, 2025   Uncovering Strategic Egoism Behaviors in Large Language Models
Jun 14, 2025   Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025
Jun 6, 2024    Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt
Feb 16, 2025   Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models
Mar 10, 2025   Probabilistic Modeling of Jailbreak on Multimodal LLMs: From Quantification to Application