Showing 1–16 of 16 results
Date          Name
Jun 7, 2023   Extracting Cloud-based Model with Prior Knowledge
Aug 9, 2025   Towards Effective Prompt Stealing Attack against Text-to-Image Diffusion Models
Apr 29, 2025  When Memory Becomes a Vulnerability: Towards Multi-turn Jailbreak Attacks against Text-to-Image Generation Systems
Apr 22, 2025  A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment
Dec 22, 2025  Towards DM-free search for Fast Radio Bursts with Machine Learning -- I. An implementation on multibeam data
Mar 24, 2026  Mind Your HEARTBEAT! Claw Background Execution Inherently Enables Silent Memory Pollution
Mar 6, 2026   Depth Charge: Jailbreak Large Language Models from Deep Safety Attention Heads
Jan 30, 2026  Inference-time Alignment via Sparse Junction Steering
Jan 29, 2026  WADBERT: Dual-channel Web Attack Detection Based on BERT Models
May 10, 2025  T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks
Oct 6, 2025   P2P: A Poison-to-Poison Remedy for Reliable Backdoor Defense in LLMs
Apr 17, 2025  Mask Image Watermarking
May 22, 2025  Three Minds, One Legend: Jailbreak Large Reasoning Model with Adaptive Stacked Ciphers
Mar 6, 2026   Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models
Dec 17, 2023  SAME: Sample Reconstruction against Model Extraction Attacks
Apr 3, 2025   Search for Fast Radio Bursts and radio pulsars from pulsing Ultraluminous X-ray Sources