Date / Name
Apr 22, 2026 / AgentSOC: A Multi-Layer Agentic AI Framework for Security Operations Automation
Apr 21, 2026 / SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models
Apr 20, 2026 / Proposing Topic Models and Evaluation Frameworks for Analyzing Associations with External Outcomes: An Application to Leadership Analysis Using Large-Scale Corporate Review Data
Apr 20, 2026 / Towards Understanding the Robustness of Sparse Autoencoders
Apr 20, 2026 / Beyond Indistinguishability: Measuring Extraction Risk in LLM APIs
Apr 20, 2026 / FUSE: Ensembling Verifiers with Zero Labeled Data
Apr 20, 2026 / Different Paths to Harmful Compliance: Behavioral Side Effects and Mechanistic Divergence Across LLM Jailbreaks
Apr 20, 2026 / Knowing When to Quit: A Principled Framework for Dynamic Abstention in LLM Reasoning
Apr 20, 2026 / Beyond Pattern Matching: Seven Cross-Domain Techniques for Prompt Injection Detection
Apr 20, 2026 / Owner-Harm: A Missing Threat Model for AI Agent Safety
Apr 20, 2026 / Unlocking the Edge deployment and ondevice acceleration of multi-LoRA enabled one-for-all foundational LLM
Apr 20, 2026 / Enabling AI ASICs for Zero Knowledge Proof
Apr 19, 2026 / Precise Debugging Benchmark: Is Your Model Debugging or Regenerating?
Apr 18, 2026 / Systematic Capability Benchmarking of Frontier Large Language Models for Offensive Cyber Tasks
Apr 18, 2026 / Bolzano: Case Studies in LLM-Assisted Mathematical Research
Apr 18, 2026 / The Cognitive Penalty: Ablating System 1 and System 2 Reasoning in Edge-Native SLMs for Decentralized Consensus
Apr 17, 2026 / AdaExplore: Failure-Driven Adaptation and Diversity-Preserving Search for Efficient Kernel Generation
Apr 17, 2026 / A Case Study on the Impact of Anonymization Along the RAG Pipeline
Apr 16, 2026 / XQ-MEval: A Dataset with Cross-lingual Parallel Quality for Benchmarking Translation Metrics
Apr 16, 2026 / CURA: Clinical Uncertainty Risk Alignment for Language Model-Based Risk Prediction