arXiv2 Search: au:"Aleksandar Makelov"
Showing 1–4 of 4 results
May 14, 2024
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Nov 28, 2023
Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching
Jun 19, 2017
Towards Deep Learning Models Resistant to Adversarial Attacks
Jul 19, 2023
Rethinking Backdoor Attacks