"au:"Aleksandar Makelov"" — arXiv2 Search

Showing 1–4 of 4 results

May 14, 2024: Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Nov 28, 2023: Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching
Jun 19, 2017: Towards Deep Learning Models Resistant to Adversarial Attacks
Jul 19, 2023: Rethinking Backdoor Attacks