Date          Name
Jan 26, 2022  Natural Language Descriptions of Deep Visual Features
Aug 3, 2023   Multimodal Neurons in Pretrained Text-Only Transformers
Oct 8, 2021   Toward a Visual Concept Vocabulary for GAN Latent Space
Apr 22, 2024  Automatic Discovery of Visual Circuits
Dec 20, 2020  Latent Compass: Creation by Navigation
Sep 7, 2023   FIND: A Function Description Benchmark for Evaluating Interpretability Methods
Oct 31, 2024  Nearest Neighbor Normalization Improves Multimodal Retrieval
Apr 22, 2024  A Multimodal Automated Interpretability Agent
Feb 3, 2025   Eliciting Language Model Behaviors with Investigator Agents
Dec 17, 2025  Predictive Concept Decoders: Training Scalable End-to-End Interpretability Assistants
Nov 19, 2023  An Alternative to Regulation: The Case for Public AI
Jun 25, 2025  The Singapore Consensus on Global AI Safety Research Priorities
Jun 5, 2025   Line of Sight: On Linear Representations in VLLMs
Jan 30, 2026  Language Model Circuits Are Sparse in the Neuron Basis
Apr 8, 2026   ADAG: Automatically Describing Attribution Graphs
Jul 3, 2025   Establishing Best Practices for Building Rigorous Agentic Benchmarks