The Duality Between Sparse Autoencoders and Concept Geometry

[Submitted on 3 Mar 2025 (v1), last revised 1 Dec 2025 (this version, v2)]

Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry, by Sai Sumedh R. Hindupur and 3 other authors


Abstract: Sparse Autoencoders (SAEs) are widely used to interpret neural networks by identifying meaningful concepts from their representations. However, do SAEs truly uncover all concepts a model relies on, or are they inherently biased toward certain kinds of concepts? We introduce a unified framework that recasts SAEs as solutions to a bilevel optimization problem, revealing a fundamental challenge: each SAE imposes structural assumptions about how concepts are encoded in model representations, which in turn shapes what it can and cannot detect. This means different SAEs are not interchangeable; switching architectures can expose entirely new concepts or obscure existing ones. To systematically probe this effect, we evaluate SAEs across a spectrum of settings, from controlled toy models that isolate key variables, to semi-synthetic experiments on real model activations, and finally to large-scale, naturalistic datasets. Across this progression, we examine two fundamental properties that real-world concepts often exhibit: heterogeneity in intrinsic dimensionality (some concepts are inherently low-dimensional, others are not) and nonlinear separability. We show that SAEs fail to recover concepts when these properties are ignored, and we design a new SAE that explicitly incorporates both, enabling the discovery of previously hidden concepts and reinforcing our theoretical insights. Our findings challenge the idea of a universal SAE and underscore the need for architecture-specific choices in model interpretability. Overall, we argue that an SAE does not just reveal concepts; it determines what can be seen at all.
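To make the claim that "switching architectures can expose entirely new concepts or obscure existing ones" concrete, here is a minimal sketch, assuming two common SAE variants that the abstract does not name: a ReLU encoder (each latent fires on a half-space of inputs) and a TopK encoder (only the k largest pre-activations survive), both feeding the same linear decoder and a reconstruction-plus-L1 loss. This is not the authors' framework or their new SAE; every function name, the tied decoder, the random weights, and the values of `k` and `l1_coeff` are hypothetical choices made only for illustration.

```python
import numpy as np

def relu_encode(x, W_enc, b_enc):
    """Hypothetical ReLU SAE encoder: a latent is active wherever its
    affine score x @ W_enc + b_enc is positive (a half-space)."""
    return np.maximum(0.0, x @ W_enc + b_enc)

def topk_encode(x, W_enc, b_enc, k):
    """Hypothetical TopK SAE encoder: keep only the k largest
    pre-activations per input, then clip negatives to zero."""
    pre = x @ W_enc + b_enc
    z = np.zeros_like(pre)
    idx = np.argsort(pre, axis=-1)[..., -k:]  # indices of the k largest scores
    np.put_along_axis(z, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
    return np.maximum(z, 0.0)

def decode(z, W_dec, b_dec):
    """Linear decoder shared by both variants: concepts are rows of W_dec."""
    return z @ W_dec + b_dec

def sae_loss(x, z, W_dec, b_dec, l1_coeff):
    """Mean squared reconstruction error plus an L1 penalty on the codes."""
    x_hat = decode(z, W_dec, b_dec)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = l1_coeff * np.mean(np.sum(np.abs(z), axis=-1))
    return recon + sparsity

# Toy usage: random "activations" and random (untrained) weights.
rng = np.random.default_rng(0)
d, m, n, k = 8, 32, 5, 4          # input dim, dictionary size, batch, TopK budget
x = rng.normal(size=(n, d))
W_enc = rng.normal(size=(d, m)) / np.sqrt(d)
b_enc = np.zeros(m)
W_dec = W_enc.T.copy()             # tied weights, an arbitrary illustrative choice
b_dec = np.zeros(d)

z_relu = relu_encode(x, W_enc, b_enc)
z_topk = topk_encode(x, W_enc, b_enc, k)
print("active latents per input (ReLU):", (z_relu > 0).sum(axis=1))
print("active latents per input (TopK):", (z_topk > 0).sum(axis=1))
print("loss (ReLU codes):", sae_loss(x, z_relu, W_dec, b_dec, l1_coeff=0.1))
print("loss (TopK codes):", sae_loss(x, z_topk, W_dec, b_dec, l1_coeff=0.1))
```

With identical weights, the two encoders already report different sets of active latents for the same input, which is a small-scale version of the point that the encoder's structural assumption shapes which concepts appear at all.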

Submission history

From: Sai Sumedh R. Hindupur
[v1] Mon, 3 Mar 2025 18:47:40 UTC (41,606 KB)
[v2] Mon, 1 Dec 2025 22:17:20 UTC (42,594 KB)