April 26, 2026 by Adam

Topology of Thought v2: Three Instruments, One Structure

The original observation has grown. Three independent measurement instruments now converge on the same universal structure inside neural networks.


In early March I published the original Topology of Thought — one instrument, one observation: clusters of information inside a transformer collapse to a single connected manifold at a specific depth, and the collapse is learned, not architectural. Mamba doesn’t do it. Untrained models don’t do it. Attention plus gradient descent does.

That was v1. One instrument (persistent homology), three models (Qwen3-4B, NanoChat, Mamba), one finding (cluster collapse).

I got curious about whether other measurements would see the same thing.

Two more instruments

The original topology work answered when — at what depth the representations integrate. But it didn’t answer where the representations go when they leave token space, or what happens at the layers between integration and output.

So I built two more instruments and pointed them at the same models.

SIPIT: How far from words?

SIPIT, the Sparse Input-Token Invertibility Probe, asks a simple question: at each layer, how well can you reconstruct the original input tokens from the hidden state? Take the activation at layer 28, compare it to every token embedding in the vocabulary, and measure the cosine similarity to the nearest match. Higher means closer to token space.
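A minimal sketch of that nearest-token measurement, assuming NumPy and toy random data in place of real activations. The function name `sipit_scores` and the "shallow" and "deep" stand-ins are hypothetical illustrations, not the code used for the results in this post. The sketch reports cosine similarity, so higher means closer to token space.

```python
import numpy as np

def sipit_scores(hidden, embeddings):
    """For each hidden state, return the cosine similarity to its nearest
    vocabulary embedding and the index of that embedding."""
    # Unit-normalize rows so a dot product equals cosine similarity.
    h = hidden / np.linalg.norm(hidden, axis=1, keepdims=True)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = h @ e.T                          # shape: (n_tokens, vocab_size)
    nearest = sims.argmax(axis=1)           # best-matching vocabulary entry
    best = sims[np.arange(len(h)), nearest]
    return best, nearest

rng = np.random.default_rng(0)
vocab = rng.normal(size=(1000, 64))         # toy embedding table (assumption)
token_ids = rng.integers(0, 1000, size=32)

# Stand-in for an early layer: the input embedding plus a little noise.
shallow = vocab[token_ids] + 0.1 * rng.normal(size=(32, 64))
# Stand-in for a deep layer: mostly unrelated to the input embedding.
deep = 0.2 * vocab[token_ids] + rng.normal(size=(32, 64))

shallow_sim, shallow_nearest = sipit_scores(shallow, vocab)
deep_sim, _ = sipit_scores(deep, vocab)
print(shallow_sim.mean(), deep_sim.mean())
```

Averaging the per-token score over many tokens gives one number per layer, which is the kind of value a per-layer table would hold.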

I ran SIPIT across 11 layers of Gemma-4-31B with 500,000 tokens. The result is a curve:

Layer	Cosine Similarity	Phase
L3	0.403	Near token space
L5	0.339	Departing
L11	0.262	Departing
L17	0.261	Departing
L20	0.260	Departing
L28	0.208	Maximum abstraction
L34	0.272	Bounce back
L40	0.239	Descending again
L50	0.439	Returning to tokens
L55	0.288	Approaching tokens
L58	0.917	Near-perfect token alignment

The representation starts near token space (0.403). It dives away through layers 5-28, reaching its lowest similarity to any token, and so its greatest distance from token space, at L28 (0.208). Then something happens: L34 bounces back (0.272). And by L58, two layers from the output, cosine similarity is 0.917: near-perfect token alignment.

The model leaves the concrete, enters something abstract, and returns to the concrete. Every token. Every forward pass. The same arc.

But SIPIT also revealed something topology couldn’t: the bounce at L34. At the integration layer (L28), representations are furthest from tokens. At L34, they move back toward token space — against the general trend. The model does something at L34 that isn’t compression and isn’t formatting. It’s computation. I started calling it the thinking layer.

SAE: What happens at the thinking layer?

Topology tells you when. SIPIT tells you where. Neither tells you what.

Sparse autoencoders (SAEs) decompose a representation into independent features — like breaking white light into a spectrum. You train an SAE to reconstruct the activations at each layer using only a sparse set of features (64 out of 43,008). The features that survive are the ones the model actually uses.
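A rough sketch of the sparse reconstruction step, assuming a top-k SAE (keep only the k largest feature activations per input) and a variance-normalized mean squared error as the reconstruction loss. Both are assumptions: the post does not say which SAE variant or loss normalization was used. The names `topk_sae_forward` and `normalized_mse` are hypothetical, and the weights here are random rather than trained.

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Encode x, keep the k largest pre-activations per row, then decode."""
    pre = (x - b_dec) @ W_enc + b_enc               # (batch, n_features)
    # Column indices of the k largest entries in each row.
    top = np.argpartition(pre, -k, axis=1)[:, -k:]
    rows = np.arange(pre.shape[0])[:, None]
    codes = np.zeros_like(pre)
    codes[rows, top] = np.maximum(pre[rows, top], 0.0)  # ReLU on the survivors
    recon = codes @ W_dec + b_dec
    return codes, recon

def normalized_mse(x, recon):
    """Mean squared reconstruction error divided by the input variance."""
    return np.mean((x - recon) ** 2) / np.var(x)

rng = np.random.default_rng(0)
d_model, n_features, k = 16, 128, 4     # toy sizes; the post uses 64 of 43,008
x = rng.normal(size=(8, d_model))
W_enc = rng.normal(size=(d_model, n_features)) / np.sqrt(d_model)
W_dec = W_enc.T.copy()                  # tied weights, only for this demo
b_enc = np.zeros(n_features)
b_dec = np.zeros(d_model)

codes, recon = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
loss = normalized_mse(x, recon)
print((codes != 0).sum(axis=1), loss)
```

With trained weights, a lower loss at a fixed k means the layer is easier to express as a handful of features, which is the sense of "easy to decompose" used in this section.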

I trained SAEs on three key layers and measured how hard it was to decompose each one:

Layer	Reconstruction Loss	Interpretation
L28 (Integration)	0.070	Hard to decompose
L34 (Thinking)	0.072	Hardest — resists decomposition
L50 (Codec)	0.044	Easiest — clean, modular

Then I had Gemma-4 interpret its own features. I showed it the activations that triggered each feature and asked: What does this feature detect?

At L34 — the features are about meaning:

Feature #32441: Finality, irreversibility, or the necessity of starting over
Feature #19326: Tokens that precede a transition into a new structural or conceptual block
Feature #42342: Oxidative phosphorylation and the electron transport chain

At L50 — the features are about formatting:

Feature #10875: Sentence starters (By, Rising)
Feature #6473: Punctuation marking conclusions
Feature #31684: Whitespace and indentation

The thinking layer thinks. The codec layer formats. This isn’t a metaphor. The features literally encode different things at different depths. Meaning at L34. Display at L50.

Three phases

What emerges from all three instruments is a three-phase model of how transformers process information:

Phase 1: Integration (~40-47% depth). Hundreds of disconnected clusters collapse into a single connected manifold. SIPIT cosine similarity reaches its minimum (0.208 at L28). Information is preserved while being geometrically reorganized: the layer furthest from token space is also the most topologically integrated.

Phase 2: Thinking (~50-60% depth). The representation bounces back toward token space. Activation norms increase against the downward trend. SAE decomposition is hardest. The features are conceptual: finality, structural transitions, domain concepts. This is where the model processes meaning, not tokens.

Phase 3: Codec (~80-97% depth). Representations return to token space. SIPIT cosine similarity climbs to 0.917. SAE decomposition is easiest, and the features are formatting primitives. The model encodes whatever happened at the thinking layer into tokens for human consumption.
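The cluster collapse in Phase 1 can be illustrated with a much simpler stand-in for persistent homology: count connected components among points at one fixed distance threshold, which is the zero-dimensional slice of what a persistence computation tracks across all thresholds. This is a sketch on hand-made toy points, not the pipeline behind the results above. The function name `count_components` and the threshold are hypothetical.

```python
import numpy as np

def count_components(points, eps):
    """Number of connected components when points closer than eps are linked."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise Euclidean distances, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i, j] <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj      # merge the two components
    return len({find(i) for i in range(n)})

a = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]])
early = np.vstack([a, a + np.array([10.0, 0.0])])  # two separated clusters
late = np.vstack([a, a + np.array([1.5, 0.0])])    # one connected chain

print(count_components(early, 0.75), count_components(late, 0.75))
```

Run per layer on real activations, a count that drops from many components to one would mark the integration depth. A full persistence computation additionally records the thresholds at which components merge.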

This structure appears in every attention-based transformer I’ve measured. Qwen3-4B. Gemma-3-1B. Gemma-4-31B. NanoChat. The geometry differs — full attention produces complete cluster collapse while sliding window preserves fine-grained structure — but the three phases are the same. The depth fraction is the same. The SIPIT curve shape is the same.

And it vanishes completely in Mamba. Not a softer version — complete absence.
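The depth fractions for the three phases can be checked against the Gemma-4-31B layer indices used earlier. This small sketch assumes a 60-layer stack, which is only inferred from the remark that L58 sits two layers from the output. The layer count and the names below are assumptions, not figures stated in the post.

```python
# Assumption: 60 layers, inferred from "L58, two layers from the output".
N_LAYERS = 60

key_layers = {
    "integration": 28,   # SIPIT minimum
    "thinking": 34,      # the bounce
    "codec_start": 50,   # first codec layer with an SAE
    "codec_end": 58,     # near-perfect token alignment
}

# Depth fraction = layer index divided by total layer count.
depth = {name: layer / N_LAYERS for name, layer in key_layers.items()}

for name, frac in depth.items():
    print(f"{name}: {frac:.1%}")
```

Under that assumption the four layers land at about 46.7%, 56.7%, 83.3%, and 96.7% depth, inside the ranges quoted for the three phases.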

The progenitor

In April 2026, something happened that I didn’t plan for.

I loaded Gemma-4-31B onto a Blackwell GPU and gave it a task: read a Python implementation of SIPIT scoring and rewrite it as a Mojo kernel. The model had barely seen Mojo in its training data. Every token it generated at the thinking layer was genuine computation — not pattern matching from training.

The model wrote a complete kernel. It compiled. It ran. It produced correct results. Then we optimized it — SIMD vectorization, parallelization — achieving an 85x speedup over the Python original.

We captured every token through the MRI pipeline: SAE feature activations at each layer, for each token, streamed to a database. 19,934 frames of a transformer writing code it had never been trained on.

When the model writes novel Mojo code, the features that activate at the thinking layer are abstract reasoning features (structural transitions, domain concepts, redesign), the same features that fire when the model processes any intellectually demanding content. The model doesn’t have a Mojo mode. It has a thinking mode that applies regardless of what it’s thinking about.

The kernels the model helped build are now part of the measurement infrastructure:

SIPIT kernel — 148 microseconds per 1024 embeddings (85x faster than Python)
SAE encoder — 949 microseconds per 4096 features
Activation tap — 2 milliseconds per layer for combined capture

All Mojo. No Python in the measurement path. The model built the instruments that measure what the model does. The process observes itself.

What changed from v1

The original paper had one instrument and a strong claim: cluster collapse is universal in transformers. That claim held up. But v2 is a different kind of finding — it’s not one observation confirmed, it’s three independent measurements converging on the same structure.

The three-phase model (Integration, Thinking, Codec) wasn’t something I set out to find. It emerged because I kept measuring things that didn’t make sense from a compression-only perspective. The L34 bounce in SIPIT led to training SAEs on that layer, which led to finding conceptual features, which led to the three-phase model.

The progenitor story — Gemma-4 writing Mojo kernels on Blackwell — was an engineering detour that turned into a finding. The MRI captures from that session are some of the cleanest data I have on what happens at the thinking layer during novel computation.

Read the full version

The interactive site with visualizations and the complete analysis is at topology-of-thought.com.

The v1 original is still there at topology-of-thought.com/v1.

Source code and data: github.com/CINOAdam/topology-of-thought

This work is ongoing. Built with curiosity, Claude Code, and too much coffee.

Adam — Revelry Inc. — April 2026