LLM Neuroanatomy II: Modern LLM Hacking and Hints of a Universal Language?

An exploration of the RYS (Repeat Your Self) method for optimizing LLMs without retraining, revealing a distinct three-phase internal structure in modern models like Qwen3.5.

In Part 1, I described how duplicating a block of seven middle layers in Qwen2-72B — no weight changes, no training — produced the #1 model on the HuggingFace Open LLM Leaderboard. The method, which I called RYS (Repeat Your Self), was discovered using nothing but hard math probes and EQ-Bench on a pair of RTX 4090s.
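Relayering does not copy or retrain anything. It only changes the order in which the existing layers run. Here is a minimal sketch, assuming the repeated block is the half-open range [i, j) and runs exactly twice. `rys_layer_order` is a hypothetical helper for illustration, not part of the released scanning code.

```python
def rys_layer_order(num_layers: int, i: int, j: int) -> list[int]:
    """Execution order of layer indices when the block [i, j) is run twice.

    Hypothetical helper: layers 0..j-1 run normally, then the path jumps back
    to layer i and re-runs i..j-1 with the same weights (no new weights are
    learned) before continuing through j..num_layers-1.
    """
    if not 0 <= i < j <= num_layers:
        raise ValueError(f"need 0 <= i < j <= num_layers, got i={i}, j={j}")
    return list(range(j)) + list(range(i, num_layers))


# Toy 10-layer model with layers 3, 4 and 5 repeated.
print(rys_layer_order(10, 3, 6))
# [0, 1, 2, 3, 4, 5, 3, 4, 5, 6, 7, 8, 9]
```

A block of seven layers, as in Part 1, adds seven steps to the path. Every extra step reuses a layer that already exists.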

That was mid-2024. Since then, a flood of strong open-source models has arrived — Qwen3.5, MiniMax, GLM-4.7, and others — and I finally have enough compute at home to scan them properly.

So the question driving this post is simple: was RYS a fluke of Qwen2-72B, or is it a general property of Transformers?

More specifically:

- Does relayering still help on stronger modern models?
- Which modifications actually earn their extra layers?
- If two good motifs help independently, do they stack?

The short answer is yes, relayering survives. The longer answer took 3,024 beam search candidates, a surrogate model scoring 2 million configurations, and a unified validation sweep to work out properly. Along the way, I also released the scanning code and a set of new RYS models.

Let’s get into it!

The Qwen3.5 family dropped around Chinese New Year 2026 and immediately became the darling of the LocalLLaMA crowd. Strong benchmarks, good vibes, well-engineered.

I’m most interested in models over 200B — that’s what my dual Grace-Hopper system is built for — but the broader community runs smaller models, and the 27B size hits a sweet spot: large enough to have interesting internal structure, small enough that most people with a decent GPU can actually use a RYS variant.

There’s also a scientific reason. In Part 1, I noted that smaller models tend to have more entangled functional anatomy — encoding, reasoning, and decoding are less cleanly separated. If RYS still works on a 27B model, that tells us the circuit structure is robust even when the brain is more compact. If it doesn’t work, that’s also interesting.

(MiniMax M2.5 and others are in the pipeline. The Hopper is grinding. More heatmaps to come.)

Before we get to the optimisation work, I want to show something new. In Part 1, the three-phase hypothesis — early layers encode, middle layers reason, late layers decode — was inferred indirectly from the Base64 observation and the heatmap patterns. It was a good story, but I couldn’t see the anatomy directly. I could only see its consequences.

That changed thanks to Evan Maunder, who ran a beautifully simple experiment after reading Part 1. He fed three semantically identical sentences through a model — one in English, one in Mandarin, one encoded as Base64 — and measured the cosine similarity of their hidden states at every layer. The results showed exactly the three-phase structure: rapid convergence in the first few layers (encoding), near-perfect similarity through the middle (reasoning in a format-agnostic space), and divergence in the final layers (decoding back to surface form).
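The measurement itself is simple to sketch. The code below assumes each sentence is reduced to one vector per layer by mean pooling over its tokens. The pooling method is not specified here, so treat that choice as an assumption. It uses plain nested lists so it runs without a GPU, and `layerwise_similarity` is a hypothetical name. In Hugging Face Transformers, passing `output_hidden_states=True` to a model's forward call returns a `hidden_states` tuple for most architectures. That tuple holds the embedding output followed by one tensor per layer, which is one way to get the per-layer token vectors.

```python
import math


def mean_pool(token_vectors: list[list[float]]) -> list[float]:
    """Collapse one layer's token vectors into a single vector by averaging (assumed pooling)."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def layerwise_similarity(states_a, states_b) -> list[float]:
    """states_x[layer] is the list of token vectors for one sentence at that layer.

    The sentences may have different token counts; pooling makes them comparable.
    Returns one cosine similarity per layer.
    """
    return [cosine(mean_pool(a), mean_pool(b)) for a, b in zip(states_a, states_b)]


# Two-layer toy example: orthogonal at layer 0, parallel at layer 1.
sentence_a = [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
sentence_b = [[[0.0, 1.0]], [[2.0, 2.0]]]
print(layerwise_similarity(sentence_a, sentence_b))
```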

I wanted to push this further. Evan’s experiment compared languages while holding content fixed. But what happens when you vary both? If the middle layers really operate in a universal “thinking space,” then two sentences about the same topic should be more similar than two sentences in the same language about different topics — even if one is in English and the other is in Chinese.

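So I set up a 6-way comparison on Qwen3.5-27B with four inputs: an English fact, an English poem, a Chinese fact and a Chinese poem. The Chinese fact is a translation of the English fact, and the Chinese poem is a translation of the English poem, so each piece of content appears once in each language. Then I computed pairwise cosine similarity of the pooled hidden states at every layer, producing six curves: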

- EN fact: “The process of photosynthesis converts light energy into chemical energy, which is stored in glucose molecules.”
- ZH fact (translated): “光合作用将光能转化为化学能，储存在葡萄糖分子中。”
- EN poem: “At dusk, the moon pours silver on the tide, and the wind carries a quiet song.”
- ZH poem (translated): “黄昏时，月亮把银辉洒在潮汐上，风里带着一首安静的歌。”

And the comparisons:

- Same language, different content: EN fact ↔ EN poem, ZH fact ↔ ZH poem
- Cross-language, same content: EN fact ↔ ZH fact, EN poem ↔ ZH poem
- Cross-language, different content: EN fact ↔ ZH poem, EN poem ↔ ZH fact
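The six pairs split evenly into these three groups of two. The sketch below does that bookkeeping. The names `INPUTS`, `pair_category`, `group_pairs` and `category_means` are hypothetical. The averaging step assumes each category's aggregate is a plain mean over its two pairs, which is an assumption rather than a documented detail.

```python
from itertools import combinations

# The four inputs, tagged as (language, content).
INPUTS = {
    "EN fact": ("en", "fact"),
    "EN poem": ("en", "poem"),
    "ZH fact": ("zh", "fact"),
    "ZH poem": ("zh", "poem"),
}


def pair_category(a: tuple[str, str], b: tuple[str, str]) -> str:
    """Classify a pair of inputs by whether language and content match."""
    same_language = a[0] == b[0]
    same_content = a[1] == b[1]
    if same_language and not same_content:
        return "same language, different content"
    if same_content and not same_language:
        return "cross-language, same content"
    if not same_language and not same_content:
        return "cross-language, different content"
    return "identical input"  # never reached for four distinct inputs


def group_pairs(inputs: dict) -> dict[str, list[tuple[str, str]]]:
    """All C(4, 2) = 6 pairs, grouped into the three comparison categories."""
    groups: dict[str, list[tuple[str, str]]] = {}
    for name_a, name_b in combinations(inputs, 2):
        category = pair_category(inputs[name_a], inputs[name_b])
        groups.setdefault(category, []).append((name_a, name_b))
    return groups


def category_means(groups, similarity) -> dict[str, float]:
    """Average a per-pair score within each category.

    `similarity` maps a (name_a, name_b) pair to one number, e.g. that pair's
    cosine similarity averaged over layers (an assumed aggregation).
    """
    return {
        category: sum(similarity[pair] for pair in pairs) / len(pairs)
        for category, pairs in groups.items()
    }


for category, pairs in group_pairs(INPUTS).items():
    print(category, pairs)
```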

Raw cosine similarity across layers. All pairs converge rapidly in the first few layers. The interesting divergence happens in the mid-to-late stack.

The raw plot already tells a story. All six pairs start at moderate similarity (0.4–0.6 at the embedding layer), then rapidly converge to near-1.0 by about layer 5. Through the mid-stack, they stay high but begin to separate. The blue line (EN poem ↔ ZH poem: different language, same content) stays above the red line (EN fact ↔ EN poem: same language, different content) from roughly layer 15 onward. The model’s internal representation cares more about what you’re saying than what language you’re saying it in.

The aggregate numbers (mean similarity):

- Cross-language, same content: 0.920
- Same-language, different content: 0.882
- Cross-language, different content: 0.835

But the raw cosine similarities are dominated by a large shared component — every hidden state at a given layer lives in roughly the same region of the space (the “hyper-cone” effect that’s well-documented in the literature). To see the structure more clearly, I applied per-layer centering: subtract the mean vector across all four inputs at each layer, then re-normalise before computing cosine similarity. This strips out the “I’m at layer N” component and reveals only how the representations differ from each other.
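Here is a minimal sketch of the per-layer centering step for a single layer, using toy 4-dimensional vectors in place of real hidden states. The vectors and the name `centered_cosines` are made up for illustration. Cosine similarity divides by both norms, which covers the re-normalisation.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity; dividing by both norms is the re-normalisation step."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def centered_cosines(layer_vectors):
    """Per-layer centering for one layer.

    layer_vectors holds one pooled hidden state per input at this layer. The mean
    across inputs (the shared "I'm at layer N" component) is subtracted from each
    vector, then every pair is compared with cosine similarity.
    """
    k = len(layer_vectors)
    mean = [sum(column) / k for column in zip(*layer_vectors)]
    centered = [[x - m for x, m in zip(vec, mean)] for vec in layer_vectors]
    return {(i, j): cosine(centered[i], centered[j]) for i, j in combinations(range(k), 2)}


# Toy layer: a large shared offset, a strong content signal and a weak language signal.
en_fact = [11.0, 10.1, 10.0, 10.0]
en_poem = [9.0, 10.1, 10.0, 10.0]
zh_fact = [11.0, 9.9, 10.0, 10.0]
zh_poem = [9.0, 9.9, 10.0, 10.0]
layer = [en_fact, en_poem, zh_fact, zh_poem]

print(round(cosine(en_fact, en_poem), 3))  # raw: close to 1, dominated by the offset
scores = centered_cosines(layer)
print(round(scores[(0, 1)], 3), round(scores[(0, 2)], 3))  # EN fact vs EN poem, EN fact vs ZH fact
```

In the toy, the raw similarity between EN fact and EN poem is close to 1 because the shared offset dominates. After centering, the two facts line up and the same-language, different-content pair points in nearly opposite directions. The four centered vectors sum to zero, so when content dominates, same-language, different-content pairs tend to come out negative.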

Centered cosine similarity. After removing the shared component, the organisation becomes stark.

Now the anatomy is unmistakable. Three phases:

Encoding (layers 0–5). Wild oscillation. The model is doing heavy lifting to normalise radically different surface forms. Language identity dominates — same-language pairs (red, brown) are more similar than cross-language pairs. The model knows it’s reading English vs Chinese, and the representations reflect that.

Reasoning (layers ~10–50). The blue line (EN poem ↔ ZH poem) and the orange line (EN fact ↔ ZH fact) dominate. These are the cross-language same-content pairs. The model has converged on a representation where content identity matters far more than language identity. Same-language different-content pairs (red, brown) are anti-correlated with the cross-language same-content pairs, so the model is actively separating “what you said” from “what language you said it in.” This is the format-agnostic reasoning space that Part 1 hypothesised.

Decoding (layers ~55–64). Everything collapses and re-differentiates. The model is preparing to emit tokens, and it needs to commit to a specific language and format. Cross-language similarity drops sharply. The representations become surface-specific again.
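The phase boundaries above are approximate. One hypothetical way to pin down the reasoning region from the centered curves is to find the longest contiguous run of layers where cross-language, same-content similarity beats same-language, different-content similarity. The function name, the `margin` parameter and the inputs (one mean curve per category) are assumptions for illustration only.

```python
def longest_content_dominant_run(same_content_xlang, same_lang_diff_content, margin=0.0):
    """Longest contiguous layer range where content outweighs language.

    Both arguments are per-layer similarity curves of equal length. A layer qualifies
    when same_content_xlang[layer] - same_lang_diff_content[layer] > margin.
    Returns (first_layer, last_layer), inclusive, or None if no layer qualifies.
    """
    best = None
    start = None
    for layer, (xlang, same_lang) in enumerate(zip(same_content_xlang, same_lang_diff_content)):
        if xlang - same_lang > margin:
            if start is None:
                start = layer
            if best is None or layer - start > best[1] - best[0]:
                best = (start, layer)
        else:
            start = None
    return best


# Toy curves: a one-layer blip at layer 1, then a broad content-dominated middle.
xlang = [0.1, 0.6, 0.2, 0.9, 0.8, 0.9, 0.1]
same_lang = [0.5, 0.3, 0.5, -0.5, -0.4, -0.5, 0.6]
print(longest_content_dominant_run(xlang, same_lang))  # (3, 5)
```

Picking the longest run, rather than every qualifying layer, ignores isolated blips like the one at layer 1.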

This is the three-phase architecture, directly visible in a single forward pass. The encoding region is narrow, just a handful of layers. The reasoning region is quite broad, the bulk of the transformer stack. And the decoding region is where everything unravels back into language-specific token predictions.

More importantly for this post: the reasoning region maps almost perfectly onto where the RYS heatmaps show improvement. The layers that can be profitably duplicated are the layers where the model is thinking in its universal internal language. The layers that can’t be duplicated (the blue walls in the heatmaps) are the encoding and decoding boundaries. This isn’t a coincidence. If a layer is operating in a format-agnostic space, its input and output distributions are similar enough that you can loop back without catastrophic distribution mismatch. If a layer is doing format-specific work, looping back means feeding decoded representations into a layer that expects abstract ones, or vice versa.
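The distribution-mismatch argument can be turned into a quick check. On the second pass through a repeated block [i, j), layer i receives the state that normally enters layer j, not the one that normally enters layer i. The sketch below scores that swap by cosine similarity of pooled states. `loopback_scores` is hypothetical, and the score is a crude proxy (one vector per layer, not a full distribution). It is not the method used to produce the heatmaps.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def loopback_scores(entering):
    """Score every block [i, j) by how familiar its second-pass input would look.

    entering[k] is the pooled state entering layer k for k < L, and entering[L]
    is the state leaving the last layer. Repeating [i, j) feeds entering[j] into
    layer i, so it is compared with entering[i], the input layer i normally sees.
    """
    num_layers = len(entering) - 1
    return {
        (i, j): cosine(entering[i], entering[j])
        for i in range(num_layers)
        for j in range(i + 1, num_layers + 1)
    }


# Toy 3-layer model whose middle states look alike.
toy = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
for block, score in loopback_scores(toy).items():
    print(block, round(score, 3))
```

A high score only says that the second-pass input resembles the usual one. Whether the extra pass actually helps still has to be measured, and that measurement is what the heatmaps show.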

The anatomy predicts the heatmaps. Let’s look at them.

I’ll show you the data before the methodology, because the heatmaps are

Source: Hacker News