I replicated David Ng's RYS method (https://dnhkng.github.io/posts/rys/) on consumer AMD GPUs
(RX 7900 XT + RX 6950 XT) and found something I didn't expect.<p>Transformers appear to have discrete "reasoning circuits" — contiguous blocks of 3-4 layers that
act as indivisible cognitive units. Duplicate the right block and the model runs its reasoning
pipeline twice. No weights change. No training. The model just thinks longer.<p>The results on standard benchmarks (lm-evaluation-harness, n=50):<p>Devstral-24B, layers 12-14 duplicated once:
- BBH Logical Deduction: 0.22 → 0.76
- GSM8K (strict): 0.48 → 0.64
- MBPP (code gen): 0.72 → 0.78
- Nothing degraded<p>Qwen2.5-Coder-32B, layers 7-9 duplicated once:
- Reasoning probe: 76% → 94%<p>The weird part: different duplication patterns create different cognitive "modes" from the same weights. A double pass (the block run twice back-to-back) boosts math. A triple pass boosts emotional reasoning. Interleaved doubling (each layer in the block repeated in place: 13,13,14,14,15,15,16) creates a pure math specialist. Same model, same VRAM, different routing.<p>The circuit boundaries are sharp: shift the block by one layer and the effect disappears or inverts.
Smaller models (24B) have tighter circuits (3 layers) than larger ones (Ng found 7 layers in 72B).<p>Tools to find circuits in any GGUF model and apply arbitrary layer routing are in the repo.
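If you just want to reproduce the routing patterns without the repo tooling: they're plain index lists over the layer stack. A minimal sketch (layer count and indices are illustrative, not tied to any specific model):

```python
def block_repeat(n_layers: int, start: int, end: int, times: int = 2) -> list[int]:
    """Run the contiguous block [start, end] back-to-back `times` times.
    times=2 is the double-pass pattern, times=3 the triple-pass."""
    block = list(range(start, end + 1))
    return list(range(start)) + block * times + list(range(end + 1, n_layers))

def interleave(n_layers: int, start: int, end: int, times: int = 2) -> list[int]:
    """Repeat each layer in [start, end] in place: ...13,13,14,14,15,15,16..."""
    out: list[int] = []
    for i in range(n_layers):
        out += [i] * (times if start <= i <= end else 1)
    return out

# Double pass over layers 12-14 of a hypothetical 20-layer stack:
# [0..11, 12,13,14, 12,13,14, 15..19]
print(block_repeat(20, 12, 14))
# Interleaved doubling of 13-15: [..., 13,13,14,14,15,15,16, ...]
print(interleave(20, 13, 15))
```

Feed the resulting list to whatever applies the routing; the pattern itself is the whole trick.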
The whole thing — sweep, discovery, validation — took one evening.<p>Happy to answer questions.
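One mechanical note, since "no weights change, same VRAM" surprises people: a routing is just a re-indexing of the decoder stack, and repeated indices can point at the same module object, so parameters are never copied. A sketch of the idea on a PyTorch ModuleList (toy Linear layers stand in for transformer blocks; this is my illustration, not the repo's GGUF-level tooling, and real HF models also track per-layer KV-cache indices, which is why doing it at the GGUF level is cleaner):

```python
from torch import nn

def apply_routing(stack: nn.ModuleList, routing: list[int]) -> nn.ModuleList:
    """Re-index a layer stack; repeated indices reuse the same module (shared weights)."""
    return nn.ModuleList([stack[i] for i in routing])

# Toy 6-layer "stack" with a double pass over layers 2-3.
layers = nn.ModuleList(nn.Linear(4, 4) for _ in range(6))
routed = apply_routing(layers, [0, 1, 2, 3, 2, 3, 4, 5])

assert len(routed) == 8
assert routed[2] is routed[4]  # same module object, no weight copies
```

The forward pass then simply visits layer 2 and 3 twice; parameter memory is unchanged because PyTorch deduplicates shared modules.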