The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure, by Yongzhong Xu
Abstract: Grokking, the abrupt transition from memorization to generalization long after near-zero training loss, has been studied mainly in single-task settings. We extend geometric analysis to multi-task modular arithmetic, training shared-trunk Transformers on dual-task (mod-add + mod-mul) and tri-task (mod-add + mod-mul + mod-sq) objectives across a systematic weight decay sweep. Five consistent phenomena emerge. (1) Staggered grokking order: multiplication generalizes first, followed by squaring, then addition, with consistent delays across seeds. (2) Universal integrability: optimization trajectories remain confined to an empirically invariant low-dimensional execution manifold; commutator defects orthogonal to this manifold reliably precede generalization. (3) Weight decay phase structure: grokking timescale, curvature depth, reconstruction threshold, and defect lead covary systematically with weight decay, revealing distinct dynamical regimes and a sharp no-decay failure mode. (4) Holographic incompressibility: final solutions occupy only 4–8 principal trajectory directions yet are distributed across full-rank weights and destroyed by minimal perturbations; SVD truncation, magnitude pruning, and uniform scaling all fail to preserve performance. (5) Transverse fragility and redundancy: removing less than 10% of orthogonal gradient components eliminates grokking, yet dual-task models exhibit partial recovery under extreme deletion, suggesting redundant center manifolds enabled by overparameterization. Together, these results support a dynamical picture in which multi-task grokking constructs a compact superposition subspace in parameter space, with weight decay acting as compression pressure and excess parameters supplying geometric redundancy in optimization pathways.
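To make the SVD truncation test in point (4) concrete, the following is a minimal, generic sketch of rank-k truncation applied to a single weight matrix. The abstract does not describe the paper's actual procedure, so everything here is an assumption for illustration: the helper name `svd_truncate`, the 64x64 random matrix standing in for a trained weight, and the rank of 8.

```python
import numpy as np


def svd_truncate(weight: np.ndarray, rank: int) -> np.ndarray:
    """Return the best rank-`rank` approximation of `weight` in Frobenius norm."""
    # Thin SVD: weight = u @ diag(s) @ vt, singular values sorted descending.
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    # Keep only the top `rank` singular triplets.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


# Hypothetical stand-in for a trained weight matrix (not data from the paper).
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))

w_k = svd_truncate(w, rank=8)

# Relative reconstruction error of the truncated matrix.
rel_err = np.linalg.norm(w - w_k) / np.linalg.norm(w)
```

In an experiment of the kind the abstract describes, one would presumably substitute `w_k` for the original weights and re-evaluate test accuracy; the abstract reports that performance is not preserved under such truncation.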
Submission history
From: Yongzhong Xu
[v1] Thu, 19 Feb 2026 22:39:55 UTC (9,173 KB)
[v2] Sat, 14 Mar 2026 05:25:12 UTC (9,942 KB)
[v3] Fri, 3 Apr 2026 01:46:22 UTC (9,944 KB)


