Optimal Scalar Quantization for Matrix Multiplication: Closed-Form Density and Phase Transition

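Researchers derive a sharp high-resolution asymptotic expansion of the matrix-multiplication error under entrywise scalar quantization, together with the asymptotically optimal quantization point densities. For correlated Gaussian pairs, the optimal density has a closed form and undergoes a correlation-driven phase transition from unimodal to bimodal. The method is demonstrated on synthetic matrix multiplication, least squares optimization, and quantization of LLM key and query activations.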
Abstract: We study entrywise scalar quantization of two matrices prior to multiplication. Given $A\in\mathbb{R}^{m\times k}$ and $B\in\mathbb{R}^{k\times n}$, we quantize entries of $A$ and $B$ independently using scalar quantizers with $K_X$ and $K_Y$ levels per entry, and form $\widehat C=\widehat A\,\widehat B$. The objective is to minimize the matrix multiplication mean-squared error (MSE) $\mathbb{E}\bigl[\|AB-\widehat A\widehat B\|_F^2\bigr]$ under a pair-i.i.d. inner-product model. In the high-resolution regime $K_X,K_Y\to\infty$, we derive a sharp $K^{-2}$ asymptotic expansion for the MSE $\mathcal{E}$, identify the exact optimal leading constants, and characterize asymptotically optimal quantization center densities in terms of conditional second moments. We then specialize to correlated Gaussian multiplicative pairs, obtaining a closed-form optimal point density
\[
\lambda^\star(u)\ \propto\ \exp\!\left(-\frac{u^2}{6}\right)\bigl((1-\rho^2)+\rho^2u^2\bigr)^{1/3},
\qquad u=\frac{x}{\sigma_X},
\]
with the same form for $y/\sigma_Y$, and prove a correlation-driven phase transition: the density is unimodal at the origin for $|\rho|\leq 1/\sqrt{3}$ and becomes bimodal for $|\rho|>1/\sqrt{3}$ with peaks at $u_{\mathrm{peak}}=\pm\sqrt{3-1/\rho^2}$. We show our method's applicability in synthetic experiments such as matrix multiplication quantization and least squares optimization, as well as quantization of large language model key and query activations.
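The closed-form density and the phase transition stated in the abstract can be checked numerically. The following is a minimal sketch, not the authors' implementation. It evaluates the unnormalized density, compares the stated peak locations against a grid search, and turns the density into a finite codebook. The codebook construction (centers at equal-mass quantiles of the density) is a standard high-resolution heuristic assumed here for illustration; the abstract does not say how the authors build finite codebooks. All function names, the grid sizes, and the Monte Carlo setup are hypothetical.

```python
import numpy as np


def optimal_density_unnormalized(u, rho):
    """Unnormalized point density lambda*(u) as stated in the abstract."""
    u = np.asarray(u, dtype=float)
    return np.exp(-u**2 / 6.0) * ((1.0 - rho**2) + rho**2 * u**2) ** (1.0 / 3.0)


def predicted_peaks(rho):
    """Peak locations stated in the abstract (the origin below the threshold)."""
    if abs(rho) <= 1.0 / np.sqrt(3.0):
        return np.array([0.0])
    peak = np.sqrt(3.0 - 1.0 / rho**2)
    return np.array([-peak, peak])


def numerical_peaks(rho, grid):
    """Strict interior local maxima of the density on a sorted grid."""
    lam = optimal_density_unnormalized(grid, rho)
    is_peak = (lam[1:-1] > lam[:-2]) & (lam[1:-1] > lam[2:])
    return grid[1:-1][is_peak]


def codebook_from_density(rho, num_levels, sigma=1.0, half_width=8.0, grid_size=20001):
    """Assumed construction: centers at equal-mass quantiles of lambda*."""
    u = np.linspace(-half_width, half_width, grid_size)
    lam = optimal_density_unnormalized(u, rho)
    # Trapezoidal cumulative integral, normalized to a CDF on [0, 1].
    mass = 0.5 * (lam[1:] + lam[:-1]) * np.diff(u)
    cdf = np.concatenate(([0.0], np.cumsum(mass)))
    cdf /= cdf[-1]
    targets = (np.arange(num_levels) + 0.5) / num_levels
    return sigma * np.interp(targets, cdf, u)


def quantize(values, centers):
    """Map each value to its nearest center (centers sorted ascending)."""
    values = np.asarray(values, dtype=float)
    edges = 0.5 * (centers[1:] + centers[:-1])  # midpoints between centers
    return centers[np.searchsorted(edges, values)]


def inner_product_mse(rho, num_levels, dim=256, trials=2000, seed=0):
    """Monte Carlo squared error of a quantized inner product.

    Assumes unit-variance Gaussian pairs (x_i, y_i) with correlation rho,
    drawn i.i.d. across coordinates, and the same codebook for x and y.
    """
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal((trials, dim))
    z2 = rng.standard_normal((trials, dim))
    x = z1
    y = rho * z1 + np.sqrt(1.0 - rho**2) * z2
    centers = codebook_from_density(rho, num_levels)
    exact = np.sum(x * y, axis=1)
    approx = np.sum(quantize(x, centers) * quantize(y, centers), axis=1)
    return float(np.mean((exact - approx) ** 2))


if __name__ == "__main__":
    grid = np.linspace(-6.0, 6.0, 12001)
    for rho in (0.3, 0.9):
        print("rho =", rho)
        print("  stated peaks:   ", predicted_peaks(rho))
        print("  numerical peaks:", numerical_peaks(rho, grid))
        print("  16-level codebook:", np.round(codebook_from_density(rho, 16), 3))
        print("  inner-product MSE:", inner_product_mse(rho, 16))
```

The stated peaks follow from setting the derivative of $\log\lambda^\star(u)$ to zero: $-u/3+\tfrac{2}{3}\rho^2u/\bigl((1-\rho^2)+\rho^2u^2\bigr)=0$ gives $u=0$ or $u^2=3-1/\rho^2$, and the second solution is real and nonzero only when $|\rho|>1/\sqrt{3}$. The grid search in `numerical_peaks` is an independent check of that formula rather than a use of it.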
Source: arXiv cs.AI Recent










