DEV-TOOLS | March 27, 2026 | 1 min read

TinyLoRA – Learning to Reason in 13 Parameters

Researchers introduce TinyLoRA, a low-rank adapter parameterization that lets a large language model reach high reasoning accuracy while training as few as 13 parameters. The study finds that reinforcement learning is essential to this efficiency: models trained with supervised fine-tuning need updates 100-1000x larger to reach the same performance.

Title: Learning to Reason in 13 Parameters

Abstract: Recent research has shown that language models can learn to reason, often via reinforcement learning. Some work even trains low-rank parameterizations for reasoning, but conventional LoRA cannot scale below the model dimension. We question whether even rank=1 LoRA is necessary for learning to reason and propose TinyLoRA, a method for scaling low-rank adapters to sizes as small as one parameter. Within our new parameterization, we are able to train the 8B parameter size of Qwen2.5 to 91% accuracy on GSM8K with only 13 trained parameters in bf16 (26 total bytes). We find this trend holds in general: we are able to recover 90% of performance improvements while training 1000x fewer parameters across a suite of more difficult learning-to-reason benchmarks such as AIME, AMC, and MATH500. Notably, we are only able to achieve such strong performance with RL: models trained using SFT require 100-1000x larger updates to reach the same performance.
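
The size claims in the abstract can be checked with simple arithmetic. The sketch below is an illustration, not code from the paper: it counts the trainable parameters of a standard rank-r LoRA update on one weight matrix, which shows why even rank 1 costs on the order of the model dimension, and it confirms that 13 bf16 parameters occupy 26 bytes. The function names and the hidden size of 4096 are assumptions chosen for the example. The source does not describe TinyLoRA's own parameterization, so none is attempted here.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of a standard LoRA update B @ A on one weight matrix.

    A has shape (rank, d_in) and B has shape (d_out, rank), so the total
    is rank * (d_in + d_out).
    """
    if rank < 1:
        raise ValueError("standard LoRA needs rank >= 1")
    return rank * (d_in + d_out)


def storage_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Storage for num_params values; bf16 uses 2 bytes per value."""
    return num_params * bytes_per_param


if __name__ == "__main__":
    hidden = 4096  # hypothetical model dimension, not taken from the paper

    # Even the smallest standard adapter (rank 1) on a single square
    # projection trains thousands of parameters.
    rank1 = lora_param_count(hidden, hidden, rank=1)
    print(f"rank-1 LoRA on one {hidden}x{hidden} matrix: {rank1} parameters")

    # The abstract's figure: 13 trained parameters stored in bf16.
    print(f"13 bf16 parameters: {storage_bytes(13)} bytes")
```

Under these assumptions, a single rank-1 adapter already trains 8192 parameters, several hundred times the 13 reported in the abstract.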

Source: Hacker News
