NOW LET US – AI RAG SaaS Studio TP.HCM

A tail-call interpreter in (nightly) Rust

The author explores using the new 'become' keyword in nightly Rust to implement a tail-call interpreter for the Uxn CPU, achieving performance that surpasses even hand-coded ARM64 assembly.

Last week, I wrote a tail-call interpreter using the become keyword, which was recently added to nightly Rust (seven months ago is recent, right?).

It was a surprisingly pleasant experience, and the resulting VM outperforms both my previous Rust implementation and my hand-coded ARM64 assembly. Tailcall-based techniques have been all the rage recently; consider this my trip report implementing a simple but non-trivial system.

For those keeping track at home, this is the latest in my exploration of high-performance emulation of the Uxn CPU, which runs a bunch of applications in the Hundred Rabbits ecosystem.

Basics of Uxn emulation

Uxn is a simple stack machine with 256 instructions. The whole CPU has just over 64K of space, split between a few memories:

  • Two 256-byte stacks, each with an index byte
  • 65536 bytes of RAM, which is used for both data and program text
  • A 2-byte program counter
  • 256 bytes of "device memory", used for peripherals
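As a rough check on the "just over 64K" figure, the memories listed above can be written down as plain structs. This is only a sketch: the field names `stack`, `ret`, `ram`, and `dev` echo the snippets in this article, but the type names, the grouping into one struct, and the `#[repr(C)]` layout are assumptions made for illustration, and the program counter is included here purely for the byte count.

```rust
/// One 256-byte stack plus its index byte (hypothetical type name).
#[repr(C)]
pub struct Stack {
    pub data: [u8; 256],
    pub index: u8,
}

/// All CPU state in one place (hypothetical grouping, for accounting only).
#[repr(C)]
pub struct UxnState {
    pub stack: Stack,     // data stack
    pub ret: Stack,       // return stack
    pub ram: [u8; 65536], // data and program text
    pub dev: [u8; 256],   // device memory for peripherals
    pub pc: u16,          // 2-byte program counter
}

/// Two stacks of 257 bytes, RAM, device memory, and the program counter.
pub const STATE_BYTES: usize = 2 * (256 + 1) + 65536 + 256 + 2;
```

The sum is 66308 bytes, which is 772 bytes more than the 65536 bytes of RAM alone.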

The simplest emulator reads a byte from RAM at the program counter, then calls into an instruction (which may update the program counter):

fn run(core: &mut Uxn, dev: &mut Device, mut pc: u16) -> u16 {
    loop {
        let op = core.next(&mut pc);
        let Some(next) = core.op(op, dev, pc) else {
            break pc;
        };
        pc = next;
    }
}
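To make the loop above concrete, here is a minimal, self-contained sketch of the same shape on stable Rust. It is not the author's emulator: the `Vm` type, its helpers, and the four-opcode subset are assumptions for illustration, devices are left out entirely, and every other byte value is treated as a no-op.

```rust
// Toy opcode subset, for illustration only.
pub const BRK: u8 = 0x00; // halt
pub const INC: u8 = 0x01; // increment the top of the stack
pub const ADD: u8 = 0x18; // pop two bytes, push their sum
pub const LIT: u8 = 0x80; // push the byte that follows the opcode

/// Hypothetical, stripped-down core: RAM plus a single data stack.
pub struct Vm {
    pub ram: [u8; 65536],
    pub data: [u8; 256],
    pub index: u8,
}

impl Vm {
    /// Copies `program` to the start of RAM (an arbitrary choice for this sketch).
    pub fn new(program: &[u8]) -> Self {
        let mut ram = [0u8; 65536];
        ram[..program.len()].copy_from_slice(program);
        Vm { ram, data: [0; 256], index: 0 }
    }

    /// Reads the byte at `pc` and advances `pc`.
    fn next(&self, pc: &mut u16) -> u8 {
        let byte = self.ram[*pc as usize];
        *pc = pc.wrapping_add(1);
        byte
    }

    fn push(&mut self, value: u8) {
        self.data[self.index as usize] = value;
        self.index = self.index.wrapping_add(1);
    }

    fn pop(&mut self) -> u8 {
        self.index = self.index.wrapping_sub(1);
        self.data[self.index as usize]
    }

    /// Executes one instruction; returns the next `pc`, or `None` to halt.
    fn op(&mut self, op: u8, mut pc: u16) -> Option<u16> {
        match op {
            BRK => return None,
            LIT => {
                let value = self.next(&mut pc);
                self.push(value);
            }
            INC => {
                let value = self.pop();
                self.push(value.wrapping_add(1));
            }
            ADD => {
                let b = self.pop();
                let a = self.pop();
                self.push(a.wrapping_add(b));
            }
            _ => {} // unimplemented opcodes are no-ops in this sketch
        }
        Some(pc)
    }
}

/// Central dispatch loop: fetch, execute, repeat until an instruction halts.
pub fn run(vm: &mut Vm, mut pc: u16) -> u16 {
    loop {
        let op = vm.next(&mut pc);
        let Some(next) = vm.op(op, pc) else {
            break pc;
        };
        pc = next;
    }
}
```

Running the program `[LIT, 2, LIT, 3, ADD, INC, BRK]` from address 0 leaves a single byte on the stack. Every instruction passes through the one `match` inside `op`, so all dispatch happens at a single site.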

Threaded code in assembly

In our assembly implementation, we can instead use threaded code (specifically token threading). We store all of the CPU state in registers, then end each instruction with a jump to the subsequent instruction. This distributes the dispatch operation across every opcode, making it easier for the branch predictor to learn sequences of opcodes in the program. Overall speedups were significant: 40-50% faster on ARM64, and about 2x faster on x86-64.

Unfortunately, the assembly implementation means maintaining about 2000 lines of code, and it is incredibly unsafe.

Tail calls in Rust

We'd like to get the same behavior as our assembly implementation — VM state stored in registers, dispatch at the end of each opcode — without hand-writing every instruction in assembly.

The core idea involves two pieces:

  • Store program state in function arguments, which are mapped to registers based on your system's calling convention.
  • End each function by calling the next function.
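The two pieces above can be sketched on stable Rust with ordinary calls. Everything here is an assumption for illustration: the `Op` signature, the four-opcode subset, and the `TABLE` layout are not the author's code, and the real interpreter passes more state (both stacks, device memory, and so on).

```rust
// Toy opcode subset, for illustration only.
pub const BRK: u8 = 0x00; // halt
pub const INC: u8 = 0x01; // increment the top of the stack
pub const ADD: u8 = 0x18; // pop two bytes, push their sum
pub const LIT: u8 = 0x80; // push the byte that follows the opcode

/// Every opcode shares one signature, so all VM state travels in arguments:
/// data stack, stack index, RAM, and program counter.
type Op = fn(&mut [u8; 256], u8, &[u8; 65536], u16) -> (u8, u16);

/// Fetches the next opcode and calls its handler (the per-opcode dispatch).
pub fn dispatch(data: &mut [u8; 256], index: u8, ram: &[u8; 65536], pc: u16) -> (u8, u16) {
    let op = ram[pc as usize];
    TABLE[op as usize](data, index, ram, pc.wrapping_add(1))
}

/// Halts by returning the final stack index and program counter.
fn brk(_data: &mut [u8; 256], index: u8, _ram: &[u8; 65536], pc: u16) -> (u8, u16) {
    (index, pc)
}

fn lit(data: &mut [u8; 256], index: u8, ram: &[u8; 65536], pc: u16) -> (u8, u16) {
    data[index as usize] = ram[pc as usize];
    dispatch(data, index.wrapping_add(1), ram, pc.wrapping_add(1))
}

fn inc(data: &mut [u8; 256], index: u8, ram: &[u8; 65536], pc: u16) -> (u8, u16) {
    let top = index.wrapping_sub(1) as usize;
    data[top] = data[top].wrapping_add(1);
    dispatch(data, index, ram, pc)
}

fn add(data: &mut [u8; 256], index: u8, ram: &[u8; 65536], pc: u16) -> (u8, u16) {
    let b = data[index.wrapping_sub(1) as usize];
    let a_slot = index.wrapping_sub(2) as usize;
    data[a_slot] = data[a_slot].wrapping_add(b);
    dispatch(data, index.wrapping_sub(1), ram, pc)
}

/// Unimplemented opcodes do nothing in this sketch.
fn nop(data: &mut [u8; 256], index: u8, ram: &[u8; 65536], pc: u16) -> (u8, u16) {
    dispatch(data, index, ram, pc)
}

/// One handler per byte value, indexed by opcode.
static TABLE: [Op; 256] = {
    let mut table = [nop as Op; 256];
    table[BRK as usize] = brk as Op;
    table[INC as usize] = inc as Op;
    table[ADD as usize] = add as Op;
    table[LIT as usize] = lit as Op;
    table
};
```

Calling `dispatch(&mut data, 0, &ram, 0)` with the program `[LIT, 2, LIT, 3, ADD, INC, BRK]` at the start of `ram` returns once `brk` is reached. Because these are plain calls, each executed instruction may keep a stack frame alive, which is fine for a seven-byte program but not for a long-running one. On nightly, with `#![feature(explicit_tail_calls)]` enabled, the calls in tail position are where `become` would go.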

Unfortunately, without explicit tail calls, the compiler is not guaranteed to eliminate the stack frame of each call, even in a release build, so a long-running program can overflow the stack.

In nightly Rust, this is a one-word fix using the become keyword. With this change in place, the Rust compiler makes a guarantee: When tail calling a function, instead of its stack frame being added to the stack, the stack frame of the caller is directly replaced with the callee’s.

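match core.inc::<FLAGS>(pc) {
    // `pc` must be bound mutably, because `next` advances it
    Some(mut pc) => {
        // Fetch the next opcode, then tail call its handler
        let op = core.next(&mut pc);
        become TABLE.0[op as usize](
            core.stack.data,
            core.stack.index,
            core.ret.data,
            core.ret.index,
            core.dev,
            core.ram,
            pc,
            vdev,
        )
    }
    // No next instruction: return the final state to the caller
    None => (core, pc),
}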

That's it, everything works! The Rust compiler generates a br (branch to register) instead of a bl (branch-and-link) instruction and does not allocate any persistent space on the stack.

Source: Hacker News
