An x86-64 back end for raven-uxn

Uxn is a fictional CPU, used as a target for various applications in the Hundred Rabbits ecosystem. It's a simple stack machine with 256 instructions.

`raven-uxn`, my implementation of the Uxn CPU, now has an x86-64 assembly back end, which is about twice as fast as its Rust interpreter. This required porting about 2000 lines of ARM64 assembly to x86-64, which was accomplished with the help of a robot buddy.

Let me provide a little more context.

A few years back, I wrote a Rust implementation of the CPU and peripherals, which was 10-20% faster than the reference implementation. For more background info, see that project's writeup.

The Rust implementation is fast, but suffers from the usual downsides of a bytecode-based VM: the main dispatch statement is an unpredictable branch.
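To make the dispatch problem concrete, here is a minimal sketch of a `match`-based interpreter loop for a toy stack machine. The opcode numbers and the `run_match` name are invented for illustration and are not the real Uxn encoding or the raven-uxn source; the point is only that every instruction passes through the same `match`, which gives the branch predictor one shared, hard-to-predict jump.

```rust
// Toy opcodes: hypothetical values, NOT the real Uxn encoding.
const HALT: u8 = 0x00;
const PUSH: u8 = 0x01; // followed by one literal byte
const ADD: u8 = 0x02;
const DUP: u8 = 0x03;

/// Runs `program` until HALT (or the end of the program) and returns the stack.
pub fn run_match(program: &[u8]) -> Vec<u8> {
    let mut stack: Vec<u8> = Vec::new();
    let mut pc = 0usize;
    while pc < program.len() {
        let op = program[pc];
        pc += 1;
        // Every opcode funnels through this single `match`; compilers often
        // lower it to a jump table with one indirect branch shared by all
        // instructions.
        match op {
            HALT => break,
            PUSH => {
                // A missing literal at the end of the program reads as 0.
                stack.push(program.get(pc).copied().unwrap_or(0));
                pc += 1;
            }
            ADD => {
                // An empty stack yields 0 in this sketch.
                let b = stack.pop().unwrap_or(0);
                let a = stack.pop().unwrap_or(0);
                stack.push(a.wrapping_add(b));
            }
            DUP => {
                let top = stack.last().copied().unwrap_or(0);
                stack.push(top);
            }
            _ => {} // unknown opcodes are no-ops in this sketch
        }
    }
    stack
}
```

Calling `run_match(&[PUSH, 2, PUSH, 3, ADD, DUP, HALT])` pushes 2 and 3, adds them, and duplicates the result.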

I then wrote an assembly implementation of the interpreter, which proved to be about 30% faster than the Rust version. This was hard: it took several days of work, and there were lingering bugs that I didn't discover until I added a fuzz tester to check for discrepancies between the Rust and assembly implementations.

The assembly implementation is written for an ARM64 target, for two reasons:

I'm working on an ARM MacBook
Writing ARM assembly by hand is a fun intellectual exercise because the ISA is pleasantly orthogonal and well-organized, while x86 assembly is... less so

My blog post about the assembly implementation concludes with an optimistic statement:

On a brighter note, it should be relatively easy to port all of the assembly code to x86-64, but I'll leave that as a challenge for someone else!

I wrote that back in late 2024, and no one had yet risen to the challenge, so I decided to do it (kinda) myself. Because this is early 2026, you may know where this is going: the first draft was written autonomously by Claude Code.

Yes, that's right – it's finally my turn to test out the hip new coding agents on a problem that I know relatively well.

(This blog post was 100% written by me, a fleshy human, because I think that passing off AI-written text as human-authored is an insult to the reader)

How did it do?

In short, it did a great job of going from "zero to one": if I were given a blank text editor and asked to write the x86 implementations of every Uxn opcode, I would have done much worse.

The resulting implementation worked, passing both my unit tests and the fuzzer.

This was all basically autonomous: I deliberately did not help the agent with any implementation or debugging details, limiting my feedback to high-level strategy.

The assembly itself was of middling quality – and I then spent a while improving it – but the agent provided an invaluable boost of momentum to kick off the work.

The whole thing cost about $29, billed through an enterprise plan. I'm not sure how this would have gone with an unmetered plan, e.g. whether I would have hit usage limits midway through the process.

The implementation took a few hours of work, but only 15-20 minutes of hands-on time; the main speed limit was me noticing that it was waiting for approval to run a new command.

(This was all running on a disposable Oxide Computer VM, so I probably should have just run it with --dangerously-skip-permissions)

The implementation process

I started by giving the agent an overview of the problem and a description of my existing implementation:

The raven-uxn project implements a fictional CPU. There are two implementations: a safe Rust implementation, and a native code implementation. In the native implementation, we have hand-written assembly functions for each of the 256 opcodes, written with tail recursion so each instruction jumps to the next instruction. This is fast because there's no big case statement dispatching. However, the x86 implementation isn't yet working. Get it working: it should build with cargo build --features=native.
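The tail-call structure described in that prompt can be sketched in Rust with a table of function pointers, where each handler finishes by dispatching the next opcode itself. This is only an illustration under assumptions: the opcodes, the `Vm` layout, and every function name are hypothetical, and stable Rust does not guarantee tail-call elimination, so unlike the assembly (which uses real jumps) this version may grow the call stack with every instruction and suits short programs only.

```rust
/// Handler signature shared by every opcode routine.
type Handler = fn(&mut Vm);

/// Toy VM state; the layout and opcode numbering are hypothetical, not Uxn's.
pub struct Vm {
    program: Vec<u8>,
    pc: usize,
    stack: Vec<u8>,
    table: [Handler; 256], // one entry per opcode byte
}

/// Fetches the next opcode and calls its handler directly. Every handler
/// ends with this call, so there is no central dispatch loop.
#[inline(always)]
fn next(vm: &mut Vm) {
    // Running off the end of the program reads opcode 0, which halts.
    let op = vm.program.get(vm.pc).copied().unwrap_or(0);
    vm.pc += 1;
    let handler = vm.table[op as usize];
    handler(vm)
}

/// 0x00: stop by simply not dispatching another instruction.
fn op_halt(_vm: &mut Vm) {}

/// 0x01: push the literal byte that follows the opcode.
fn op_push(vm: &mut Vm) {
    let literal = vm.program.get(vm.pc).copied().unwrap_or(0);
    vm.pc += 1;
    vm.stack.push(literal);
    next(vm)
}

/// 0x02: pop two bytes and push their wrapping sum.
fn op_add(vm: &mut Vm) {
    let b = vm.stack.pop().unwrap_or(0);
    let a = vm.stack.pop().unwrap_or(0);
    vm.stack.push(a.wrapping_add(b));
    next(vm)
}

/// Every other opcode: do nothing and move on.
fn op_nop(vm: &mut Vm) {
    next(vm)
}

/// Builds the handler table, runs `program`, and returns the final stack.
pub fn run_threaded(program: &[u8]) -> Vec<u8> {
    let mut table: [Handler; 256] = [op_nop as Handler; 256];
    table[0x00] = op_halt;
    table[0x01] = op_push;
    table[0x02] = op_add;
    let mut vm = Vm { program: program.to_vec(), pc: 0, stack: Vec::new(), table };
    next(&mut vm);
    vm.stack
}
```

In a design like this, each handler has its own dispatch site, so the predictor can learn per-opcode patterns ("what usually follows `ADD`?") instead of sharing one branch across the whole program.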

The agent successfully added an x86 assembly backend and got it compiling, which required a few rounds of tweaking the assembly syntax and re-running cargo build. At this point, I told the agent how to run unit tests:

Now that it's building, it should pass tests with cargo test -praven-uxn --features=native.

The agent found a bunch of failing unit tests, and fixed them mostly autonomously. My contribution at this stage was telling it to fix one opcode at a time, because it was trying to devise a Grand Plan to fix every single failing test in one pass, and going off the rails. Because failures typically presented as SIGSEGV instead of assertion errors, I also instructed the agent to run tests one by one – otherwise, it was impossible to tell which specific opcode test caused the segfault.

Finally, once tests were passing, I told the agent how to run the fuzzer:

Great, now let's test it with fuzzing. You can run cargo +nightly fuzz run --release fuzz-native -- -max_total_time=10 -verbosity=0 to fuzz for 10 seconds and see if it finds anything.

A bunch more failures cropped up at this point, and it continued to debug them autonomously. Interestingly, at this point it started writing ad-hoc gdb scripts to probe the interpreter's internal state:

break _DEI
commands
info registers rsp rbp
x/4gx $rsp+0xd0
x/4gx $rsp+0xd8
x/4gx $rsp+0xe0
cont
end

break raven_uxn::native::dei_entry
run --test-threads=1 dei
info registers rdi rsi
x/4gx $rsi
p *(raven_uxn::native::DeviceHandle*)$rsi

The internal thinking traces are a fascinating artifact; the thing debugs like a goldfish with logorrhea:

The comment says "orig - 8 - 48 - 0x98 = orig - 0xd0" and then "[orig+8] = [rsp + 0xd0 + 8] = [rsp + 0xd8]". But this is wrong. If "orig" = rsp BEFORE the call (i.e., rsp_caller), then: 7th arg is at [orig - 8] BEFORE the call (pushed last by caller)... no, in SysV x86_64:

Actually the comment's definition of "orig" is inconsistent. Let me figure out what "orig" means in the comment:

So "orig" is the reference point where the return address sits at [orig - 8], the 7th argument at [orig + 8], and the 8th argument at [orig + 16]. This matches the x86_64 SysV ABI where after the call instruction, the stack layout has the return address at the top and the caller's arguments positioned above it. I'm working through the stack frame layout and realizing the comment's offset claims don't align with the actual memory addresses—the math just doesn't check out. But from the debugger output, I can see the 7th argument (vm) is actually at [rsp+0xd...], which gives me the concrete data I need to move forward.

After a bunch more round-trips through the fuzzer, it stopped finding bugs and just started running the fuzzer for longer and longer periods. After 60 seconds of fuzzing with no hits, I declared success.

Why did this work?

This won't be a surprising sentiment if you've read blog posts of this nature: the agent worked well because there was a comprehensive test suite and a fuzzing harness, so it could easily close the loop.

The first implementation did not compile; once it compiled, it did not pass unit tests; once it passed unit tests, it did not pass fuzz testing. Having all of these layers of (machine-checkable) tests was necessary to get a fully working implementation.
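The fuzzing layer is a form of differential testing: feed the same random input to both implementations and flag any disagreement. A minimal, self-contained sketch of that idea follows. It swaps in a hand-rolled xorshift generator for the real coverage-guided fuzzer (cargo-fuzz), and the two toy "implementations" and all names are hypothetical stand-ins for the Rust and assembly interpreters.

```rust
/// Tiny xorshift PRNG so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next_u8(&mut self) -> u8 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 32) as u8
    }
}

/// Stand-in for the trusted implementation: wrapping sum of all bytes.
pub fn sum_reference(bytes: &[u8]) -> Vec<u8> {
    vec![bytes.iter().fold(0u8, |acc, &b| acc.wrapping_add(b))]
}

/// Stand-in for a buggy port: saturates instead of wrapping (deliberate bug).
pub fn sum_buggy(bytes: &[u8]) -> Vec<u8> {
    vec![bytes.iter().fold(0u8, |acc, &b| acc.saturating_add(b))]
}

/// Runs both implementations on `iterations` random inputs and returns the
/// first input on which their outputs differ, or `None` if they always agree.
pub fn find_discrepancy<A, B>(
    reference: A,
    candidate: B,
    seed: u64,
    iterations: usize,
) -> Option<Vec<u8>>
where
    A: Fn(&[u8]) -> Vec<u8>,
    B: Fn(&[u8]) -> Vec<u8>,
{
    let mut rng = XorShift(seed | 1); // xorshift state must be non-zero
    for _ in 0..iterations {
        // Random input of 0 to 15 random bytes.
        let len = (rng.next_u8() % 16) as usize;
        let input: Vec<u8> = (0..len).map(|_| rng.next_u8()).collect();
        if reference(&input) != candidate(&input) {
            return Some(input); // a failing case to hand back for debugging
        }
    }
    None
}
```

Returning the offending input, rather than just a pass/fail flag, is what lets an agent (or a human) close the loop: the failing case can be replayed under a debugger until the two outputs match.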

I suspect it also worked because the problem is translation-flavored: there was a full ARM64 assembly implementation, and translating from one assembly flavor to another is easier than writing it from a high-level specification (or even from the Rust code).

How was the code?

I'm not an x86 assembly expert, but even I could tell that there were a few questionable decisions. Let me give you a few examples.

Claude seemed to get caller-saved and callee-saved registers confused: it properly handled callee-saved registers in the function prologue and epilogue, but also insisted on saving them again before each call to an external function. Since the external function is itself obliged to preserve callee-saved registers, these extra saves were redundant; they increased stack usage and added a bunch of unnecessary instructions.
