How to use storytelling to fit inline assembly into Rust

This article explores how inline assembly fits into the Rust Abstract Machine and introduces the 'storytelling' principle to ensure memory safety and avoid Undefined Behavior.

The Rust Abstract Machine is full of wonderful oddities that do not exist on the actual hardware. Inevitably, every time this is discussed, someone asks: âBut, what if I use inline assembly? What happens with provenance and uninitialized memory and Tree Borrows and all these other fun things you made up that donât actually exist?â This is a great question, but answering it properly requires some effort. In this post, I will lay down my current thinking on how inline assembly fits into the Rust Abstract Machine by giving a general principle that explains how anything we decide about the semantics of pure Rust impacts what inline assembly may or may not do.

Note that everything I discuss here applies to FFI calls just as much as it applies to inline assembly. Those mechanisms are fundamentally very similar: they allow Rust code to invoke code not written in Rust.1 I will not keep repeating âinline assembly or FFIâ throughout the post, but every time I refer to inline assembly this is meant to also include FFI.

To get started, let me explain why there are things that even inline assembly is fundamentally not allowed to do.

Why canât inline assembly do whatever it wants?

People like to think of inline assembly as freeing them from all the complicated requirements of the Abstract Machine. Unfortunately, thatâs a pipe dream. Here is an example to demonstrate this:

use std::arch::asm;
#[inline(never)]
fn innocent(x: &i32) { unsafe {
// Store 0 at the address given by x.
asm!(
"mov dword ptr [{x}], 0",
x = in(reg) x,
);
} }
fn main() {
let x = 1;
innocent(&x);
assert!(x == 1);
}

When the compiler analyzes main

, it realizes that only a shared reference is being passed to innocent

. This means that whatever innocent

does, it cannot change the value stored at *x

. Therefore, the assertion can be optimized away.

However, innocent

actually does write to *x

! Therefore, the optimization changed the behavior of the program. And indeed, this is exactly what happens with current versions of rustc: without optimizations, the assertion fails, but with optimizations, it passes. Therefore, either the optimization was wrong, or the program had Undefined Behavior. And since this is an optimization that we really want to be able to perform, we can only pick the second option.2

However, where does the UB come from? If the entire program was written in Rust, the answer would be âthe aliasing modelâ. Both Stacked Borrows and Tree Borrows, and any other aliasing model worth considering for Rust, will make it UB to write through pointers derived from a shared reference. However, this time, parts of the program are not written in Rust, so things are not that simple. How can we say that the inline asm block violated Tree Borrows, when it is written in a language that does not have anything even remotely comparable to Tree Borrows? Thatâs what the rest of this post is about.

I hope the example clearly demonstrates that we cannot get away with having inline assembly just ignore Abstract Machine concepts such as Tree Borrows. The inline asm block causes UB, we just have to figure out how and whyâand more importantly, we have to figure out how people can ensure that their inline asm blocks do not cause UB.

When is inline assembly compatible with optimizations?

It may seem like we now have to define a version of Tree Borrows that works with assembly code. That would be an impossible task (Tree Borrows relies on pointer provenance, which does not exist in assembly).3 Lucky enough, this is also not necessary.

Instead, we can piggy-back on the already existing definition of Tree Borrows and the rest of the Abstract Machine. We do this by requiring the programmer to tell a story about what the inline assembly block does in Rust terms.4 (If this sounds strange, please bear with me. I will explain why this makes sense.) Specifically, for every inline assembly block, there has to be a corresponding piece of Rust code that does the same thing ** as far as the state observable by pure Rust code is concerned**. When reasoning about the behavior of the overall program, the inline assembly block then gets replaced by that âstoryâ code. You donât have to actually write this code; whatâs important is that the code exists and tells a coherent story with what the surrounding Rust code does.

For our example above, this immediately explains what went wrong: the story code for the inline assembly block would have to be something like (x as *const i32 as *mut i32).write(0)

, and if we insert that code in place of the inline assembly block, we can immediately see (and Miri could confirm) that the program has UB. An inline assembly block can have many possible stories, and it is enough to find one story that makes everything work, but in this case, that is not possible.

So, in slightly more detail, here is what I consider to be the rules for inline assembly:

For every inline assembly block, pick a âstoryâ: a piece of Rust code that serves as stand-in for what this inline assembly block does to the Abstract Machine state. This story code only has access to data that is made available to the inline assembly block (explicit operands and global variables). When reasoning about soundness and correctness of the program on the Abstract Machine level, we pretend that the story code gets executed instead of the assembly code. - This piece of code has to satisfy all the requirements that are imposed on the asm block by attributes such as readonly

ornomem

and honor operand constraints such as not mutatingin

operands. - The actual assembly code has to refinethe story code, i.e., whatever the assembly code does to state which the Abstract Machine can observe (in particular, operands and global variables) has to be something that the story code could also have done.

I should add the disclaimer that I do not have a formal theory that proves correctness of this approach. However, I am reasonably confident, because this approach fits in very well with how we prove the correctness for optimizations such as the one in our example above: At the heart of the correctness argument is a proof that all Rust code satisfies some universal properties. For instance, we can formalize and prove the claim that any Rust function which takes a shared reference without interior mutability as argument cannot write to that argument. This isnât the only such property; in fact the set of such properties isnât fully known: we might discover a new property upheld by all Rust code tomorrow. Whatâs crucial is that any property of the form âfor all Rust programs, â¦â must also hold for the story code, since that is just regular Rust code! Finally, because the actual assembly code refines the story code, we know that for the purpose of reasoning about the program, we can pretend that actually the story code gets executed and then, at the end of compilation, replace the story code by the desired assembly code without changing program behavior.

So, that is why story code works. But, doesnât this make inline assembly entirely useless? After all, the entire point of inline assembly is to do things I couldnât already do in pure Rust!

Inline assembly stories by example

To convince you that the storytelling approach is feasible, let us consider a few representative examples for inline assembly usage and what the corresponding story might look like.

Pure instructions

The easiest case is code that wants to access a new hardware operation that is not exposed in the language. For instance, the inline assembly block might consist of a single instruction that returns the number of bits set to 1 in a register. Here, storytelling is trivial: we can just write some Rust code that does a bit of bit manipulation by hand to count the number of bits set to 1.