Memory Safe Inline Assembly

Fil-C introduces support for memory-safe inline assembly, preventing common miscompilation bugs in C/C++ while preserving programmer intent and performance.

NOTE: This is a pre-release feature. The Fil-C 0.679 release does not ship with this feature. To test this feature, you need to build from source.

GCC and clang both support an incredibly powerful inline assembly syntax. For example:

    unsigned rotate(unsigned x, unsigned char c)
    {
        asm("roll %1, %0" : "+r"(x) : "c"(c) : "cc");
        return x;
    }

This instructs the compiler to emit assembly based on the roll %1, %0 template, where %1 is filled in with %cl, %0 is filled in with whichever register holds x, and c is moved into the %ecx register just before the roll instruction. Additionally, the compiler is told that the instruction will change the value of x and may change the condition flags.

This seems like it cannot possibly be safe! What if the programmer did something wrong, like omitting the + in "+r", or forgetting the "cc" clobber? In Yolo-C, if you make such a mistake, the compiler happily miscompiles your code.

Yet Fil-C supports this inline assembly syntax and it's completely safe!

This document explains why Fil-C supports inline assembly at all and then goes into the details of how that support is achieved while maintaining both programmer intent (you still get the assembly template you asked for) and complete memory safety (if you do something wrong, you'll panic or get an illegal instruction trap, at worst).

Why Inline Assembly?

While reviewing folks' C and C++ code, I've found the following reasons for inline assembly, listed from most to least common:

Blank inline assembly to prevent compiler analysis. This includes things like asm volatile("" : : : "memory"), which is an old-school way of saying atomic_signal_fence(memory_order_seq_cst). It works because we're telling the compiler that the inline assembly clobbers all memory, which forces the compiler to serialize memory accesses, just like a signal fence would have. The contract with the compiler is clear: the compiler must emit exactly the assembly we're asking it to emit (which is blank here) without second-guessing our claims about the clobbers. That is, the compiler must not infer that because the assembly is blank, there cannot be a memory clobber. We said memory clobber, so that's what the compiler sees. Similarly, folks do stuff like asm("" : "+r"(x)). This means: the assembly may read and then write x. The assembly is blank, so this incurs no cost other than forcing the compiler to assume that it doesn't know anything about x's value after the assembly executes. This kind of data flow fence is useful for writing constant-time crypto. Fil-C has long supported blank inline assembly since it's trivially safe. Fil-C even supports "+r" constraints on pointers, in which case both the intval and lower are threaded through their own "+r"-like constraints at the LLVM IR level.
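The two blank-assembly idioms described above can be sketched as follows. This is a minimal illustration, not code taken from Fil-C or from any particular library: the helper names compiler_barrier, value_barrier, and ct_select are hypothetical, and asm is spelled __asm__ only so that the snippet also compiles under strict -std=c11.

```c
#include <stdint.h>

/* Compiler-only barrier: emits no instructions, but the "memory" clobber
   tells the compiler that any memory may have been read or written. */
static inline void compiler_barrier(void)
{
    __asm__ volatile("" : : : "memory");
}

/* Data flow fence: emits no instructions, but "+r" tells the compiler that
   x may have been read and rewritten, so it can no longer reason about
   the value of x after this point. */
static inline uint32_t value_barrier(uint32_t x)
{
    __asm__("" : "+r"(x));
    return x;
}

/* Hypothetical branch-free select: returns a if bit is 1, b if bit is 0.
   The barrier hides the mask from the optimizer, which makes it less
   likely that the arithmetic is turned back into a branch. */
static inline uint32_t ct_select(uint32_t bit, uint32_t a, uint32_t b)
{
    uint32_t mask = value_barrier(0u - bit); /* all ones or all zeros */
    return (a & mask) | (b & ~mask);
}
```

Neither helper changes any value at run time; their only effect is on what the optimizer is allowed to assume.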

cpuid and xgetbv. The inline assembly snippets for these two instructions occur most often in code that then goes on to use SIMD intrinsics. I think this is because the __get_cpuid API in cpuid.h is confusing to use and, as far as I can tell, does not work right in either GCC or clang. Hence, packages like zstd, simdutf, simdjson, and other SIMD-using programs tend to identify CPU features by using inline assembly that invokes cpuid. They often use inline assembly to invoke xgetbv as well. In Fil-C, __get_cpuid is fixed, so you could use that, and zxgetbv is offered as an intrinsic. However, it's better to support those inline assembly snippets without requiring folks to change their code! And there's nothing unsafe about invoking cpuid and xgetbv so long as the code specifies the right clobbers and constraints.
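For illustration, here is roughly what such feature-detection snippets tend to look like. This is a sketch in the general style of that code, not copied from zstd, simdutf, or simdjson; the function names are hypothetical, and everything is guarded so that it compiles to a stub on targets other than x86-64.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
/* Run cpuid for one leaf/subleaf. All four result registers are declared
   as outputs, so the compiler knows they are overwritten. The asm itself
   touches no memory; the stores to regs[] are ordinary C. */
static void cpuid(uint32_t leaf, uint32_t subleaf, uint32_t regs[4])
{
    uint32_t a, b, c, d;
    __asm__ volatile("cpuid"
                     : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                     : "a"(leaf), "c"(subleaf));
    regs[0] = a; regs[1] = b; regs[2] = c; regs[3] = d;
}

/* Read extended control register 0; the result comes back in edx:eax. */
static uint64_t xgetbv0(void)
{
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0u));
    return ((uint64_t)hi << 32) | lo;
}
#endif

/* Writes the vendor string (at most 12 characters) into out.
   Returns 1 on x86-64 and 0 elsewhere. */
static int cpu_vendor(char out[13])
{
#if defined(__x86_64__)
    uint32_t r[4];
    cpuid(0, 0, r);
    memcpy(out + 0, &r[1], 4); /* ebx */
    memcpy(out + 4, &r[3], 4); /* edx */
    memcpy(out + 8, &r[2], 4); /* ecx */
    out[12] = '\0';
    return 1;
#else
    out[0] = '\0';
    return 0;
#endif
}

/* Returns 1 if both the CPU and the OS support AVX, 0 otherwise. */
static int os_supports_avx(void)
{
#if defined(__x86_64__)
    uint32_t r[4];
    cpuid(1, 0, r);
    if (!(r[2] & (1u << 27))) /* OSXSAVE: xgetbv may be executed */
        return 0;
    if (!(r[2] & (1u << 28))) /* AVX implemented by the CPU */
        return 0;
    return (xgetbv0() & 0x6) == 0x6; /* OS saves XMM and YMM state */
#else
    return 0;
#endif
}
```

Note that xgetbv is only executed after cpuid reports OSXSAVE, since the instruction faults otherwise.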

Arithmetic over secrets in crypto code. A great example is OpenSSH's sntrup761 implementation, which wraps key arithmetic in inline assembly to ensure that it gets exactly the right instruction and not some instruction that might have varying execution time depending on inputs. Note that this kind of code often has fallbacks to try to get the compiler to emit constant-time code even if inline assembly is not supported, but those fallbacks are unlikely to be as rigorously validated, and often rely on "optimization blocking" idioms that hurt performance and could be circumvented by a sufficiently clever compiler. Hence, it's safest to support inline assembly snippets that do this. Luckily, these snippets are also completely safe, provided that the constraints and clobbers are correct.
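A sketch of the kind of snippet this refers to is shown below. It is written in the spirit of constant-time helper code and is not taken from OpenSSH; the names negative_mask and clamp_negative_to_zero are hypothetical. The inline assembly pins the operation to a single arithmetic shift, and the fallback branch shows the sort of plain C that such code drops to when inline assembly is unavailable.

```c
#include <stdint.h>

/* Returns -1 if x is negative and 0 otherwise, without branching on x. */
static inline int32_t negative_mask(int32_t x)
{
#if defined(__x86_64__) || defined(__i386__)
    /* One arithmetic right shift; "+r" says x is read and written, and
       "cc" says the flags are modified. No memory is accessed. */
    __asm__("sarl $31, %0" : "+r"(x) : : "cc");
    return x;
#else
    /* Fallback: right-shifting a negative value is implementation-defined
       in C, and the compiler is free to pick any instruction sequence. */
    return x >> 31;
#endif
}

/* Example use: replace negative values with 0 without a branch. */
static inline int32_t clamp_negative_to_zero(int32_t x)
{
    return x & ~negative_mask(x);
}
```

If the "+" or the "cc" clobber were omitted here, the constraints would no longer describe what sarl does, which is exactly the class of mistake discussed earlier.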

Atomics. Compilers have long supported intrinsics for atomic instructions. Compilers also have a long history of implementing these intrinsics incorrectly! Most recently, clang had bugs in how it lowered CAS to LL/SC on ARM64. Hence, serious lock-free programmers tend to write their atomic instructions using inline assembly at least some of the time, for example after hitting a miscompile where dropping to assembly was the only way to fix the bug. Supporting atomics in inline assembly would require allowing inline assembly that accesses memory, which would mean somehow inferring what Fil-C bounds checks to do. Inline assembly that accesses memory is currently out of scope. However, memory-safe inline assembly does support fences (lfence, sfence, mfence, and serialize).
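A fence written this way has no memory operands at all, which is why it fits within the scope described above. A minimal sketch, with hypothetical names full_fence and publish_and_read and a portable fallback for targets other than x86-64:

```c
static int shared_flag; /* hypothetical shared variable for the example */

/* Full hardware fence. The instruction names no memory operand; the
   "memory" clobber only stops the compiler from reordering accesses
   across it. */
static inline void full_fence(void)
{
#if defined(__x86_64__)
    __asm__ volatile("mfence" : : : "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST); /* GCC/clang builtin */
#endif
}

/* Store, fence, then load. The accesses to shared_flag are ordinary C,
   so they are the compiler's responsibility, not the assembly's. */
static int publish_and_read(int v)
{
    shared_flag = v;
    full_fence();
    return shared_flag;
}
```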

System calls. These are currently out of scope for inline assembly in Fil-C, and that's fine, since using inline assembly for syscalls is only necessary in the guts of libc implementations. Fil-C already has ports of musl and glibc, and in both cases the inline assembly for syscalls is replaced with calls to the pizlonated_syscalls.h API that Fil-C provides. However, I can imagine adding support for inline assembly that does syscalls in the future, to make it easier to port new libc's to Fil-C.

x87 long double functions. If you're working with long double on x86, then you're using the x87 80-bit floating point math. If you want access to the x87 FPU's implementations of various math functions, then often the best way to do that is to drop to inline assembly. This is totally safe, provided that the inline assembly doesn't push or pop the x87 stack, and the constraints correctly spell out which x87 stack registers were clobbered.
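As a small illustration of the shape of such code, the sketch below applies one x87 instruction to the top of the x87 stack using the GCC/clang "t" constraint. The function name x87_fabs is hypothetical, and whether Fil-C accepts this particular instruction is an assumption rather than something stated here; the point is only that the instruction neither pushes nor pops, and that the constraint names the one stack register it modifies.

```c
/* Absolute value computed by the x87 fabs instruction. "+t" places x in
   st(0) as both input and output; fabs does not push or pop the stack. */
static inline long double x87_fabs(long double x)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__("fabs" : "+t"(x));
    return x;
#else
    /* Fallback for targets without an x87 FPU. */
    return x < 0 ? -x : x;
#endif
}
```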

It's likely that folks use inline assembly for other purposes, but the above list is all that I've seen when surveying programs in the Linux userland.

To summarize:

There remain many legitimate uses of inline assembly.
Inline assembly use is widespread in C and C++ libraries. You're probably using several of those libraries right now as you're reading this post, and the inline assembly in those libraries is on the critical path.
Much of the inline assembly is trivially safe: it doesn't access memory, it does no control flow, and the instructions used have no other sneaky side effects.

Read on for details about the world's first memory safe inline assembly implementation!

Supporting Inline Assembly Safely

When the Fil-C compiler's safety instrumentation pass (called FilPizlonator) runs, inline assembly is present in LLVM IR as a pair of strings:

The assembly string, almost exactly like it appears in the C source code, just with some characters replaced. For example, the roll example turns into roll $1, $0.

The constraint string. This uses an LLVM-specific syntax to express the constraints and clobbers. For the roll example, this is =r,{cx},0,~{cc},~{dirflag},~{fpsr},~{flags}.
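To make the structure of that constraint string concrete, here is a small sketch that splits it on commas and sorts each item by its leading character: = for an output, ~ for a clobber, a digit for an input tied to the output with that index, and anything else for a plain input. This is only an illustration of the syntax and is not FilPizlonator's actual code; the type and function names are hypothetical, and the classifier deliberately ignores less common forms such as indirect or multi-alternative constraints.

```c
#include <ctype.h>
#include <string.h>

enum constraint_kind { CK_OUTPUT, CK_INPUT, CK_TIED_INPUT, CK_CLOBBER };

struct constraint_counts { int outputs, inputs, tied_inputs, clobbers; };

/* Classify one comma-separated item by its first character. */
static enum constraint_kind classify(const char *item)
{
    if (item[0] == '=') return CK_OUTPUT;
    if (item[0] == '~') return CK_CLOBBER;
    if (isdigit((unsigned char)item[0])) return CK_TIED_INPUT;
    return CK_INPUT;
}

/* Count each kind of item in a constraint string. Strings longer than
   the local buffer are truncated, which is fine for a sketch. */
static struct constraint_counts count_constraints(const char *constraints)
{
    struct constraint_counts counts = {0, 0, 0, 0};
    char buf[256];
    strncpy(buf, constraints, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    for (char *item = strtok(buf, ","); item; item = strtok(NULL, ",")) {
        switch (classify(item)) {
        case CK_OUTPUT:     counts.outputs++;     break;
        case CK_INPUT:      counts.inputs++;      break;
        case CK_TIED_INPUT: counts.tied_inputs++; break;
        case CK_CLOBBER:    counts.clobbers++;    break;
        }
    }
    return counts;
}
```

For the roll example, this reports one output (=r), one register input ({cx}), one input tied to output 0 (which is how "+r" is expressed), and four clobbers.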

Hence, we can validate if an inline assembly expression is safe by:

Parsing and analyzing the assembly. If it contains memory accesses, control flow, or anything we don't recognize, we reject it.

Parsing and analyzing the constraints. If those do anything we don't recognize or support, we reject it.

Ensuring that the assembly's effects are fully captured by the constraints. For example, if an assembly instruction modifies a regis

Source: Hacker News