Porting Go's strings package to C

A technical deep dive into porting Go's core libraries like strings and bytes to C, addressing challenges in operator precedence and memory allocation strategies.

Creating a subset of Go that translates to C was never my end goal. I liked writing C code with Go, but without the standard library it felt pretty limited. So, the next logical step was to port Go's stdlib to C.

Of course, this isn't something I could do all at once. I started with the io package, which provides core abstractions like Reader and Writer, as well as general-purpose functions like Copy. But io isn't very interesting on its own, since it doesn't include specific reader or writer implementations. So my next choices were naturally bytes and strings — the workhorses of almost every Go program. This post is about how the porting process went.

Bits and UTF-8

Before I could start porting bytes, I had to deal with its dependencies first:

math/bits implements bit counting and manipulation functions.
unicode/utf8 implements functions for UTF-8 encoded text.

Both of these packages are made up of pure functions, so they were pretty easy to port. The only minor challenge was the difference in operator precedence between Go and C — specifically, bit shifts (<<, >>). In Go, bit shifts have higher precedence than addition and subtraction. In C, they have lower precedence:

// Go: shift has HIGHER precedence than +
var x uint32 = 1<<2 + 3 // (1 << 2) + 3 == 7

// C: shift has LOWER precedence than +
uint32_t x = 1 << 2 + 3; // 1 << (2 + 3) == 32

The simplest solution was to just use parentheses everywhere shifts are involved.
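To make the difference concrete, here is a minimal sketch in C. The two helper names are hypothetical and exist only for illustration; they are not part of the port. Each one spells out with parentheses how one of the two languages groups the unparenthesized expression a << b + c.

```c
#include <stdint.h>

// Hypothetical helpers for illustration; they are not part of the port.

// Go's reading of `a<<b + c`: the shift binds tighter than the addition.
// This is the form a faithful port has to write out explicitly in C.
static inline uint32_t shift_then_add(uint32_t a, uint32_t b, uint32_t c) {
    return (a << b) + c;
}

// C's reading of the same unparenthesized text: the addition binds tighter,
// so the shift amount becomes b + c.
static inline uint32_t add_then_shift(uint32_t a, uint32_t b, uint32_t c) {
    return a << (b + c);
}
```

With the values from the example above, shift_then_add(1, 2, 3) gives 7 and add_then_shift(1, 2, 3) gives 32, so an expression copied from Go without parentheses would compile in C but compute something else.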

Bytes

The bytes package provides functions for working with byte slices. Some of them were easy to port, like Equal. Here's how it looks in Go:

func Equal(a, b []byte) bool {
    return string(a) == string(b)
}

And here's the C version:

#define so_bytes_string(bs) ({ \
    so_Slice _bs = (bs); \
    (so_String){(const char*)_bs.ptr, _bs.len}; \
})

static inline bool so_string_eq(so_String s1, so_String s2) {
    return s1.len == s2.len &&
           (s1.len == 0 || memcmp(s1.ptr, s2.ptr, s1.len) == 0);
}

bool bytes_Equal(so_Slice a, so_Slice b) {
    return so_string_eq(so_bytes_string(a), so_bytes_string(b));
}

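Just like in Go, where the compiler avoids allocating for these string conversions, the so_bytes_string macro doesn't allocate memory; it just reinterprets the byte slice's underlying storage as a string. The macro relies on a statement expression, ({ ... }), which is a GNU C extension supported by GCC and Clang rather than standard C. The so_string_eq function is easy to implement using memcmp from libc.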

Another example is the IndexByte function. I used a regular C for loop to mimic Go's for-range, using so_len and so_at (a bounds-checking macro) to ensure safety.
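A rough sketch of what such a loop might look like is below. The post doesn't show the definitions of so_int, so_len, or so_at, so the versions here are simplified stand-ins written for this example, and so_check_index is a hypothetical helper. The so_Slice layout is also an assumption based on the ptr and len fields used earlier.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef ptrdiff_t so_int;  // assumption: a signed integer type, like Go's int

// Assumed layout: only the ptr and len fields appear in the post.
typedef struct {
    void* ptr;
    so_int len;
} so_Slice;

// Hypothetical helper: abort on an out-of-range index, similar in spirit
// to a Go runtime panic.
static inline so_int so_check_index(so_int i, so_int len) {
    if (i < 0 || i >= len) {
        fprintf(stderr, "index out of range [%td] with length %td\n", i, len);
        abort();
    }
    return i;
}

// Simplified stand-ins for the macros named in the text.
#define so_len(s) ((s).len)
#define so_at(T, s, i) (((T*)((s).ptr))[so_check_index((i), (s).len)])

// Returns the index of the first occurrence of c in b, or -1 if c is absent.
so_int bytes_IndexByte(so_Slice b, uint8_t c) {
    for (so_int i = 0; i < so_len(b); i++) {
        if (so_at(uint8_t, b, i) == c) {
            return i;
        }
    }
    return -1;
}
```

The loop condition already keeps i in range, so the check inside so_at is redundant here; it matters for code where the index comes from somewhere less obvious.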

Allocators

The Go runtime handles memory allocation and deallocation automatically. In C, I had a few options: use a garbage collector like Boehm GC, use malloc/free, or introduce allocators.

Modern systems programming languages like Zig and Odin clearly showed the value of allocators: it's obvious whether a function allocates memory, it's easy to use different allocation methods, and it helps with testing and debugging. An Allocator is a struct with function pointers:

typedef struct {
    void* self;
    so_Result (*Alloc)(void* self, so_int size, so_int align);
    so_Result (*Realloc)(void* self, void* ptr, so_int oldSize, so_int newSize, so_int align);
    void (*Free)(void* self, void* ptr, so_int size, so_int align);
} mem_Allocator;
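To illustrate how a different allocation method can sit behind this interface, here is a sketch of a fixed-buffer arena. Everything except the mem_Allocator struct is an assumption made for the example: the Arena type and the arena_* functions are hypothetical, so_int is assumed to be a signed integer type, and so_Result is assumed to carry a pointer plus an error flag (the post doesn't show its real layout).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef ptrdiff_t so_int;                          // assumption
typedef struct { void* val; int err; } so_Result;  // assumed layout; err != 0 means failure

typedef struct {
    void* self;
    so_Result (*Alloc)(void* self, so_int size, so_int align);
    so_Result (*Realloc)(void* self, void* ptr, so_int oldSize, so_int newSize, so_int align);
    void (*Free)(void* self, void* ptr, so_int size, so_int align);
} mem_Allocator;

// Hypothetical arena: hands out pieces of a caller-provided buffer.
typedef struct {
    uint8_t* buf;
    so_int cap;
    so_int used;
} Arena;

static so_Result arena_alloc(void* self, so_int size, so_int align) {
    Arena* ar = self;
    if (size < 0 || align <= 0) return (so_Result){NULL, 1};
    // Round the offset up to a multiple of align. This aligns the offset,
    // not the address, so buf itself is assumed to be suitably aligned.
    so_int start = (ar->used + align - 1) / align * align;
    if (size > ar->cap - start) return (so_Result){NULL, 1};  // out of space
    ar->used = start + size;
    memset(ar->buf + start, 0, (size_t)size);  // zeroed, like Go allocations
    return (so_Result){ar->buf + start, 0};
}

// Always takes a fresh block and copies the old contents over.
static so_Result arena_realloc(void* self, void* ptr, so_int oldSize, so_int newSize, so_int align) {
    so_Result r = arena_alloc(self, newSize, align);
    if (!r.err && ptr != NULL && oldSize > 0) {
        memcpy(r.val, ptr, (size_t)(oldSize < newSize ? oldSize : newSize));
    }
    return r;
}

// Individual frees are no-ops; the whole arena is released at once.
static void arena_free(void* self, void* ptr, so_int size, so_int align) {
    (void)self; (void)ptr; (void)size; (void)align;
}

mem_Allocator arena_allocator(Arena* ar) {
    return (mem_Allocator){ar, arena_alloc, arena_realloc, arena_free};
}
```

Because the arena reports failure once its buffer is full, a test can hand a function a deliberately small buffer and check how it behaves when allocation fails.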

By convention, if a function allocates memory, it takes an allocator as its first parameter. So Go's Repeat translates to this C code:

so_Slice bytes_Repeat(mem_Allocator a, so_Slice b, so_int count);

If the caller doesn't care about using a specific allocator, they can just pass an empty allocator, and the implementation will use the system allocator — calloc, realloc, and free from libc.
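One way this fallback could work is sketched below. It rests on several assumptions that the post doesn't confirm: an "empty" allocator is taken to mean a zero-initialized struct with a NULL Alloc pointer, so_Result is assumed to hold a pointer and an error flag, and mem_or_system and the sys_* functions are hypothetical names. The bytes_Repeat body is a simplified illustration, not the ported implementation: it returns an empty slice on failure or a non-positive count, whereas Go's Repeat panics on a negative count.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef ptrdiff_t so_int;                            // assumption
typedef struct { void* ptr; so_int len; } so_Slice;  // assumed layout
typedef struct { void* val; int err; } so_Result;    // assumed layout; err != 0 means failure

typedef struct {
    void* self;
    so_Result (*Alloc)(void* self, so_int size, so_int align);
    so_Result (*Realloc)(void* self, void* ptr, so_int oldSize, so_int newSize, so_int align);
    void (*Free)(void* self, void* ptr, so_int size, so_int align);
} mem_Allocator;

// System allocator backed by libc. calloc returns zeroed memory.
// The align parameter is ignored in this sketch.
static so_Result sys_alloc(void* self, so_int size, so_int align) {
    (void)self; (void)align;
    void* p = calloc(1, size > 0 ? (size_t)size : 1);
    return (so_Result){p, p == NULL};
}

static so_Result sys_realloc(void* self, void* ptr, so_int oldSize, so_int newSize, so_int align) {
    (void)self; (void)align;
    void* p = realloc(ptr, newSize > 0 ? (size_t)newSize : 1);
    if (p != NULL && newSize > oldSize) {
        // Zero the newly added tail so it matches calloc's behavior.
        memset((char*)p + oldSize, 0, (size_t)(newSize - oldSize));
    }
    return (so_Result){p, p == NULL};
}

static void sys_free(void* self, void* ptr, so_int size, so_int align) {
    (void)self; (void)size; (void)align;
    free(ptr);
}

// Hypothetical helper: substitute the system allocator for an empty one.
static mem_Allocator mem_or_system(mem_Allocator a) {
    if (a.Alloc == NULL) {
        return (mem_Allocator){NULL, sys_alloc, sys_realloc, sys_free};
    }
    return a;
}

// Simplified sketch: returns count copies of b in newly allocated memory.
so_Slice bytes_Repeat(mem_Allocator a, so_Slice b, so_int count) {
    a = mem_or_system(a);
    if (count <= 0 || b.len == 0) return (so_Slice){NULL, 0};
    so_int n = b.len * count;  // overflow check omitted in this sketch
    so_Result r = a.Alloc(a.self, n, 1);
    if (r.err) return (so_Slice){NULL, 0};
    for (so_int i = 0; i < count; i++) {
        memcpy((char*)r.val + i * b.len, b.ptr, (size_t)b.len);
    }
    return (so_Slice){r.val, n};
}
```

A caller who doesn't care about the allocator would pass (mem_Allocator){0} and later release the result with free, since the memory came from libc.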

Source: Hacker News