A story about how I dug into the PostgreSQL sources to write my own WAL receiver

A deep dive into the journey of exploring PostgreSQL's C source code to understand pg_receivewal and the process of building a custom WAL receiver using Go.

A Long Story about how I dug into the PostgreSQL source code to write my own WAL receiver, and what came out of it

Some thoughts are unpredictable.

For example: "I wonder how pg_receivewal works internally?"

From the outside, it sounds almost innocent. Really, what could possibly be wrong with that? Just ordinary engineering curiosity. I will take a quick look, understand the general structure, satisfy my curiosity, and then go on living peacefully.

But then, for some reason, this happens: you are already building PostgreSQL from source, digging into receivelog.c, comparing the behavior of your little creation with the original step by step, arguing with fsync, looking at .partial files like old friends, and suddenly discovering that you are writing your own WAL receiver.

In short, everything started quite normally and with absolutely no signs of anything serious.

Why PostgreSQL in the First Place

I have long used PostgreSQL as the main DBMS in almost all of my projects, both personal and work-related. And the longer you work with it, the more clearly you understand: this is not just a “good database”. This is a system designed by people with a very serious engineering culture.

When you read notes, discussions, and articles from PostgreSQL developers, you quickly notice how deeply they think through changes, trade-offs, new features, and behavior in complex scenarios. After such materials, I usually had a mixed feeling:

admiration
respect
and a slight feeling that I had once again looked at work of a level unreachable for me

PostgreSQL gives you everything you need out of the box for backups and continuous WAL archiving, including pg_receivewal, the utility that eventually set everything in motion for me.

Why Exactly `pg_receivewal`

Because it is a very good utility. And good utilities are especially dangerous: they make you want to understand exactly how they are built.

pg_receivewal continuously receives WAL segments, can work in synchronous and asynchronous replication modes, and in general looks fairly straightforward. From a distance.

Up close, it turns out that there are quite a few subtle things there:

how the main loop starts
how connection drops are survived
how restart is performed
at what point .partial becomes a complete WAL file
how timeline switching is handled
where and when important fsync calls must happen
what to do so that it is reliable, not slow, and not embarrassing

So, as usual: a simple utility with a decent amount of engineering accuracy hidden around it.
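
Restart and timeline handling both come down to knowing which segment file a given WAL position belongs to. As a small illustration, this sketch computes the 24-character segment file name from a timeline ID and an LSN. The function name segmentFileName is my own, and the 16 MiB segment size is only the default; a cluster can be initialized with a different one.

```go
package main

import "fmt"

// Default WAL segment size (16 MiB). Assumption: the cluster uses the
// default; a different size can be chosen when the cluster is created.
const walSegSize uint64 = 16 * 1024 * 1024

// segmentFileName (hypothetical helper) returns the name of the WAL
// segment file that contains the given LSN on the given timeline:
// 8 hex digits of timeline, then two 8-digit halves of the segment number.
func segmentFileName(timeline uint32, lsn uint64, segSize uint64) string {
	segNo := lsn / segSize
	segsPerXLogID := uint64(0x100000000) / segSize
	return fmt.Sprintf("%08X%08X%08X", timeline, segNo/segsPerXLogID, segNo%segsPerXLogID)
}

func main() {
	// LSN 0/3000060 written as a single 64-bit integer.
	lsn := uint64(0x3000060)

	fmt.Println(segmentFileName(1, lsn, walSegSize)) // prints 000000010000000000000003
	// The same position on timeline 2 lives in a differently named file.
	fmt.Println(segmentFileName(2, lsn, walSegSize)) // prints 000000020000000000000003
}
```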

A Few Words About Other Good Solutions I Looked at With Respect and Envy

Before writing something of my own, of course, I spent a lot of time looking at existing solutions. I use two of them at work for continuous archiving of our main, most critical databases.

pgBackRest

pgBackRest is, without exaggeration, an engineering tank. Everything in its source code is impressive: logging, testing, architectural discipline, incremental and differential backups, support for large installations, and attention to edge cases. When you read the code of this tool, you catch yourself thinking: yes, this is what a product written by people who know what they are doing looks like.

Barman

I like Barman for a different reason. It does not try to magically solve everything in the world. It is, essentially, a very understandable orchestrator around standard PostgreSQL tools: pg_receivewal and pg_basebackup. It has a quality that I value a lot: a simple and reliable model.

Why Go, If I Had to Look at So Much C

I decided to write my tool in Go. The reasons are fairly ordinary: concise language, simplicity, UNIX background, convenience for network/system-level things, and good concurrency handling.

But there is an important nuance: to understand PostgreSQL, I had to seriously dig into C code. C is, in my opinion, both the most difficult and the most brilliant language at the same time. Syntax is nothing — semantics are everything. Pointers alone hide a whole chain of icebergs underneath. The C language is so direct and honest that it becomes scary. One pointer going the wrong way and you face a Segmentation Fault.

So formally I wrote the tool in Go, but in practice this project also became my way of touching C a little more deeply.

The Beginning: Compiling PostgreSQL, Debugging, and the First Signs of Recklessness

To understand the implementation details at all, I had to go into the PostgreSQL source code. I had to learn how to build PostgreSQL from source, run it in debug mode, and attach a debugger to watch how calls flow.

At first, I added the most aggressive tracing possible. Then the realization came: a lot of logs does not mean a lot of understanding. Still, at this stage the overall picture started to emerge. I began to understand how the entities are connected and where the WAL receiving loop starts. And at some point I could not resist: enough watching, time to write.