Building Slogbox

A deep dive into the design decisions behind Slogbox, a high-performance Go slog.Handler that uses a fixed-size ring buffer to keep recent logs in memory for debugging and health checks.

Go 1.25 shipped runtime/trace.FlightRecorder, a circular buffer for execution traces. The concept is clean: keep recent data in memory, snapshot on demand, throw away what’s old. But runtime/trace captures goroutine scheduling and GC pauses. I wanted the same idea for structured logs.

So I built slogbox: a slog.Handler backed by a fixed-size ring buffer. You wire it up like any other handler, and it keeps the last N log records in memory for health checks, debug endpoints, or black box recording.

This post walks through every design decision. Not the “how to use it” guide (the README covers that) but the “why is it shaped this way” journal. Every choice was a trade-off, and I think the trade-offs are more interesting than the final code.

The ring buffer

The first question: how do you store the last N records efficiently?

The naive approach is to append to a slice, then truncate when it gets too long. This works, but every truncation either copies the surviving elements or, if you reslice instead, leaves append to reallocate the backing array over and over. In a handler that runs on every log call, that means allocations on the hot path and GC pressure you don’t need.

The solution is a pre-allocated slice with modulo arithmetic:

type recorder struct {
	mu        sync.RWMutex
	buf       []slog.Record
	head      int    // next write position
	count     int    // records stored (max = len(buf))
	total     uint64 // monotonic write counter
	flushOn   slog.Leveler
	flushTo   slog.Handler
	lastFlush uint64 // value of total claimed by the last flush
	maxAge    time.Duration
}

buf is allocated once in New() with exactly the capacity you asked for. Writes go to buf[head], then head advances with wraparound:

	c.buf[c.head] = nr
	c.head = (c.head + 1) % len(c.buf)
	if c.count < len(c.buf) {
		c.count++
	}
	c.total++

No slice growth, no append, no copy on write. The only allocation that matters is the slog.Record itself, which the caller already created.

Reading the buffer back in order requires handling the wraparound. If the buffer isn’t full, records sit in buf[0:count]. If it is full, the oldest record is at head (it’s about to be overwritten next) and we need to read from head to end, then from start to head:

func (c *recorder) snapshotAll() []slog.Record {
	out := make([]slog.Record, c.count)
	if c.count < len(c.buf) {
		copy(out, c.buf[:c.count])
	} else {
		n := copy(out, c.buf[c.head:])
		copy(out[n:], c.buf[:c.head])
	}
	return out
}

The snapshot allocates once (the output slice) and uses copy, which is about as fast as Go gets for moving contiguous memory. The benchmarks confirmed this works: Handle runs at ~150 ns/op with 1 alloc on the hot path (the alloc comes from resolving and merging attributes into the stored record, not from the ring buffer itself).
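
To see the wraparound in isolation, here is a minimal sketch that uses the same write and snapshot logic on plain ints instead of slog.Record values. The ring type and its newRing, push, and snapshot names are made up for this illustration and are not slogbox's API.

```go
package main

import "fmt"

// ring is a hypothetical fixed-size circular buffer of ints, standing in
// for the slog.Record buffer so the example stays short.
type ring struct {
	buf   []int
	head  int // next write position
	count int // values stored (max = len(buf))
}

func newRing(size int) *ring { return &ring{buf: make([]int, size)} }

// push overwrites the oldest slot once the buffer is full.
func (r *ring) push(v int) {
	r.buf[r.head] = v
	r.head = (r.head + 1) % len(r.buf)
	if r.count < len(r.buf) {
		r.count++
	}
}

// snapshot returns the stored values oldest-first.
func (r *ring) snapshot() []int {
	out := make([]int, r.count)
	if r.count < len(r.buf) {
		// Not full yet: values sit in buf[0:count] in write order.
		copy(out, r.buf[:r.count])
	} else {
		// Full: the oldest value is at head, so read head..end, then 0..head.
		n := copy(out, r.buf[r.head:])
		copy(out[n:], r.buf[:r.head])
	}
	return out
}

func main() {
	r := newRing(3)
	for v := 1; v <= 5; v++ {
		r.push(v)
	}
	fmt.Println(r.snapshot()) // prints [3 4 5]
}
```

Five writes into three slots leave the backing array as [4 5 3] with head at index 2, and the two copy calls put it back in chronological order.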

Storing records, not strings

What should the buffer actually hold? The obvious choice is to format each record at write time (to JSON, text, whatever) and store the string. Then reads are trivial: concatenate the strings.

The buffer stores raw slog.Record values instead. Callers choose the serialization format at read time.

The reasoning: writes happen on every log call. Reads happen when someone hits /debug/logs or when an error triggers a flush. This is a write-heavy, read-rarely system. Formatting on the write path does work that gets thrown away as records rotate out of the buffer. Worse, it bakes in a format decision. If you stored JSON strings but later want to filter by level or grep by message, you’d have to unmarshal what you just marshaled.

Storing raw records means the buffer holds heavier values (a slog.Record has time, level, message, PC, and attrs). But the flexibility matters: you only pay serialization cost when someone actually looks at the data. For health check endpoints that fire once every 30 seconds, this is the right trade-off.
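
As a rough illustration of what read-time serialization can look like, the sketch below filters a slice of raw records by level and then replays the survivors through the standard library's JSON handler. The filterByLevel, renderJSON, and sampleRecords helpers are hypothetical names for this example; slogbox's own read API may look different.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log/slog"
	"time"
)

// filterByLevel keeps records at or above floor. It works on raw records,
// so nothing has to be unmarshaled first.
func filterByLevel(records []slog.Record, floor slog.Level) []slog.Record {
	var out []slog.Record
	for _, r := range records {
		if r.Level >= floor {
			out = append(out, r)
		}
	}
	return out
}

// renderJSON formats already-captured records at read time by replaying
// them through a standard library JSON handler.
func renderJSON(records []slog.Record) (string, error) {
	var buf bytes.Buffer
	h := slog.NewJSONHandler(&buf, nil)
	for _, r := range records {
		if err := h.Handle(context.Background(), r); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

// sampleRecords stands in for a snapshot taken from the buffer.
func sampleRecords() []slog.Record {
	ts := time.Date(2025, 1, 2, 14, 3, 2, 0, time.UTC)
	a := slog.NewRecord(ts, slog.LevelInfo, "cache warmed", 0)
	a.AddAttrs(slog.Int("entries", 128))
	b := slog.NewRecord(ts.Add(time.Second), slog.LevelError, "upstream timeout", 0)
	b.AddAttrs(slog.String("host", "db-1"))
	return []slog.Record{a, b}
}

func main() {
	out, err := renderJSON(filterByLevel(sampleRecords(), slog.LevelError))
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Print(out) // one JSON line for the error record
}
```

Swapping slog.NewJSONHandler for slog.NewTextHandler changes the output format without touching anything on the write path.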

Resolving values at Handle time

Here’s a correctness issue that’s easy to miss. slog.LogValuer lets you attach dynamic values to log attributes. A LogValuer is resolved by calling its LogValue() method, and the result can change over time. Think of a struct that returns its current state.

If you store the raw attr and resolve it later (at serialization time), you capture the state at read time, not at log time. That’s a bug. The record says “this happened at 14:03:02” but the attribute value reflects what the struct looked like at 14:05:17 when someone hit the debug endpoint.

The fix is to resolve eagerly in Handle, and not only there. WithAttrs applies the same eager resolution via a resolveAttrs helper, so handler-level attrs passed through logger.With(...) are also captured at registration time, not at log time. Consistency matters: if record-level attrs resolve eagerly but handler-level attrs resolve lazily, you get different snapshot semantics depending on where the attr was attached.
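
The difference is easy to reproduce. The sketch below is one possible shape for such a helper, built on slog.Value.Resolve; it is an assumption about the approach, not slogbox's actual resolveAttrs, and the counter type and demo function are invented for the example.

```go
package main

import (
	"fmt"
	"log/slog"
)

// counter is a hypothetical type whose logged value changes over time.
type counter struct{ n int }

func (c *counter) LogValue() slog.Value { return slog.IntValue(c.n) }

// resolveAttrs returns a copy of attrs with every LogValuer replaced by
// the value it reports right now. Groups are resolved recursively.
func resolveAttrs(attrs []slog.Attr) []slog.Attr {
	out := make([]slog.Attr, len(attrs))
	for i, a := range attrs {
		a.Value = a.Value.Resolve()
		if a.Value.Kind() == slog.KindGroup {
			a.Value = slog.GroupValue(resolveAttrs(a.Value.Group())...)
		}
		out[i] = a
	}
	return out
}

// demo resolves one attr eagerly, mutates the underlying state, and then
// resolves the untouched attr lazily.
func demo() (eager, lazy int64) {
	c := &counter{n: 1}
	raw := []slog.Attr{slog.Any("hits", c)}
	resolved := resolveAttrs(raw) // captured at "log time"
	c.n = 99                      // state changes before anyone reads the buffer
	return resolved[0].Value.Int64(), raw[0].Value.Resolve().Int64()
}

func main() {
	eager, lazy := demo()
	fmt.Println(eager, lazy) // prints 1 99
}
```

The eagerly resolved attr still says 1, which is what was true when the record was logged. The lazily resolved one reports 99, the state at read time.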

The locking model

A slog.Handler can be called from any goroutine. You can have 32 goroutines logging simultaneously while a health check endpoint reads the buffer. The question: how do you handle concurrent reads and writes without killing performance?

The key insight is asymmetry. Writes are the hot path. Every log call goes through Handle, potentially thousands of times per second. Reads are the cold path: a health check endpoint, a debug dump, maybe a flush on error. Optimizing for writes at the expense of reads is the correct call.

sync.RWMutex fits this perfectly. Writers take an exclusive lock, but they only hold it for the handful of instructions that update the ring buffer. Readers share a read lock, and they only hold it long enough to snapshot the buffer (a make and copy). The actual work of serializing, filtering, or streaming happens after the lock is released.
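
Here is a minimal sketch of that locking discipline on a string-only buffer. The box type and its write, snapshot, dump, and hammer functions are hypothetical stand-ins for this example, not slogbox's real methods.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// box is a hypothetical, string-only stand-in for the post's recorder.
type box struct {
	mu    sync.RWMutex
	buf   []string
	head  int
	count int
}

func newBox(size int) *box { return &box{buf: make([]string, size)} }

// write holds the exclusive lock only for the ring-buffer update.
func (b *box) write(msg string) {
	b.mu.Lock()
	b.buf[b.head] = msg
	b.head = (b.head + 1) % len(b.buf)
	if b.count < len(b.buf) {
		b.count++
	}
	b.mu.Unlock()
}

// snapshot holds the shared lock only for one make and the copies.
func (b *box) snapshot() []string {
	b.mu.RLock()
	defer b.mu.RUnlock()
	out := make([]string, b.count)
	if b.count < len(b.buf) {
		copy(out, b.buf[:b.count])
	} else {
		n := copy(out, b.buf[b.head:])
		copy(out[n:], b.buf[:b.head])
	}
	return out
}

// dump does its formatting after snapshot has released the lock.
func (b *box) dump() string { return strings.Join(b.snapshot(), "\n") }

// hammer runs the given number of goroutines, each writing and reading
// concurrently, and waits for all of them to finish.
func hammer(b *box, writers, perWriter int) {
	var wg sync.WaitGroup
	for w := 0; w < writers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < perWriter; i++ {
				b.write(fmt.Sprintf("writer %d msg %d", id, i))
				_ = b.snapshot()
			}
		}(w)
	}
	wg.Wait()
}

func main() {
	b := newBox(8)
	hammer(b, 32, 100)
	fmt.Println(len(b.snapshot())) // prints 8
}
```

Running this with go run -race is a quick way to check that no access to the buffer escapes the lock.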

The natural Go instinct is to reach for a channel instead: send records to a goroutine that owns the buffer and let it serialize access without explicit locks. The problem is latency. A channel-based design means every Handle call does a channel send, which involves goroutine scheduling: with an unbuffered channel the sender blocks until the receiver dequeues (a buffered one only defers that until it fills), and the context switch overhead adds up.
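
For contrast, this is roughly what the channel-based alternative looks like: a sketch of the rejected design, with an owner goroutine and unbuffered channels. The chanBox type and its methods are invented for this example.

```go
package main

import "fmt"

// chanBox is a hypothetical channel-owned buffer: one goroutine owns the
// ring, and every write and read goes through a channel.
type chanBox struct {
	in   chan string
	snap chan chan []string
}

func newChanBox(size int) *chanBox {
	c := &chanBox{in: make(chan string), snap: make(chan chan []string)}
	go c.loop(size)
	return c
}

// loop is the only code that touches the ring, so it needs no mutex.
func (c *chanBox) loop(size int) {
	buf := make([]string, size)
	head, count := 0, 0
	for {
		select {
		case msg, ok := <-c.in:
			if !ok {
				return // in was closed: stop the owner goroutine
			}
			buf[head] = msg
			head = (head + 1) % size
			if count < size {
				count++
			}
		case reply := <-c.snap:
			out := make([]string, 0, count)
			start := (head - count + size) % size // index of the oldest entry
			for i := 0; i < count; i++ {
				out = append(out, buf[(start+i)%size])
			}
			reply <- out
		}
	}
}

// write blocks until the owner goroutine has received the message.
func (c *chanBox) write(msg string) { c.in <- msg }

// snapshot asks the owner for a copy; it must not be called after stop.
func (c *chanBox) snapshot() []string {
	reply := make(chan []string)
	c.snap <- reply
	return <-reply
}

func (c *chanBox) stop() { close(c.in) }

func main() {
	c := newChanBox(2)
	defer c.stop()
	c.write("a")
	c.write("b")
	c.write("c")
	fmt.Println(c.snapshot()) // prints [b c]
}
```

Every write here is a rendezvous with the owner goroutine, which is the per-call scheduling cost the mutex version avoids.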

Source: Hacker News