NOW LET US – AI RAG SaaS Studio TP.HCM

I imported the full Linux kernel git history into pgit


A deep dive into importing 1.4 million Linux kernel commits into pgit, a SQL-based Git alternative. The post details the hardware setup, OS tuning, and PostgreSQL configurations required to handle one of the world's largest repositories.

TL;DR: Imported the full Linux kernel history into pgit. 1,428,882 commits, 24.4 million file versions, 20 years of development, stored in PostgreSQL with delta compression. Actual data: 2.7 GB (git gc --aggressive gets 1.95 GB). The import took 2 hours on a dedicated server. Then I started asking questions. 7 f-bombs in 1.4 million commit messages (all from 2 people). 665 bug fixes pointing at a single commit. A filesystem that took 13 years to merge. Here's what the Linux kernel looks like as a SQL database.

The import

This post builds on pgit: What If Your Git History Was a SQL Database?. If you haven't read it, start there. Short version: pgit is a Git-like CLI where everything lives in PostgreSQL instead of the filesystem. It uses pg-xpatch for transparent delta compression and makes your entire commit history SQL-queryable. After the pgit post hit the HN front page and got picked up by TLDR, console.dev, and dailydev, I teased that I was importing the Linux kernel. Here's what happened.

The Linux kernel is one of the largest actively developed repositories in the world. 1.4 million commits spanning 20 years, 171,000 files, 38,000 contributors. From what I've found, only a handful of VCS besides git have ever managed a full import of the kernel's history. Fossil (SQLite-based, by the SQLite team) never did. Darcs and Monotone attempted it with severe performance problems. Mercurial can do it. Correct me if I'm wrong on any of this.

pgit handled it.

| Metric | Value |
|---|---|
| Commits | 1,428,882 |
| File versions (file refs) | 24,384,844 |
| Unique blobs | 3,089,589 |
| Unique paths | 171,525 |
| Path groups (delta chains) | 137,600 |
| Import time | 2h 0m 48s |
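To put the table in perspective, here is a small sketch that derives throughput and ratios from the figures above. The variable names are mine, and the derived numbers are plain arithmetic on the published totals, not measurements reported by pgit.

```shell
#!/usr/bin/env bash
# Figures copied from the import table.
commits=1428882
file_versions=24384844
unique_blobs=3089589
import_seconds=$(( 2 * 3600 + 0 * 60 + 48 ))   # 2h 0m 48s

# Integer throughput; shell arithmetic truncates the fractional part.
commits_per_sec=$(( commits / import_seconds ))
versions_per_sec=$(( file_versions / import_seconds ))

# Ratios with one decimal place (awk handles the floating point).
versions_per_commit=$(LC_ALL=C awk -v v="$file_versions" -v c="$commits" \
  'BEGIN { printf "%.1f", v / c }')
versions_per_blob=$(LC_ALL=C awk -v v="$file_versions" -v b="$unique_blobs" \
  'BEGIN { printf "%.1f", v / b }')

echo "import time:             ${import_seconds} s"
echo "commits per second:      ${commits_per_sec}"
echo "file versions per second: ${versions_per_sec}"
echo "file versions per commit: ${versions_per_commit}"
echo "file versions per blob:   ${versions_per_blob}"
```

On these numbers the import averaged roughly 197 commits per second, and each unique blob is referenced by about 7.9 file versions, which is why deduplicating by content matters before delta compression even starts.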

The import ran on a Hetzner dedicated server in Finland: AMD EPYC 7401P (24 cores / 48 threads), 512 GB DDR4 ECC RAM, 2×1.92 TB SSD in RAID 0. With a 350 GB xpatch content cache, the entire decoded repository fits in memory.

Full server setup, git baseline, and pgit configuration

The server

Hetzner Dedicated "Server Auction" from their Finland datacenter (HEL1):

| Component | Spec |
|---|---|
| CPU | AMD EPYC 7401P (24 cores / 48 threads) |
| RAM | 16×32 GB DDR4 ECC reg. (512 GB total) |
| Storage | 2×Micron SSD SATA 1.92 TB Datacenter (RAID 0) |
| NIC | 1 Gbit Intel I350 |
| Cost | ~€272/month |

OS installation

Hetzner installimage with Ubuntu 24.04 LTS. Two changes from the default config: RAID 0 (SWRAIDLEVEL 0) for maximum throughput (no redundancy needed for ephemeral analysis work), and a simple partition layout:

PART /boot ext3 1024M
PART swap swap 4G
PART / ext4 all

This gives ~3.5 TB usable storage across the two 1.92 TB SSDs.
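Two 1.92 TB drives striped together add up to 3.84 TB in decimal units, so the ~3.5 figure is most likely the same capacity expressed in binary units (TiB), which is what tools like `df -h` report. That reading is my assumption; the sketch below just does the conversion, ignoring the small /boot and swap partitions and filesystem overhead. `raid0_capacity` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# raid0_capacity <drive_count> <drive_size_in_TB>
# RAID 0 stripes without redundancy, so capacity is simply count * size.
# Prints the total in decimal terabytes (10^12 bytes) and tebibytes (2^40 bytes).
raid0_capacity() {
  LC_ALL=C awk -v n="$1" -v tb="$2" 'BEGIN {
    bytes = n * tb * 1e12
    printf "%.2f TB = %.2f TiB\n", n * tb, bytes / 2^40
  }'
}

raid0_capacity 2 1.92
```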

OS tuning

After booting into the installed image:

# --- Packages ---
apt update && apt upgrade -y
apt install -y tmux btop htop iotop cpufrequtils numactl git curl wget unzip build-essential ufw linux-tools-common linux-tools-$(uname -r)
# --- CPU governor -> performance (all 48 threads) ---
for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > "$cpu"; done
# --- Kernel mitigations off ---
sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="consoleblank=0"/GRUB_CMDLINE_LINUX_DEFAULT="consoleblank=0 mitigations=off"/' /etc/default/grub.d/hetzner.cfg
update-grub
# --- sysctl ---
cat >> /etc/sysctl.conf << 'EOF'
vm.swappiness = 1
vm.dirty_ratio = 5
vm.dirty_background_ratio = 2
kernel.numa_balancing = 1
EOF
sysctl -p
# --- Disable Transparent Huge Pages ---
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
# --- noatime ---
sed -i 's|relatime|noatime|g' /etc/fstab
mount -o remount,noatime /
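Several of these settings are easy to lose without noticing: the governor and Transparent Huge Pages writes do not survive a reboot on their own, and `mitigations=off` only takes effect after one. A read-only sketch like the following can confirm what is actually active. It needs no root, and `report` and `active_option` are helper names I made up for illustration.

```shell
#!/usr/bin/env bash
# Extract the active choice from a sysfs line such as "always madvise [never]";
# the kernel marks the selected option with square brackets.
active_option() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# report <label> <file>: print the file's contents, or a placeholder if the
# file is missing (for example inside a container or VM).
report() {
  if [ -r "$2" ]; then
    printf '%-28s %s\n' "$1" "$(cat "$2")"
  else
    printf '%-28s %s\n' "$1" "(not available)"
  fi
}

report "cpu0 governor"             /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
report "vm.swappiness"             /proc/sys/vm/swappiness
report "vm.dirty_ratio"            /proc/sys/vm/dirty_ratio
report "vm.dirty_background_ratio" /proc/sys/vm/dirty_background_ratio
report "kernel.numa_balancing"     /proc/sys/kernel/numa_balancing

thp=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp" ]; then
  printf '%-28s %s\n' "transparent hugepages" "$(active_option "$(cat "$thp")")"
fi

# The kernel command line shows whether the GRUB change took effect.
if [ -r /proc/cmdline ] && grep -q 'mitigations=off' /proc/cmdline; then
  printf '%-28s %s\n' "mitigations" "off"
else
  printf '%-28s %s\n' "mitigations" "default (or not readable)"
fi
```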

pgit configuration

pgit config --global container.shared_buffers 64GB
pgit config --global container.effective_cache_size 400GB
pgit config --global container.xpatch_cache_size_mb 358400 # 350 GB
pgit config --global container.max_worker_processes 28
pgit config --global import.workers 24
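As a sanity check on these values, the sketch below converts the cache size to megabytes and adds up the two settings that actually reserve memory. It assumes the xpatch cache is allocated separately from `shared_buffers`, which the source does not state. `effective_cache_size` is left out because in PostgreSQL it is a planner hint rather than an allocation. `gb_to_mb` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Convert gigabytes to megabytes using binary multiples (1 GB = 1024 MB),
# matching the 358400 value used for xpatch_cache_size_mb.
gb_to_mb() {
  echo $(( $1 * 1024 ))
}

total_ram_gb=512
shared_buffers_gb=64
xpatch_cache_gb=350

xpatch_cache_mb=$(gb_to_mb "$xpatch_cache_gb")

# Assumption: both regions are resident at the same time and do not overlap.
committed_gb=$(( shared_buffers_gb + xpatch_cache_gb ))
headroom_gb=$(( total_ram_gb - committed_gb ))

echo "xpatch_cache_size_mb: ${xpatch_cache_mb}"
echo "committed:            ${committed_gb} GB of ${total_ram_gb} GB"
echo "headroom:             ${headroom_gb} GB for workers and the OS page cache"
```

Under that assumption the configuration commits 414 GB and leaves 98 GB for the import workers, PostgreSQL backends, and the kernel.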

Configuration rationale

| Parameter | Value | Reasoning |
|---|---|---|
| shared_buffers | 64 GB | Dataset ~20 GB on disk |

© 2026 Now Let Us. All rights reserved.

Source: Hacker News
