Building a Shell

A developer's first-person account of building a simple shell from scratch, covering the implementation of core features like the REPL, command execution, built-ins, variable expansion, and pipelines.
The shell sits in front of a lot of my work, but I mostly use it for the outcome: running unix commands and scripts, creating branches and making commits. Unlike when I'm writing code, I'm rarely thinking about how the shell itself works under the hood.
So, to dig a bit deeper into shells, I'm going to build a toy one until I run out of time. I have a fresh pot of filter coffee, and I'm awake three hours before everyone else.
A quick look ahead to everything I'm able to support by the end:
./andshandsh$ cd /andsh$ pwd/andsh$ echo $HOME/Users/andrewandsh$ nosuchcommandnosuchcommand: No such file or directoryandsh$ echo $?127andsh$ printf abc\n | tr a-z A-Z | revCBAandsh$ ec<Tab> helloandsh$ echo hellohelloandsh$ <Up>andsh$ echo hellohelloandsh$ ^D
If you prefer reading C over prose, head straight to healeycodes/andsh.
REPL
A shell is an interactive program before it's a language implementation, and the user experience starts at the prompt. This first step is about building the interactive skeleton: print a prompt, read a line, keep a little state, and leave a clean place to plug executio logic into.
// repl.htypedef struct {int last_status;int running;int interactive;} Shell;
We also need the classic read-eval-print loop:
// repl.cint shell_run(Shell *shell) {char *line = NULL;size_t capacity = 0;if (install_signal_handlers() != 0) {return 1;}while (shell->running) {int rc = read_line(&line, &capacity, shell);if (rc == 0) {break;}if (rc < 0) {free(line);return 1;}eval_line(shell, line);}free(line);return shell->last_status;}
read_line
returns three cases: got a line, hit EOF, or hit a real error.
eval_line
starts tiny: blank lines do nothing, exit
stops the shell in-process, and everything else gets treated as an external command.
// inside eval_lineif (strcmp(argv[0], "exit") == 0) {shell->running = 0;free_argv(argv);return shell->last_status;}status = execute_external(shell, argv);
At the moment, we can run ls
but we can't run ls -l
yet. It's interpreted as a single command "ls -l"
.
From a Line to argv
Before we add env var expansion and pipes, let's start by splitting a line on spaces and tabs so we can run simple foreground commands like echo hello world
or ls -l
.
It will be intentionally incomplete. It still won't handle quotes or redirections, but it will peel off |
as syntax so we can grow into supporting pipelines later. It's still useful because Unix process APIs want argv
(argument vector, values passed down to a program when it starts).
First, we need a way to split a line:
// repl.cstatic char **tokenize_line(const char *line, int *count_out) {while (*p != '\0') {while (isspace((unsigned char) *p)) {p++;}if (*p == '|') {push_word(&words, &count, &capacity, dup_range(p, 1));p++;continue;}// .. copy the next word up to whitespace or |}*count_out = (int) count;return words;}
Which we can call inside our fledgling eval_line
function to get a stream of shell words before we group them into commands.
// inside eval_lineif (line_is_blank(line)) {return 0;}words = tokenize_line(line, &word_count);if (word_count == 0) {free_words(words);return 0;}
Running Commands
A shell can't replace itself with a command that it's launching (otherwise the shell would cease to exist after running that command) so it must create a child process to run the command, and wait for it to finish.
The parent shell stays alive and the child process becomes the command.
execvp
is a convenient call from the exec
family here. It searches PATH
and replaces the current process with a new program, using the current process environment.
waitpid
gives control back to the shell after the command exits.
// repl.cpid = fork();if (pid == 0) {execvp(argv[0], argv);perror(argv[0]);// 127: command not found, 126: found but not executable / cannot invoke_exit(errno == ENOENT ? 127 : 126);}// ..while (waitpid(pid, &status, 0) < 0) {if (errno != EINTR) {perror("waitpid");shell->last_status = 1;return shell->last_status;}}
The child uses _exit
to avoid running parent-inherited libc cleanup in the forked child (can lead to duplicated output and other unintended side effects).
One shell-y detail I wanted to keep was the interrupted wait path. Retrying on EINTR
keeps the shell from losing track of a child process when the terminal sends an interrupt.
Now we can do real shell things:
./andshandsh$ echo hello worldhello worldandsh$ pwd/Users/andrew/Documents/experiments/andshandsh$ ls -ltotal 160-rw-r--r-- 1 andrew staff 194 14 Mar 08:10 Makefiledrwxr-xr-x 7 andrew staff 224 14 Mar 14:24 srcandsh$ ^D
For the process/system call stuff, C is great for writing toy shells. The downsides are things like splitting a line (managing dynamic memory), and later, adding more shell syntax (string lifetimes).
cd, or How to Get around
One of the core shell rules is that some commands can't run in a child process. For example, if the shell forks and a child calls chdir
then only the child changes directories; when the child exits, the parent shell is still in the old directory.
This is why cd
has to be a builtin.
// inside try_builtinif (strcmp(command->argv[0], "cd") == 0) {return run_builtin_cd(shell, command);}
Something I learned for this post is that HOME
is the conventional default target when running a lone cd
.
static int run_builtin_cd(Shell *shell, Command *command) {const char *target = command->argc == 1 ? getenv("HOME") : command->argv[1];if (chdir(target) != 0) {perror("cd");shell->last_status = 1;return shell->last_status;}shell->last_status = 0;return 0;}
Because run_builtin_cd
runs inside the shell process, the next prompt sees the new directory.
Env Var Expansion
Before running a command, the shell rewrites parts of the input line. There are a few syntax rules and ordering details here, but for my toy shell I'm just adding env var expansion.
echo $HOME
shouldn't print $HOME
, it should print /Users/andrew
.
I'm just hacking this in. Only whole-word $NAME
expansion. No quotes, no ${NAME}
, and no splitting rules.
static char *expand_word(const Shell *shell, const char *word) {const char *value;if (strcmp(word, "$?") == 0) {char status[32];snprintf(status, sizeof(status), "%d", shell->last_status);return strdup(status);}if (word[0] != '$' || word[1] == '\0') {return strdup(word);}// .. look up NAME in the environmentvalue = getenv(word + 1);if (value == NULL) {// Unset variables expand to the empty string in this toy shell.return strdup("");}return strdup(value);}
Expansion happens after tokenization but before execution. And |
is syntax, not data, so we don't try to expand it:
for (i = 0; words[i] != NULL; i++) {char *expanded;if (strcmp(words[i], "|") == 0) {continue;}expanded = expand_word(shell, words[i]);free(words[i]);words[i] = expanded;}
We expand token-by-token, keeping it simple; and skipping writing a parser.
The special case for $?
is also nice to leave in the code because it's one of those tiny shell details that makes the prompt feel less fake.
Piping
A pipe (|
) is a kernel buffer with one process writing bytes in and another reading bytes out.
cmd1 | cmd2
connects the stdout of the left command to the stdin of the right command. For N
commands, you need N - 1
pipes.
The heavy lifting here will be done by pipe()
, which creates a one-way channel for interprocess communication. pipe()
fills the array pipefd
with two file descriptors. pipefd[0]
is the read end, and pipefd[1]
is the write end. Data written to the write end is buffered by the kernel until it is read from the read end.
The core pipe loop runs once per command in the pipeline. Each iteration may create one new pipe for the next command. prev_read
is the read end carried forward from the previous iteration.
for (i = 0; i < pipeline->count; i++) {int pipefd[2] = {-1, -1};if (i + 1 < pipeline->count) {pipe(pipefd);}pid = fork();if (p
Source: Hacker News










