Show HN: MacMind – A transformer neural network in HyperCard on a 1989 Macintosh

MacMind is a 1,216-parameter transformer neural network implemented entirely in HyperTalk on a 1989 Macintosh SE/30, demonstrating that the fundamental math of modern AI is universal across hardware eras.

A complete transformer neural network implemented entirely in HyperTalk, trained on a Macintosh SE/30.

MacMind is a 1,216-parameter single-layer single-head transformer that learns the bit-reversal permutation -- the opening step of the Fast Fourier Transform -- from random examples. Every line of the neural network is written in HyperTalk, a scripting language from 1987 designed for making interactive card stacks, not matrix math. It has token embeddings, positional encoding, self-attention with scaled dot-product scores, cross-entropy loss, full backpropagation, and stochastic gradient descent. No compiled code. No external libraries. No black boxes.
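
The post gives the total parameter count but not the layer sizes. As a rough sanity check, the sketch below tallies the components listed above under assumed dimensions: a 10-digit vocabulary, 8 positions, an embedding width of 16, learned positional encodings, and no bias terms. These sizes are a guess that happens to add up to 1,216, not a statement of MacMind's actual configuration.

```python
# Hypothetical layer sizes; the source states only the total of 1,216.
VOCAB = 10     # digits 0-9 (assumed)
SEQ_LEN = 8    # sequence length, taken from the task description
D_MODEL = 16   # embedding width (assumed)


def count_parameters(vocab=VOCAB, seq_len=SEQ_LEN, d_model=D_MODEL):
    """Per-component parameter counts for a bias-free single-head layer."""
    return {
        "token_embedding": vocab * d_model,        # one vector per digit
        "positional_encoding": seq_len * d_model,  # one vector per position
        "attention_qkv": 3 * d_model * d_model,    # query, key, value matrices
        "output_projection": d_model * vocab,      # hidden state -> digit logits
    }


counts = count_parameters()
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name:>20}: {n}")
print(f"{'total':>20}: {total}")
```

Other splits could reach the same total, so treat this as one plausible reading rather than the layout.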

Option-click any button and read the actual math.

The same fundamental process that trained MacMind -- forward pass, loss computation, backward pass, weight update, repeat -- is what trained every large language model that exists today. The difference is scale, not kind. MacMind has 1,216 parameters. GPT-4 reportedly has on the order of a trillion. The math is identical.

We are at a moment where AI affects nearly everyone but almost nobody understands what it actually does. MacMind is a demonstration that the process is knowable -- that backpropagation and attention are not magic, they are math, and that math does not care whether it is running on a TPU cluster or a 68030 processor from 1987.

Everything is inspectable. Everything is modifiable. Change the learning rate, swap the training task, resize the model -- all from within HyperCard's script editor. This is the engine with the hood up.

The bit-reversal permutation reorders a sequence by reversing the binary representation of each position index. For an 8-element sequence:

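| Position | Binary | Reversed | Maps to |
|---|---|---|---|
| 0 | 000 | 000 | 0 |
| 1 | 001 | 100 | 4 |
| 2 | 010 | 010 | 2 |
| 3 | 011 | 110 | 6 |
| 4 | 100 | 001 | 1 |
| 5 | 101 | 101 | 5 |
| 6 | 110 | 011 | 3 |
| 7 | 111 | 111 | 7 |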

So input [3, 7, 1, 9, 5, 2, 8, 4] becomes [3, 5, 1, 8, 7, 2, 9, 4].
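
The rule itself is small enough to write out directly. The following is a minimal Python sketch of the permutation the model has to learn. The function names are illustrative and do not come from the MacMind stack.

```python
def bit_reverse_index(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    reversed_index = 0
    for _ in range(bits):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


def bit_reverse_permute(seq):
    """Reorder `seq` so that output[i] = seq[bit_reverse(i)]."""
    n = len(seq)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("sequence length must be a power of two")
    return [seq[bit_reverse_index(i, bits)] for i in range(n)]


mapping = [bit_reverse_index(i, 3) for i in range(8)]
example = bit_reverse_permute([3, 7, 1, 9, 5, 2, 8, 4])
print(mapping)
print(example)
```

Applying the permutation twice returns the original sequence, since reversing the bits of an index twice gives back the same index.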

This permutation is the first step of the Fast Fourier Transform, one of the most important algorithms in computing. The model is never told the rule. It discovers the positional pattern purely through self-attention and gradient descent -- the same process, scaled up enormously, that taught larger models to understand language.

After training, the attention map on Card 4 reveals the butterfly routing pattern of the FFT. The model independently discovered the same mathematical structure that Cooley and Tukey published in 1965.
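
To give a feel for what such a map looks like, here is a small Python sketch of scaled dot-product attention. The queries and keys are built by hand so that position i attends to its bit-reversed partner. They are not trained weights, and the `SCALE` constant and one-hot construction are assumptions chosen purely for illustration.

```python
import math

REV = [0, 4, 2, 6, 1, 5, 3, 7]  # bit-reversal map for 8 positions


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; returns (outputs, weight matrix)."""
    d_k = len(keys[0])
    weights = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights.append(softmax(scores))
    width = len(values[0])
    outputs = [[sum(w * v[j] for w, v in zip(row, values)) for j in range(width)]
               for row in weights]
    return outputs, weights


def one_hot(index, size, scale):
    return [scale if i == index else 0.0 for i in range(size)]


SCALE = 6.0  # arbitrary; larger values make the softmax sharper
keys = [one_hot(j, 8, SCALE) for j in range(8)]
queries = [one_hot(REV[i], 8, SCALE) for i in range(8)]  # hand-built routing
digits = [3, 7, 1, 9, 5, 2, 8, 4]
values = [[float(d)] for d in digits]

outputs, attn_map = attention(queries, keys, values)
routed = [round(row[0]) for row in outputs]
peaks = [row.index(max(row)) for row in attn_map]
print(routed)
print(peaks)
```

In this idealized case each row of the 8x8 weight matrix has nearly all of its mass on one column, the bit-reversed position. A trained model's map would be expected to be noisier.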

MacMind is a 5-card HyperCard stack:

| Card | Purpose |
|---|---|
| 1 -- Title | Project name and credits |
| 2 -- Training | Train the model and watch it learn in real time |
| 3 -- Inference | Test the trained model on any 8-digit input |
| 4 -- Attention Map | Visualize the 8x8 attention weight matrix |
| 5 -- About | Plain-text explanation of what the model is doing |

Each step generates a random 8-digit sequence, runs the full forward pass, computes cross-entropy loss, backpropagates gradients through every layer, and updates all 1,216 weights. Progress bars, per-position accuracy, and a training log update in real time.
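
The shape of one training step can be sketched in a few lines of Python. This is a deliberately reduced stand-in, not MacMind's code: it generates one random example, then treats the ten logits for a single output position as the only trainable parameters, so the forward pass, cross-entropy loss, gradient, and SGD update are all visible without the attention layer. The learning rate of 0.5 and the step count are arbitrary illustrative values.

```python
import math
import random

PERM = [0, 4, 2, 6, 1, 5, 3, 7]  # bit-reversal map for 8 positions


def make_example(rng):
    """Random 8-digit input and its bit-reversed target."""
    digits = [rng.randrange(10) for _ in range(8)]
    target = [digits[p] for p in PERM]
    return digits, target


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_class):
    """Negative log-probability assigned to the correct digit."""
    return -math.log(softmax(logits)[target_class])


def logit_gradient(logits, target_class):
    """Gradient of cross-entropy w.r.t. the logits: softmax minus one-hot."""
    grad = softmax(logits)
    grad[target_class] -= 1.0
    return grad


def sgd_update(params, grads, learning_rate):
    return [p - learning_rate * g for p, g in zip(params, grads)]


rng = random.Random(0)
digits, target = make_example(rng)

logits = [0.0] * 10  # uniform prediction to start
loss_before = cross_entropy(logits, target[0])
for _ in range(20):
    logits = sgd_update(logits, logit_gradient(logits, target[0]), 0.5)
loss_after = cross_entropy(logits, target[0])
print(loss_before, loss_after)
```

In the full model the same gradient would continue backward from the logits through the output projection, attention, and embeddings.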

MacMind was trained on a Macintosh SE/30 running System 7.6.1 and has also been tested through Basilisk II on Apple Silicon. HyperTalk is interpreted, and every multiply, every field access, every variable lookup goes through the interpreter. Each training step takes several seconds. Training to convergence (~1,000 steps) takes hours.

HyperCard 2.0 or later is required. HyperCard 1.x evaluates arithmetic left-to-right without standard precedence, which would silently corrupt every matrix multiplication and gradient computation in the model. HyperCard 2.0 introduced standard mathematical operator precedence. The stack was built and tested with HyperCard 2.1.
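
To see why this matters, consider a single weight update of the form `weight - learning_rate * gradient`. The Python sketch below compares standard precedence with a strict left-to-right evaluator. The evaluator only models the behavior described above and says nothing about HyperCard's internals, and the numeric values are arbitrary.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_left_to_right(tokens):
    """Evaluate [a, op, b, op, c, ...] strictly left to right, ignoring precedence."""
    result = tokens[0]
    for symbol, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPS[symbol](result, operand)
    return result


weight, learning_rate, gradient = 1.0, 0.1, 2.0

# Standard precedence: the multiplication happens first.
standard = weight - learning_rate * gradient

# Left to right: the subtraction happens first, then the product.
flat = eval_left_to_right([weight, "-", learning_rate, "*", gradient])

print(standard, flat)
```

With these values the intended update gives about 0.8, while the left-to-right reading gives about 1.8. Neither raises an error, which is what makes the failure silent.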

Source: Hacker News