Breaking Enigma with Index of Coincidence on a Commodore 64

Breaking Enigma without known plaintext using statistical analysis on the Commodore 64 via the Index of Coincidence method.

Breaking Enigma without known plaintext using statistical analysis on the Commodore 64.

Breaking Enigma with Index of Coincidence on a Commodore 64

By Michael Doornbos

30 minutes read - 6358 words

Everything we’ve done so far requires a crib. The brute-force cracker tested candidate settings against known plaintext. Turing’s Bombe used contradictions between known plaintext and ciphertext to eliminate wrong settings. No crib, no attack.

But what if you intercepted a message and had no idea what it said? No weather report header, no formulaic greeting, no re-sent traffic. Just ciphertext.

William Friedman developed a tool for exactly this problem in 1922. It doesn’t need known plaintext. It doesn’t even need to know the language. It measures whether a piece of text looks like language at all or whether it looks like random noise. He called it the index of coincidence.

What the Index of Coincidence Measures

Pick two random letters from a piece of English text. What’s the probability they’re the same letter? Not very high, but higher than you’d think, because letter frequencies aren’t uniform. E appears about 13% of the time. Z appears less than 0.1%. Two randomly chosen letters are more likely to both be E than to both be Z.

For English text, the probability of a match is about 0.0667. For German it’s about the same (0.0762 by some measurements, but the exact number depends on the text). For truly random text where all 26 letters are equally likely, it’s 1/26 = 0.0385.

That difference is the entire attack. Decrypt a ciphertext with the wrong Enigma settings and the output looks random: IC near 0.038. Decrypt with the right settings and the output is German: IC near 0.067. No crib needed. The statistical properties of the language itself are the signal.

where $n_i$ is how many times letter $i$ appears. The numerator counts how many ways you could pick two copies of the same letter. The denominator counts how many ways you could pick any two letters. The ratio is the probability that two randomly chosen letters match.

For a C64 implementation, we don’t need the division. We just compute the numerator (call it the IC sum) and compare it to a threshold. If IC should be above 0.050, that means the sum should be above $0.050 \times N \times (N-1)$. For a 60-character message, that’s $0.050 \times 60 \times 59 = 177$. A 16-bit integer comparison. No floating point needed. The C64’s BASIC ROM has a full floating-point engine, but we don’t have to touch it.

Why the Plugboard Doesn’t Matter

This is the best part. The plugboard is a monoalphabetic substitution: it swaps pairs of letters before and after the rotors. If R and M are swapped, every R in the output becomes M and every M becomes R. But the total count of each pair stays the same. The letter that appeared 8 times still appears 8 times, it just has a different name.

The IC doesn’t care about which letters are frequent. It only cares about how frequent the most frequent letters are. A text where E appears 13% of the time has the same IC as a text where Q appears 13% of the time. The plugboard rearranges the labels on the frequency bins without changing the bin sizes.

This means IC attacks only the rotor settings (5.9 million candidates, or 103.9 billion with ring settings). Once you’ve found the right rotors and positions, you can solve the plugboard separately using frequency analysis. A Bombe solves both at once. IC splits them into two smaller problems.

The Scenario

In the brute-force cracker, we intercepted a U-boat weather report from the Bay of Biscay and used “WETTER” as a crib to find the settings. This time, we have a longer intercept from the same transmission, but we don’t know it’s a weather report. We don’t know anything about the plaintext. All we have is 60 characters of ciphertext:

The correct settings are still rotors III-I-V at positions M-C-Q. We need to find them using nothing but the statistical fingerprint of the German language.

How Well Does It Work?

Before writing code, let’s check the numbers. Decrypt the ciphertext with all 17,576 positions for the correct rotor ordering (III-I-V) and compute the IC of each result.

The correct decryption (M-C-Q) has an IC of 0.0729 and an IC sum of 258. The text reads WETTERVORHERSAGEBISKAYAHEUTEREGENMITWINDSTAERKEFUENFAUSOSTEN. “Weather forecast Biscay, today rain with wind strength five from east.” A routine U-boat weather report.

At different thresholds, here’s how many of the 17,576 positions survive for a single rotor ordering (III-I-V):

Threshold

IC sum

Candidates

Percent

0.040

141

6,233

35.5%

0.045

159

1,540

8.8%

0.050

177

286

1.6%

0.055

194

0.3%

0.060

212

0.04%

The BASIC version searches one ordering at a time, so a threshold of 177 (IC >= 0.050) is reasonable at 286 candidates per ordering. The assembly version searches all 336 orderings, so it uses a tighter threshold of 194 (IC >= 0.055) to keep the output manageable. At 194, roughly 49 candidates survive per ordering, about 18,165 total across all 336.

Even with that filtering, you can’t just sort by IC and pick the winner. The correct answer ranks #42 out of those 18,165 candidates. The top 10 are all gibberish with repeated letters that inflate the score.

IC Sum

Rotors

Position

First 40 characters

288

I-II-VII

K-Q-Q

VSIGDAWELGIIOEIFLHNOAFIIDPIKISGIAIGQWAGFS

284

VI-VII-I

V-Y-J

NJYOJQAWZJUJJZKJAZMFZZQVJRJDAJPGOABJFIVQU

280

VIII-I-V

Z-O-X

QBXRKCVQBXGRLAGLBDDUBVFBDSKXVPQYYMWBBBBBB

278

VI-II-VIII

M-U-N

VMHHHMHPMHAEHCIBHHLPHLZHHIAQSUZDTUICHQISY

278

II-V-VII

L-B-M

DJTKKHTUUXHHIOOJTOOBYHTJTTMMAMOJYEWJAATFM

…

258

III-I-V

M-C-Q

WETTERVORHERSAGEBISKAYAHEUTEREGENMITWIND

With only 60 characters, random letter distributions can outscore real language. The correct answer stands out because it’s the only one that reads like German, not because it has the highest score. You find it by reading the printer output, not by sorting a spreadsheet.

Longer messages stabilize the IC. With 200+ characters, the correct decryption reliably scores highest. With 60, you need a threshold that keeps a manageable number of candidates for human review.

Pseudocode

for each rotor ordering (336 permutations): for each starting position (17,576 per ordering): decrypt the entire ciphertext with this setting count letter frequencies (26 bins) compute IC sum = sum of n_i * (n_i - 1) if IC sum >= threshold: CANDIDATE FOUND, print settings and decryption

The per-candidate cost is higher than the brute-force cracker. That approach tested the first character of the crib and bailed immediately on mismatch, averaging about one encryption per candidate. IC requires encrypting the entire message: 60 encryptions for our 60-character ciphertext, plus the frequency counting and sum computation.

But there’s no crib to find. No interrogating captured submariners, no stealing codebooks. Just raw ciphertext and math.

Across all 5.9 million candidates (336 orderings), a threshold of 0.050 produces about 94,887 hits (1.6%). The correct answer is among them, but so are thousands of gibberish decryptions that happen to have uneven letter distributions. A human scanning the output would spot the readable German, but an automated system would need additional filtering (bigram frequencies, dictionary checks, or a higher threshold that risks missing the answer).

BASIC Version

The BASIC version searches all 17,576 positions for a single rotor ordering. The rotor wiring, stepping, and encryption subroutines are identical to the cracker. The new code is the IC computation.

5PRINTCHR$(5)10PRINTCHR$(147);" ENIGMA IC ATTACK"20PRINT" INDEX OF COINCIDENCE":PRINT100DIMRF(7,25),RI(7,25),RE(25)110FORR=0TO7:FORI=0TO25120READRF(R,I):NEXTI,R200DATA4,10,12,5,11,6,3,16,21,25210DATA13,19,14,22,24,7,23,20,18,15220DATA0,8,1,17,2,9230DATA0,9,3,10,18,8,17,20,23,1240DATA11,7,22,19,12,2,16,6,25,13250DATA15,24,5,21,14,4260DATA1,3,5,7,9,11,2,15,17,19270DATA23,21,25,13,24,4,8,22,6,0280DATA10,12,20,18,16,14290DATA4,18,14,21,15,25,9,0,24,16300DATA20,8,17,7,23,11,13,5,19,6310DATA10,3,2,12,22,1320DATA21,25,1,17,6,8,19,24,20,15330DATA18,3,13,7,11,23,0,22,12,9340DATA16,14,5,4,2,10350DATA9,15,6,21,14,20,12,5,24,16360DATA1,4,13,7,25,17,3,10,0,18370DATA23,11,8,2,19,22380DATA13,25,9,7,6,17,2,23,12,24390DATA18,22,1,14,20,5,0,8,21,11395DATA15,4,10,16,3,19400DATA5,10,16,7,19,11,23,14,2,1410DATA9,18,15,3,25,17,0,12,4,22420DATA13,8,20,24,6,21450REM COMPUTE INVERSE TABLES460FORR=0TO7:FORI=0TO25470RI(R,RF(R,I))=I480NEXTI,R500REM REFLECTOR UKW-B510FORI=0TO25:READRE(I):NEXT520DATA24,17,20,7,16,18,11,3,15,23530DATA13,6,14,10,12,8,4,1,5,25540DATA2,22,21,9,0,19600REM NOTCH POSITIONS (TWO PER ROTOR)610DIMNO(7,1)620NO(0,0)=16:NO(0,1)=-1630NO(1,0)=4:NO(1,1)=-1640NO(2,0)=21:NO(2,1)=-1650NO(3,0)=9:NO(3,1)=-1660NO(4,0)=25:NO(4,1)=-1670NO(5,0)=25:NO(5,1)=12680NO(6,0)=25:NO(6,1)=12690NO(7,0)=25:NO(7,1)=12700REM CIPHERTEXT (60 CHARS, 0-25)710DIMCT(59)720CT$="YDMAOIGMPQZPFVRCIGIIKJV"725CT$=CT$+"ECBDNPDITBYRYNKOCNJHII"727CT$=CT$+"VWXYUJBCDYGKVHW"730FORI=0TO59740CT(I)=ASC(MID$(CT$,I+1,1))-65750NEXT760NL=60:TH=177770DIMFR(25)800PRINT"SELECT 3 ROTORS (1-8)"810INPUT"LEFT ROTOR";RL820INPUT"MIDDLE ROTOR";RM830INPUT"RIGHT ROTOR";RR840RL=RL-1:RM=RM-1:RR=RR-1850PRINT:PRINT"SEARCHING 17,576 POSITIONS"860PRINT"FOR ROTORS";RL+1;"-";RM+1;"-";RR+1865PRINT"THRESHOLD:";TH870PRINT:TI$="000000":SC=0900FORL0=0TO25910PRINTCHR$(L0+65);" ";920FORM0=0TO25:FORR0=0TO25930GOSUB5000940IFF=1THENGOSUB7000950NEXTR0,M0,L0960PRINT:PRINT970PRINTSC;"CANDIDATES FOUND"975PRINT"TIME:";INT(TI/60);"SECONDS"980END2000REM STEP ROTORS2010IFMP<>NO(RM,0)ANDMP<>NO(RM,1)THEN20402020LP=LP+1:LP=LP-INT(LP/26)*262030MP=MP+1:MP=MP-INT(MP/26)*262040IFRP<>NO(RR,0)ANDRP<>NO(RR,1)THEN20602050MP=MP+1:MP=MP-INT(MP/26)*262060RP=RP+1:RP=RP-INT(RP/26)*262070RETURN3000REM ENCRYPT (FORWARD)3010E=(C+RP):E=E-INT(E/26)*263020C=RF(RR,E)3030C=(C-RP+26):C=C-INT(C/26)*263040E=(C+MP):E=E-INT(E/26)*263050C=RF(RM,E)3060C=(C-MP+26):C=C-INT(C/26)*263070E=(C+LP):E=E-INT(E/26)*263080C=RF(RL,E)3090C=(C-LP+26):C=C-INT(C/26)*263100REM REFLECTOR3110C=RE(C)3120REM ENCRYPT (INVERSE)3130E=(C+LP):E=E-INT(E/26)*263140C=RI(RL,E)3150C=(C-LP+26):C=C-INT(C/26)*263160E=(C+MP):E=E-INT(E/26)*263170C=RI(RM,E)3180C=(C-MP+26):C=C-INT(C/26)*263190E=(C+RP):E=E-INT(E/26)*263200C=RI(RR,E)3210C=(C-RP+26):C=C-INT(C/26)263220RETURN5000REM IC TEST5010LP=L0:MP=M0:RP=R05020REM CLEAR FREQUENCY TABLE5030FORI=0TO25:FR(I)=0:NEXT5040REM DECRYPT AND COUNT5050FORI=0TONL-15060C=CT(I):GOSUB2000:GOSUB30005070FR(C)=FR(C)+15080NEXT5090REM COMPUTE IC SUM5100S=05110FORI=0TO255120S=S+FR(I)(FR(I)-1)5130NEXT5140IFS>=THTHENF=1:RETURN5150F=0:RETURN7000REM SHOW CANDIDATE7010SC=SC+17020PRINT:PRINT"CANDIDATE";SC7030PRINT"POSITION: ";CHR$(L0+65);"-";7035PRINTCHR$(M0+65);"-";CHR$(R0+65)7040PRINT"IC SUM:";S7050LP=L0:MP=M0:RP=R07060FORI=0TONL-17070C=CT(I):GOSUB2000:GOSUB30007080PRINTCHR$(C+65);7090NEXT7100PRINT:RETURN

The IC test at line 5000 is where the action is. It decrypts all 60 characters, counts the frequency of each letter in the FR array, then computes the IC sum: $\sum n_i \times (n_i - 1)$. If the sum meets the threshold (177, corresponding to IC >= 0.050), it’s a candidate.

Line 7000 prints each candidate with its position, IC sum, and decrypted text so you can scan for readable German.

The threshold of 177 comes from $0.050 \times 60 \times 59 = 177$. Lower it and you get more candidates (and more false positives). Raise it and you might miss the correct answer if the IC is noisy. For a 60-character message, 177 is a reasonable middle ground.

To check that it’s working, enter rotors 3, 1, 5 (the correct ordering from the previous articles). The program searches all 17,576 positions and prints each candidate that passes the threshold. Most of the output is gibberish with IC sums in the 180-200 range. But candidate 146 jumps out:

CANDIDATE 146 POSITION: M-C-Q IC SUM: 258 WETTERVORHERSAGEBISKAYAHEUTEREGENMITWINDSTAERKEFUENFAUSOSTEN

That’s readable German. IC sum 258, well above the threshold of 177. Out of 286 total candidates for this rotor ordering, this is the only one that’s actual language. The rest are random letter salad that happened to have uneven frequency distributions.

Try entering a wrong ordering like 1, 2, 3. You’ll still get candidates (any ordering produces a few statistical flukes), but none of them will be readable. That’s the difference: wrong settings produce noise that occasionally looks uneven, while the right settings produce text that looks like German because it is German.

Assembly Version

The assembly version does the full search: all 336 rotor orderings, all 17,576 positions each. The structure is identical to the cracker, with the crib test replaced by the IC computation.

Here’s what the C64 has to chew through:

Count

Rotor orderings (P(8,3))

336

Positions per ordering (26Â³)

17,576

Total candidates

5,905,536

Encryptions per candidate

Total encryptions

354,332,160

Rotor passes per encryption

Total rotor passes

2,125,992,960

Frequency table updates

354,332,160

IC sum computations (26 bins each)

5,905,536

Over two billion rotor passes on a 1 MHz processor. The C64’s theoretical peak is about 0.5 MIPS (all 2-cycle instructions, which never happens in practice). Real-world throughput with mixed instructions is closer to 0.3 MIPS. Over 82 hours, the 6510 executes roughly 87 billion instructions and burns through 303 billion clock cycles. For more on what the 6510 can and can’t do at 1 MHz, see How Fast Can a 6502 Transfer Memory?.

The brute-force cracker averaged about one encryption per candidate thanks to early exit. The IC attack does 60 every time, no shortcuts.

The IC Test

The IC test decrypts the entire message, counts letter frequencies in a 26-byte table, and computes the IC sum as a 16-bit value:

; ic test ; decrypts ciphertext, computes ic sum ; returns z flag set if sum >= threshold testic ; clear frequency table ldx #25 lda #0 ticlr sta freq,x dex bpl ticlr ; set starting positions lda trylp sta leftpos lda trymp sta midpos lda tryrp sta rightpos ; decrypt and count ldx #0 tidec lda cipher,x stx savex jsr encrypt ; increment freq[a] tax inc freq,x ldx savex inx cpx #60 bne tidec ; compute ic sum (16-bit) ; sum = sigma(freq[i](freq[i]-1)) lda #0 sta iclo sta ichi ldx #0 tisum lda freq,x beq tiskip ; a = n = freq[i] ; compute n(n-1) ; freq values max ~10 for 60 chars ; so product fits in one byte tay dey sty temp iny lda #0 clc timul adc temp dey bne timul ; add to 16-bit sum clc adc iclo sta iclo bcc tiskip inc ichi tiskip inx cpx #26 bne tisum ; compare sum to threshold (194) ; 16-bit compare: ichi:iclo >= 0:194 lda ichi bne tipass lda iclo cmp #194 bcs tipass lda #1 rts tipass lda #0 rts

The frequency counting is simple: decrypt each character with jsr encrypt, use the result as an index into freq, and increment. After all 60 characters, loop through the 26 frequency bins computing $n \times (n-1)$ for each. The maximum possible IC sum (if all 60 characters were the same letter) would be $60 \times 59 = 3540$, which fits in 16 bits. We accumulate the total in iclo/ichi.

The multiplication uses repeated addition. With frequency values rarely above 10 for a 60-character message, this loop runs at most 10 iterations per bin. For bins with count 0 or 1, the product is zero and we skip the multiply entirely.

The threshold compare is a 16-bit check. If the high byte is nonzero, the sum is at least 256, which is above 194, so it passes immediately. Otherwise compare the low byte to 194.

The assembly version uses a tighter threshold (194, IC >= 0.055) than the BASIC version (177, IC >= 0.050). The BASIC version searches one rotor ordering at a time, so 286 candidates is manageable. The assembly version searches all 336 orderings, and at 177 that would produce roughly 95,000 candidates. At 194, it drops to about 16,500. The correct answer scores 258, so it passes either threshold easily.

The Search Loop

The position loop calls testic instead of the loop test and crib verification. There’s no two-phase filtering here, just one test:

One difference from the cracker: we don’t stop at the first match. The IC test produces multiple candidates, so we print each one and keep searching. The operator reads the output and picks the one that looks like German.

In practice, you’d want a printer connected. The search runs for hours and produces thousands of candidates. They scroll off a 25-line screen long before the search finishes. With a printer, you get a paper log to read through when it’s done. open 4,4:cmd 4 before running redirects output to a Commodore printer. print#4:close 4 when finished.

Showing Candidates

When a candidate passes the IC test, we print the rotor ordering, position, IC sum, and the first 40 characters of the decrypted text. The operator scans for readable German among the gibberish:

Here’s the complete IC attack. It assembles at $c000 and runs with sys 49152. The rotor wiring, stepping, encryption, and search structure are identical to the cracker. The only new code is testic and showcand. The border color changes with each new rotor ordering, so if it’s still cycling, it’s still searching.

Same advice about the printer applies here. The assembly version searches all 336 orderings and produces thousands of candidates over several hours. Hook up a printer before running, or you’ll only see whatever’s still on screen when it finishes:

The IC attack is slower per candidate than the brute-force cracker. Each candidate requires decrypting all 60 characters (60 encrypt calls) plus the frequency counting and IC sum computation. The cracker averaged about 1.04 encryptions per candidate thanks to early exit on the first crib character.

On a stock NTSC C64, the full IC search through all 5.9 million candidates takes 82 hours (about 3.4 days), producing 18,165 candidates at the 194 threshold.

The Jiffy Clock Problem

The C64’s jiffy clock at $a0-$a2 is 24 bits, which can count up to 16,777,216 jiffies (about 77.7 hours at 60 Hz). But the KERNAL never lets it get that high. The UDTIM routine at $f69b, called 60 times per second by the IRQ handler, resets the clock to zero when it reaches 5,184,000 jiffies (24 hours). It’s a time-of-day clock, not a free-running counter. From the original Commodore source:

; here we check for roll-over 23:59:59 ; and reset the clock to zero if true ud30 sec lda time+2 sbc #$01 lda time+1 sbc #$1a lda time sbc #$4f bcc ud60 ; time has rolled--zero register stx time stx time+1 stx time+2

An 82-hour search hits this reset three times. If you just read the clock at the end, you get the time since the last midnight reset, not the total elapsed time. I learned this the hard way after a four-day run reported ten hours. The fix: at each rotor ordering (every ~15 minutes), read the jiffy clock, add it to a 32-bit accumulator, and zero the clock. The KERNAL reset never fires because the clock never reaches 24 hours.

You don’t have to wait for the full search, though. The correct answer (III-I-V at M-C-Q) is ordering #87 out of 336. It shows up on the printer about 21 hours in, less than a day. The remaining 61 hours just confirm there’s nothing better hiding in the other orderings.

Where the Cycles Go

The rotorpass routine is where the CPU spends most of its time. It runs 2.1 billion times across the full search. Here’s every instruction with its cycle count:

rotorpass stx temp ; 3 save position offset clc ; 2 adc temp ; 3 add position to letter jsr mod26 ; 17 reduce mod 26 tay ; 2 use as table index lda (ptr),y ; 5 look up rotor wiring sec ; 2 sbc temp ; 3 subtract position clc ; 2 adc #26 ; 2 keep positive jsr mod26 ; 17 reduce mod 26 rts ; 6 ; total: 64 cycles

Each jsr mod26 costs 6 cycles for the call, then a compare, a conditional subtract, and a return. About 17 cycles total depending on whether the value needs reducing.

Each encrypt call does 6 of these (3 forward through the rotors, 3 inverse), plus table pointer setup (setfwd/setinv at 26 cycles each), stepping, and the reflector lookup. One full encrypt comes to about 720 cycles.

The inner loop does that 60 times per candidate (once per ciphertext character), then counts frequencies and computes the IC sum. Across 5.9 million candidates on an NTSC C64 at 1,022,727 Hz, the whole search takes 82 hours.

The tradeoff: the IC attack trades speed for generality. It’s slower than a crib-based attack, but it doesn’t need known plaintext. When you have a crib, use it. When you don’t, IC is your best option.

IC vs. Crib Attack

Crib Attack

IC Attack

Needs known plaintext?

Yes

Per-candidate cost

~1 encryption (early exit)

60 encryptions + frequency counting

Plugboard

Ignored (solved separately)

Ignored (IC is plugboard-invariant)

False positives

None (exact match)

18,165 candidates at threshold 194

Output

Single correct answer

Multiple candidates for human review

Time (assembly, all 336 orderings)

~22 minutes

~82 hours (~3.4 days)

The IC attack was never used operationally against Enigma during the war. Bletchley Park had enough cribs from weather reports, re-sends, and formulaic messages that the Bombe was the practical choice. IC analysis was well understood (Friedman published it in 1922), but the computational cost of decrypting entire messages for every candidate setting made it impractical with 1940s technology.

On a C64, the cost is steep but possible. With a crib, the brute-force cracker solved the same message in about 22 minutes. Without one, the IC attack takes three and a half days. That’s the price of not knowing anything about the plaintext. But the answer lands on the printer in about 21 hours. Just plug in a printer, type sys 49152, and go live your life for a while.

The Same Search on Modern Hardware

The IC attack is embarrassingly parallel. Every candidate is independent. No shared state, no ordering dependencies. A modern machine can throw cores and GPU threads at it without coordination.

I ported the same algorithm to C and ran it three ways on an Apple M4:

Version

Time

Candidates/sec

Speedup vs C64

C64 (6502 assembly, 1 MHz)

82 hours

~20

C, single-threaded (-O3)

1.856s

3.2 million

159,000x

C, OpenMP (all CPU cores)

0.409s

14.4 million

722,000x

Metal GPU (M4)

0.039s

151 million

7,569,000x

All four produce the same 18,165 candidates. Same algorithm, same rotor tables, same threshold. The GPU version finishes before your finger leaves the Enter key.

OpenMP splits the 336 orderings across all CPU cores. Nearly linear scaling. The GPU goes further, running all 5.9 million candidates as parallel threads. Each GPU core handles one candidate start to finish. The rotor tables and ciphertext fit in shared memory, so there’s no memory bottleneck.

To put the C64’s 82 hours in perspective, here’s what the same search would take on machines available in 1983, alongside some modern hardware. These estimates scale linearly by MIPS, which is rough but directionally right for this kind of integer-heavy table-lookup workload.

Machine

Year

MIPS

Estimated time

Commodore 64

1982

0.3

82 hours (measured)

IBM PC XT (8088, 4.77 MHz)

1983

0.33

74 hours

VAX-11/780

1977

1.0

25 hours

IBM 3081K mainframe

1982

93 minutes

Cray-1 (scalar)

1976

18 minutes

Cray X-MP (scalar, 1 CPU)

1982

105

14 minutes

AMD Ryzen 9 9950X (1 core)

2024

~28,000

~3.2 seconds

Apple M4 (1 core)

2024

~24,000

1.856 seconds (measured)

Apple M4 (all cores, OpenMP)

2024

0.409 seconds (measured)

Apple M4 GPU (Metal)

2024

0.039 seconds (measured)

The IBM PC XT is the fun one. The 8088 runs at nearly 5x the C64’s clock speed but delivers about the same integer throughput. Its 8-bit external bus and microcode overhead eat up the clock advantage. The VAX-11/780, the machine that literally defined “1 MIPS,” would have finished in about a day. The Cray-1 could have done it during a coffee break, but using a 10-million-dollar vector supercomputer for integer table lookups would have been a hard budget request to justify. Then again, if you’re trying to win a war, that budget gets approved.

The M4’s actual measured time (1.856 seconds single-threaded) is faster than the MIPS estimate because MIPS doesn’t capture what modern hardware really does. The C compiler eliminates subroutine call overhead, the branch predictor nails the inner loop, and every rotor table lookup hits L1 cache. The 6510 spends 64 cycles per rotor pass with two subroutine calls. The M4 burns through the same logic in a few nanoseconds.

The same math that took 82 hours on a C64 runs in 39 milliseconds on a GPU you can buy at the Apple Store.

Extra Credit

A few ways to take this further:

Pre-filter with IC: Use IC as a first pass to narrow down rotor orderings. For each ordering, compute IC for a sample of positions. If no position scores well, skip the ordering entirely. This reduces the search space for any follow-up attack.

Bigram scoring: IC uses single-letter frequencies. Bigram frequencies (pairs like TH, ER, EN in German) are an even stronger signal. Count all 676 bigram frequencies and compare to expected German bigrams. Higher accuracy, but the frequency table grows from 26 bytes to 676 bytes.

Solve the plugboard: Once IC identifies the correct rotor settings, the plugboard can be deduced using frequency analysis. Compare the letter frequencies of the rotor-only decryption to expected German frequencies. The mapping from observed frequencies to expected frequencies reveals the plugboard swaps.

Variable threshold: Use a sliding threshold based on message length. Shorter messages need a lower threshold (more false positives) because the IC is noisier. Longer messages allow a higher threshold (fewer false positives). Pre-compute the threshold for the specific message length.

All the code from this article (6502 assembly, C, OpenMP, Metal GPU, and Python verification) is on GitHub: mrdoornbos/enigma_ic.

Going Faster on the C64

The emulator article pointed out that mod26 could be inlined to save cycles. In the IC attack, rotorpass calls jsr mod26 twice per pass. Each jsr/rts pair costs 12 cycles. Over 2.1 billion rotor passes, that’s about 50 billion wasted cycles just bouncing in and out of a 3-instruction subroutine.

The fix is simple. Replace the two jsr mod26 calls with the comparison inline:

Same logic, no subroutine calls. The mod26 routine still exists for the stepping code (which runs once per character, not worth optimizing), but the rotor pass is where the CPU lives for 82 hours.

The result: 67 hours, down from 82. An 18% speedup from removing two jsr instructions. The code is 6 bytes larger. On a machine that runs for days, inlining a hot subroutine is the easiest optimization you can make.

Source: Hacker News