Model-Based Testing for Dungeons & Dragons

From modeling a single character to building a complex combat system, this article explores how formal modeling languages help manage overlapping reaction chains in the D&D universe.
Model-Based Testing for Dungeons & Dragons
Last time, I modeled a single D&D character. One creature, standing alone in the void, tracking hit points, conditions, and spell slots. The Quint spec (a formal modeling language) for that was around 6,000 lines.
Now I added combat. Not just “a Fighter hits a Goblin with a sword” kind of combat, but the hard part: Counterspell chains, Shield interrupting mid-attack, readied spells holding concentration, Legendary Resistance flipping saves.
Thus, I did the hard part first. Without it, you can’t prove full viability. A spec that handles the simple cases proves nothing, and is akin to a “Todo App project”; the real blockers live in the interactions. If Counterspell chains can deadlock the state machine, or readied-spell concentration can orphan active effects, I needed to know now, not after building three more layers on top.
The battle spec is less than half the creature code. You’d think it amounts to less than half the complexity, but that is far from true.
A single creature’s state space is a flat reducer. A glorified character sheet. Level up, take damage, gain a condition, lose a condition. The state space is large but flat. Combat makes it profound. Two creatures interacting means every action can trigger reactions, every reaction can trigger counter-reactions, and the whole thing nests before anyone’s taken damage. This is where beginner and even proficient players often get confused.
Here’s what the spec covers now: all the character classes (12 of them,) 14 conditions that modify what you can and can’t do. Legendary creatures that act on other creatures’ turns. Counterspell that interrupts spellcasting. Opportunity attacks that fire mid-movement and trigger full attack resolution chains. Both player characters and monsters, with character lists and stat blocks.
The demo replays a scripted arcane battle step by step: Fireball, Counterspell chains, AoE damage, death saves. Step forward, step back, or hit play and watch the interrupt windows open and resolve.
Last time it was one creature in the void. Now it’s all classes, conditions, and a framework for adding any spell or feat whether from the SRD1 or published books. Finally, well past proof of concept!
The Interrupt Chain
A single attack is not a single event.
When one creature attacks another, the rules open a series of interrupt windows — moments where participants can react and alter the outcome mid-resolution. Each one can branch the game state.
Phase 1: The attack hits. The d20 lands, the DM compares it to Armor Class — hit or miss. But the target might cast Shield (+5 AC, potentially flipping the hit to a miss). A Bard’s Cutting Words subtracts a die from the roll. All of this happens after the roll but before the hit is confirmed.
Phase 2: Damage lands. The hit lands. Dice are rolled. But the Rogue has Uncanny Dodge — halve that damage as a reaction. A Monk’s Deflect Attacks subtracts 1d10 plus their Dexterity modifier and level. These fire after the hit but before the damage applies.
Phase 3: A spell is being cast. Spellcasting opens its own window: any creature with Counterspell prepared and a reaction available can force a Constitution save. If the caster fails, the spell fizzles. Use a high enough slot and it auto-succeeds. The Counterspell is itself a spell, so another creature can Counterspell the Counterspell. In the spec, this is a literal stack data structure — push, resolve, pop:
var bSpellStack: List[SpellStackEntry] // push on cast, pop on resolve
Phase 4: A save fails. The target fails their saving throw. But the Ancient Dragon has Legendary Resistance — it spends a charge and the save auto-succeeds. This fires after the save result but before the fail effects apply.
Now layer these on top of each other.
A Rogue moves away from an enemy creature. The enemy has an unused reaction, so it gets an Opportunity Attack — one melee attack (even if the attacker has multiattack property, mind me!). That attack enters Phase 1 (can the Rogue cast Shield?), then Phase 2 (Uncanny Dodge?), then after-damage effects (concentration check if the Rogue was concentrating on a spell). All of this happens mid-movement, before the Rogue reaches their destination.
An ally Wizard casts Hold Monster on the Dragon. Phase 3: Counterspell window. The Dragon’s minion tries to counter it. A second ally counter-Counterspells. The stack pops: inner Counterspell resolves first, then the outer one. The spell goes through. The Dragon fails its save. Phase 4: Legendary Resistance. The Dragon burns a charge. Save flipped to success.
After all of this resolves, where does the state machine go back to? It depends on what started it. An attack on your turn resumes the turn. An Opportunity Attack mid-movement resumes movement. An AoE spell moves to the next target. The damage resolution is identical in all cases — only the resume point differs. In the spec, each source carries its own return address as a tagged union:
type AfterDamageReturn =
| ADRActiveTurn
| ADRResolvingAoE(AoESpellCtx)
| ADRResolvingMovement(MovementCtx)
| ADRAwaitingLegendaryAction(LAWindowCtx)
| ADRAwaitingReadiedAction(ReadyWindowCtx)
Every time the state machine enters an interrupt chain, it carries the return address with it. When the chain resolves — all reactions offered, all damage applied, all concentration checks done — it pattern-matches on that tag and resumes the correct phase.
The XState machine mirrors the same type and the same pattern match:
// battle-machine-types.ts
type AfterDamageReturn =
| { tag: "ADRActiveTurn" }
| { tag: "ADRResolvingAoE"; aoe: AoESpellCtx }
| { tag: "ADRResolvingMovement"; mv: MovementCtx }
| { tag: "ADRAwaitingLegendaryAction"; la: LAWindowCtx }
| { tag: "ADRAwaitingReadiedAction"; ready: ReadyWindowCtx }
// battle-machine-helpers.ts
function returnToState(r: AfterDamageReturn): PhaseFields {
return Match.value(r).pipe( // effect/Match: exhaustive pattern matching
byTag("ADRActiveTurn", () => PHASE_ACTIVE),
byTag("ADRResolvingAoE", (v) => phaseResolvingAoE(v.aoe)),
byTag("ADRResolvingMovement", (v) => phaseResolvingMovement(v.mv)),
// ...
Match.exhaustive // no default branch — compiler rejects missing variants
)
}
Quint pattern-matches on the tag to decide where to go next. XState does the same via Match.exhaustive — which also means adding a new return variant is a compile error until every call site handles it. MBT compares context after every step: if the XState machine routes to the wrong phase, the trace diverges and the seed is logged.
This is why the formal spec matters. The interrupt interactions create a combinatorial state space. The Quint fuzzer explores it; unit tests cover the paths you thought of.
The spec models all of this. But it doesn’t model everything.
What’s Spec vs What’s DM
The spec doesn’t model everything. It models everything mechanically deterministic — given the same inputs, there’s exactly one correct answer. Everything else enters the spec as caller-provided input.
Dice rolls are the simplest case. The spec never generates random numbers. The caller passes rolls as arguments, already resolved. The spec proves that given any roll, the downstream mechanics are correct. Randomness is the caller’s problem.
Cover works the same way. The spec receives it as a typed enum: NoCover | HalfCover | ThreeQuartersCover | TotalCover. It proves the AC and DEX save bonuses for each level. Whether that pillar constitutes half cover or three-quarters cover is DM’s call on geometry they use.
Opportunity attacks push this further. When a creature moves through a threatened area, the spec receives a set of threatening creatures as input, then tests: given any set, does the OA pipeline resolve correctly? It never asks “who’s actually within 5 feet?”
Source: Hacker News













