The gold standard of optimization: A look under the hood of RollerCoaster Tycoon

A deep dive into how Chris Sawyer used Assembly and clever programming tricks to simulate thousands of agents in RollerCoaster Tycoon on 1999 hardware.
Due to some lucky circumstances, I recently had the chance to appear in one of the biggest German gaming podcasts, Stay Forever, to talk about the technology of RollerCoaster Tycoon (1999). It was a great interview, and I strongly recommend to listen to the whole episode here, at least if you speak german. If not, don’t worry—this article covers what was said (and a little more).
RollerCoaster Tycoon and its sequel are often named as some of the best-optimized games out there, written almost completely in Assembly by their creator, Chris Sawyer. Somehow this game managed to simulate full theme parks with thousands of agents on the hardware of 1999 without breaking a sweat. An immensely impressive feat, considering that even nowadays a lot of similar building games struggle to hit a consistent framerate.
So how did Chris Sawyer manage to achieve this?
There are a lot of answers to this question, some of them small and focused, some broad and impactful. The one which is mentioned first in most articles is the fact that the game was written in the low-level language Assembly, which, especially at the time of the game’s development, allowed him to write more performant programs than if he had used other high-level languages like C or C++.
Coding in Assembly had been the standard for game development for a long time but at this point in time was basically a given-up practice. Even the first Doom, which was released six years earlier, was already mostly written in C with only a few parts written in Assembly, and nobody would argue that Doom was in any way an unoptimized game.
It’s hard to check for sure, but it’s likely that RCT was the last big game developed in this way. How big the performance impact was at the time is hard to quantify, but for what it’s worth, it was probably higher than it would be nowadays. Compilers have gotten much better at optimizing high-level code, and many optimizations that you’d need to do manually back then can be handled by compilers nowadays.
But besides the use of assembly, the code of RCT was aggressively optimized. How do we know this if the source code has never been released? We have something that’s almost as good: A 100% compatible re-implementation of it, OpenRCT2.
Written by (very) dedicated fans, OpenRCT2 manages to reimplement the entirety of RollerCoaster 1&2, using the original assets. Even though this is NOT the original source code, especially in its earlier versions, this re-implementation is a very, very close match to the original, being based on years of reverse engineering. Note that by now, OpenRCT2 contains more and more improvements over the original code. I’ll note some of those changes as we come across them.
Also, I won’t go through all optimizations, but I will pick some examples, just to illustrate that every part of the game was optimized to the brink.
Types of Money
How would you store a money value in a game? You would probably start by thinking about the highest possible money value you might need in the game and choose a data type based on that. Chris Sawyer apparently did the same thing, but in a more fine-grained way.
Different money values in the code use different data types, based on what the highest expected value at that point is. The variable that stores the overall park value, for example, uses 4 bytes since the overall park value is expected to use quite high numbers. But the adjustable price of a shop item? This requires a far lower number range, so the game uses only one byte to store it. Note that this is one of the optimizations that has been removed in OpenRCT2, which changed all occurrences to a simple 8-byte variable, since on modern CPUs it doesn’t make a performance difference anymore.
Replacing math operations with bitshifting
When reading through OpenRCT2’s source, there is a common syntax that you rarely see in modern code, lines like this:
NewValue = OldValue << 2;
Thanks to operator overloading, the ‘<<’ operator can mean a lot of things in C++. What the line effectively does is the same as what most coders would write like this:
NewValue = OldValue * 4;
What the ‘<<’ operator does here is called bit shifting, meaning all the bits that store the value of the variable are shifted to the left, in this case by two positions, with the new digits being filled in with zeros. Since the number is stored in a binary system, every shift to the left means the number is doubled.
At first this sounds like a strange technical obscurity, but when multiplying numbers in the decimal system we basically do the same. When you multiply 57 * 10, do you actually ‘calculate’ the multiplication? Or do you just append a 0 to the 57? It’s the same principle just with a different numerical system.
The same trick can also be used for the other direction to save a division:
NewValue = OldValue >> 3;
This is basically the same as
NewValue = OldValue / 8;
RCT does this trick all the time, and even in its OpenRCT2 version, this syntax hasn’t been changed, since compilers won’t do this optimization for you. This might seem like a missed opportunity but makes sense considering that this optimization will return different results for underflow and overflow cases (which the code should avoid anyway).
The even more interesting point about those calculations, however, is how often the code is able to do this. Obviously, bit shifting can only be done for multiplications and divisions involving a power of two, like 2, 4, 8, 16, etc. The fact that it is done that often indicates that the in-game formulas were specifically designed to stick to those numbers wherever possible, which in most modern development workflows is basically an impossibility. Imagine a programmer asking a game designer if they could change their formula to use an 8 instead of a 9.5 because it is a number that the CPU prefers to calculate with. There is a very good argument to be made that a game designer should never have to worry about the runtime performance characteristics of binary arithmetic in their life, that’s a fate reserved for programmers. Luckily, in the case of RCT the game designer and the programmer of the game are the same person, which also offers a good transition to the third big optimization:
Game Design for Performance
RCT was never a pure one-man-project, even though it is often described as one. All the graphics of the game and its add-ons, for example, were created by Simon Foster, while the sound was the responsibility of Allister Brimble.
But it’s probably correct to call it a Chris Sawyer Game, who was the main programmer and only game designer in unison.
This overlap in roles enables some profound optimizations, by not only designing the game based on the expected game experience, but also informed by the performance characteristics of those design decisions.
One great example for this is the pathfinding used in the game. When writing a game design document for a park building game, it’s very easy to design a solution in which guests first decide on which attraction they want to visit (based on the ride preferences of the individual guest), and then walk over to their chosen attraction.
From a tech point of view, this design, however, is basically a worst case scenario. Pathfinding is an expensive task, and running it for potentially thousands of agents at the same time is a daunting prospect, even on modern machines.
That’s probably why the guest behavior in RCT works fundamentally different. Instead of choosing a ride to visit and then finding a path to it, the guests in RCT walk around the park, basically blind, waiting to stumble over an interesting ride by accident. They follow the current path, not thinking about rides or needs at all. When reaching a junction, they will select a new walking direction almost randomly, only using a very small set of extra rules to avoid dead ends, etc.
This “shortcoming” is actually easy
Source: Hacker News










