Why Writing an Emulator is Fun and Why You Should Too

April 15, 2015 Steve Hawley

In this well-written article, Paul Ford discusses running emulators of old computing systems. For Paul, it was a step in grieving the loss of his long-time friend, Tom. Along the way, he shares some great examples of emulation. I’m going to talk about the process of writing an emulator and why you should write one.

The ‘why’ should be addressed first. If you were the kind of kid who spent untold hours studying something just because you thought it was interesting, or spent days trying to find the perfect speed run through a game, this is right up your alley. You get to look at how a particular processor operates in minute detail, discover how it interfaces with hardware, and figure out how to design a system that represents both accurately. There are opportunities for all kinds of interesting abstractions and architectural decisions, from how concretely you would like to represent memory to whether or not you will run the emulator multithreaded. In the end, you will become a better programmer and a better debugger when you understand what is happening under the hood of a typical system.

The operation of a CPU is extremely simple in most processors manufactured until around 1990. After that, many processors became much more complicated, adding branch prediction, instruction pipelining, multiple cores, and so on. If you’re new to the game, consider looking at a relatively simple 8-bit processor such as the Intel 8080 or Motorola 6800.

Where to start? First, model the CPU, which consists of a number of registers. These registers are essentially memory that has no address but that the CPU can access directly. One way to do this is to make an object that represents the CPU, with properties for its internal registers. Next, you will likely want to consider how a CPU runs. At its essence, a CPU consists of three operations run in sequence: fetch, decode, execute. For a typical CPU, fetch will look something like this:

public byte Fetch() {
    // Read the byte at the program counter, then advance the program counter.
    return Memory.Read(PC++);
}

This simply returns the next byte read. In many cases, decode and execute can be combined in one step. For simple 8-bit CPUs, there is a maximum of 256 instructions (one for every byte value). For others, some instructions are escape codes that switch to another instruction set. Nonetheless, you can handle this by building a giant switch statement and calling a method for each op code. This is cumbersome, hard to read, and hard to maintain. I prefer to use a delegate or function pointer for each instruction and put that into a table indexed by the op code. I also suggest making that function return the number of clock cycles that were consumed by the instruction. A typical Decode/Execute might look like this:

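private delegate int InstructionProc(byte opcode);
// One handler per opcode byte; unimplemented opcodes stay null.
private readonly InstructionProc[] _procs = new InstructionProc[256];

public int DecodeExecute(byte opcode) {
    InstructionProc proc = _procs[opcode];
    if (proc == null) return -1; // no handler for this opcode
    return proc(opcode);         // cycles consumed by the instruction
}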

You might wonder why I return the cycle time instead of looking it up in a table. The issue is that some instructions on certain CPUs take a variable amount of time, depending on what the instruction did. For example, many CPUs take more time executing a branch instruction when the branch is taken than when it is not. This is not known until the instruction is executed.
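To make the pieces so far concrete, here is a minimal sketch of a CPU object with registers as properties, a table of instruction delegates, and a branch that reports a different cycle count when it is taken. The instruction set is a toy: the class name ToyCpu, the four opcodes, and every cycle count are invented for illustration and do not match the 8080, the 6800, or any real processor. Memory is a plain byte array here rather than a separate Memory object.

```csharp
using System;

// Toy 8-bit CPU. Every opcode value and cycle count is made up for this sketch.
public class ToyCpu
{
    // Registers are ordinary properties on the CPU object.
    public byte A { get; set; }         // accumulator
    public ushort PC { get; set; }      // program counter
    public bool ZeroFlag { get; set; }  // true when the last result was zero

    public byte[] Memory { get; } = new byte[65536];

    private delegate int InstructionProc(byte opcode);
    private readonly InstructionProc[] _procs = new InstructionProc[256];

    public ToyCpu()
    {
        // Fill the dispatch table; unlisted opcodes stay null.
        _procs[0x00] = Nop;
        _procs[0x01] = LoadImmediate;
        _procs[0x02] = Decrement;
        _procs[0x03] = BranchIfNotZero;
    }

    public byte Fetch() => Memory[PC++];

    // Runs one instruction. Returns the cycles it consumed,
    // or -1 when the opcode has no handler.
    public int Step()
    {
        byte opcode = Fetch();
        InstructionProc proc = _procs[opcode];
        return proc == null ? -1 : proc(opcode);
    }

    private int Nop(byte opcode) => 2;

    private int LoadImmediate(byte opcode)
    {
        A = Fetch();
        ZeroFlag = A == 0;
        return 2;
    }

    private int Decrement(byte opcode)
    {
        A = (byte)(A - 1);  // wraps from 0 to 255
        ZeroFlag = A == 0;
        return 2;
    }

    private int BranchIfNotZero(byte opcode)
    {
        sbyte offset = (sbyte)Fetch();  // signed, relative to the next instruction
        if (ZeroFlag) return 3;         // branch not taken: cheaper
        PC = (ushort)(PC + offset);     // branch taken
        return 4;
    }
}
```

Loading the bytes 0x01 0x02 0x02 0x03 0xFD at address 0 (load 2, decrement, branch back 3 bytes while not zero) and calling Step() five times runs the loop body twice and consumes 2 + 2 + 4 + 2 + 3 = 13 cycles. The two branch executions report different costs, which is why the handler returns the count rather than a table supplying it.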

At this point the structure of your emulator loop should be evident:

public void RunInstruction() {
    int startTime = CurrentTime();
    byte opcode = Fetch();
    int cycles = DecodeExecute(opcode);
    int endTime = CurrentTime();
    // Stall until the emulated instruction's real-world duration has passed.
    WasteTime(cycles, endTime - startTime);
}

public void Run() {
    while (!Stopped) {
        RunInstruction();
    }
}

WasteTime() does exactly that. It compares the real time spent running the emulated instruction with the time the instruction should have taken, based on its cycle count and the clock rate for this instance of the CPU, and then idles for the difference. At 1 MHz, for example, a 4-cycle instruction should take 4 microseconds.

The question that you ask now is: how does anything get done? It depends entirely on the hardware. Most early CPUs have either special instructions that perform I/O, special memory locations that cause things to happen when they are written to or read from, or devices that raise interrupts. How and what happens will depend on the system you’re trying to emulate. If you’re running multithreaded, perhaps your hardware will generate events that will be received by the CPU object. Maybe your memory object will generate events when written to or read from and the hardware thread will receive them. Maybe you will hardwire the memory/hardware interactions.
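As one illustration of special memory locations, the sketch below shows a memory object that lets emulated hardware register callbacks on individual addresses. It is a single-threaded sketch under stated assumptions: the class name HookedMemory and its OnRead and OnWrite methods are hypothetical, and a real system may call for address ranges, mirrored regions, or cross-thread events instead.

```csharp
using System;
using System.Collections.Generic;

// Memory with optional per-address hooks for memory-mapped hardware.
// The names and the hook mechanism are assumptions made for this sketch.
public class HookedMemory
{
    private readonly byte[] _ram = new byte[65536];
    private readonly Dictionary<ushort, Func<byte>> _readHooks =
        new Dictionary<ushort, Func<byte>>();
    private readonly Dictionary<ushort, Action<byte>> _writeHooks =
        new Dictionary<ushort, Action<byte>>();

    // Register a device callback for reads from one address.
    public void OnRead(ushort address, Func<byte> hook) => _readHooks[address] = hook;

    // Register a device callback for writes to one address.
    public void OnWrite(ushort address, Action<byte> hook) => _writeHooks[address] = hook;

    public byte Read(ushort address)
    {
        // A hooked address is answered by the device, not by RAM.
        if (_readHooks.TryGetValue(address, out var hook)) return hook();
        return _ram[address];
    }

    public void Write(ushort address, byte value)
    {
        // A hooked address hands the value to the device and leaves RAM alone.
        if (_writeHooks.TryGetValue(address, out var hook))
        {
            hook(value);
            return;
        }
        _ram[address] = value;
    }
}
```

A hypothetical sound port could then be attached with memory.OnWrite(0x0400, value => PlaySound(value)), where both the address and PlaySound are placeholders. The CPU code keeps calling Read and Write and never needs to know which addresses belong to hardware.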

At this point, you might consider writing a simple disassembler for your CPU. Again, this is something that can be table-driven. For any given instruction, you could make a table that includes the instruction name, the number of bytes it takes, and either a format string or a function that generates the disassembled string. Having a disassembler will make running your CPU much easier.
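A table-driven disassembler along those lines might be sketched as follows, using the format-string option. The five entries are meant to match the Motorola 6800 encoding (NOP, RTS, LDAA immediate, STAA extended, JMP extended), but treat them as a starting point and check them against the programming manual. The class layout, the "???" marker for unknown bytes, and the absence of bounds checks are choices made for this sketch.

```csharp
using System;

// Table-driven disassembler sketch for a handful of 6800 opcodes.
public static class Disassembler
{
    // One table row: mnemonic, total length in bytes, operand format string.
    private sealed class OpInfo
    {
        public readonly string Name;
        public readonly int Length;     // opcode byte plus operand bytes
        public readonly string Format;  // operand format, or "" when there is none

        public OpInfo(string name, int length, string format)
        {
            Name = name;
            Length = length;
            Format = format;
        }
    }

    private static readonly OpInfo[] Table = BuildTable();

    private static OpInfo[] BuildTable()
    {
        var table = new OpInfo[256];
        table[0x01] = new OpInfo("NOP", 1, "");
        table[0x39] = new OpInfo("RTS", 1, "");
        table[0x86] = new OpInfo("LDAA", 2, "#${0:X2}");  // immediate operand
        table[0xB7] = new OpInfo("STAA", 3, "${0:X4}");   // 16-bit address
        table[0x7E] = new OpInfo("JMP", 3, "${0:X4}");    // 16-bit address
        return table;
    }

    // Returns the text for the instruction at 'address' and reports its length.
    // Assumes the whole instruction lies inside 'memory'.
    public static string Disassemble(byte[] memory, int address, out int length)
    {
        OpInfo info = Table[memory[address]];
        if (info == null)
        {
            length = 1;  // skip one byte and mark it as unknown
            return string.Format("??? ${0:X2}", memory[address]);
        }

        length = info.Length;
        if (length == 1) return info.Name;

        // The 6800 stores 16-bit operands high byte first.
        int operand = memory[address + 1];
        if (length == 3) operand = (operand << 8) | memory[address + 2];
        return info.Name + " " + string.Format(info.Format, operand);
    }
}
```

Calling Disassemble repeatedly and advancing the address by the reported length walks a block of code one instruction at a time.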

Around 1993, I wrote an emulator for the 6808 processor (a special version of the 6800 that included 128 bytes of RAM on board) and the sound board hardware in Robotron, a system that ran at a clock speed of a little under 1 MHz. The emulator was written in C and, when run on a Macintosh LC III (a 25 MHz 68030), ran just a little bit slower than real time. This means that for every cycle in the 6808, I was spending slightly more than 25 cycles on the Macintosh. Looking at the code, there were some optimizations that could be made to the C, but to get better performance, I would have had to rewrite each of the instructions in assembly language. Most of the emulator’s time was spent computing each instruction’s condition codes. Since the 68000 family keeps its condition codes in exactly the same order as the 6800, that rewrite would have been easy. I decided not to do it when I ran the code on a brand new PowerPC and it ran at 4x real time with no optimizations. Today, managed languages such as Java or C# running on current computers are well suited to the task of emulating older systems. In addition, they have the luxury of being able to take advantage of more abstract, easier-to-maintain techniques, since a 2 GHz processor can easily emulate dozens of these CPUs simultaneously.
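To show why condition codes cost so much when written in a high-level language, here is a sketch of the flag work behind a single 8-bit add, using the 6800’s condition-code bit positions (C in bit 0, then V, Z, N, with H in bit 5). This is not the code from the 1993 emulator; the class and method names are hypothetical, and the interrupt mask bit is left out.

```csharp
using System;

// Condition-code calculation for an 8-bit add, using the 6800 flag layout.
public static class Flags6800
{
    public const byte C = 0x01;  // carry out of bit 7
    public const byte V = 0x02;  // signed overflow
    public const byte Z = 0x04;  // result is zero
    public const byte N = 0x08;  // bit 7 of the result is set
    public const byte H = 0x20;  // carry out of bit 3 (half carry)

    // Adds b to a and returns the 8-bit sum; 'flags' receives H, N, Z, V, and C.
    public static byte Add(byte a, byte b, out byte flags)
    {
        int wide = a + b;        // keep the ninth bit so the carry is visible
        byte sum = (byte)wide;   // truncate to 8 bits
        int f = 0;

        if (wide > 0xFF) f |= C;
        // Overflow: both operands differ in sign from the result.
        if (((a ^ sum) & (b ^ sum) & 0x80) != 0) f |= V;
        if (sum == 0) f |= Z;
        if ((sum & 0x80) != 0) f |= N;
        if (((a & 0x0F) + (b & 0x0F)) > 0x0F) f |= H;

        flags = (byte)f;
        return sum;
    }
}
```

Five tests and several masks accompany every add here, whereas a host CPU with a matching flag layout produces the same bits as a side effect of its own add instruction. That is the saving an assembly rewrite would have captured.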

I often hear about engineers who are looking for projects to do to help improve their skills and their marketability. If you are just such an engineer, instead of asking, “why does the world need another M.A.M.E?” you should ask, “how much will my skill as an engineer improve by implementing an emulator?”

About the Author

Steve Hawley

Steve has been with Atalasoft since 2005. Not only is he responsible for the architecture and development of DotImage, he is one of the masterminds behind Bacon Day. Steve has over 20 years of experience with companies like Bell Communications Research, Adobe Systems, Newfire, and Presto Technologies.
