Machine that never stops computing

Warm up is over. Let's get to the real PA.

If PA1 is just a warm-up for programming, then PA2 is the beginning of the main movie. In PA2, you will feel a tidal wave of new code and engineering details, you need to constantly read the handouts, RTFM, RTFSC, and organize the details, which is inefficient at first, but once you have mastered the details, it is just repetitive engineering work later.

Furthermore, if the "don't dwell on code that is not relevant to your current progress" reference in PA1 was meant to protect your young mind, in PA2 you'll need to be brave enough to explore: the handouts don't point out every bit of code you should read in the same way that PA1 did. In fact, all the code in PA is worth reading, and by the end of PA2, you'll need to be able to understand every detail of NEMU. As you get deeper into the details of the code, you'll feel more comfortable even when dealing with bugs. On the other hand, if you still have the mentality of "just pass the test", "it doesn't count towards my score, it's none of my business", you'll soon realize that you don't even know how to deal with bugs.

So, don't be lazy.

In PA1, we have seen how the simplest computer, TRM, works:

while (1) {
  Fetch the instruction from the memory location indicated by the PC;
  Execute the instruction.
  Updating the PC.
}

Let's talk about the process, i.e., how the CPU actually executes an instruction. For most instructions, execution can be abstracted into the fetch-decode-execute cycleopen in new window. To make the description clearer, we brorrow some concepts from the instruction cycle to illustrate the process of instruction execution.

instruction fetch, IF

To execute an instruction, you first need to get the instruction. Where is the instruction? Remember the core idea of the Von Neumann architecture? It's "programs and data are stored in a separate storage unit called memories and are treated the same". You didn't realize that when you heard it before, but now it's time to put it into practice. Von Neumann tells you that instructions are in memory, and the PC points to the location of the current instruction. In fact, the PC is a pointer! The concept of pointers is ubiquitous in the computer world. If you don't feel familiar with the concept of pointers, it's time to brush up on your pointer skills. What fetching an instruction does, naturally, is to read the instruction pointed to by the PC from memory into the CPU.

instruction decode, ID

At the instruction fetch stage, the computer gets the instruction to be executed. Let's take a look at the instruction itself. Surprise! it's a bit stream of 0s and 1s!

10111001 00110100 00010010 00000000 00000000

What the hell is this... But come to think of it, a computer is just a giant digital circuit, and it can only understand zeros and ones. But how does such a computer make sense of this confusing stream of bits?

Let's think about what instructions do for a moment. We know that the CPU is used to process data, and instructions are used to tell the CPU what to do with what data. In other words, the CPU knows what we want it to do if we let it decipher the target and operation from the mysterious stream of bits above. So accordingly, the CPU needs to decode the "opcode" (the operation) and "operand" (the target) from the instruction.

So, in order for the computer to understand the meaning of the instruction, the pioneers came up with a way, that is, you learned in digital circuits class, A lookup table! After the CPU gets an instruction, it can find out the operands and opcodes of the instruction by looking up the table. This process is called decoding.

Of course, the decoding logic is not as simple as just a lookup table, but also involves selecting different operands for different instructions via a multiplexer. Recall that computers now have memories and registers that can hold operands, and instructions that can hold immediate numbers. There may also be a secondary decoding process... But no matter how complicated it gets, all we need to know is that the process is ultimately just some digital circuitry, and after all, all the information we need is in the instructions, so there's nothing mysterious going on.

execute, EX

After decoding, the CPU knows exactly what the current instruction is going to do, and the execution phase is where the actual operation indicated by the instruction been performed. Currently the TRM has only one module for execution, the adder, so if necessary, you only need to input two source operands into the adder to get the result of the execution. The result is then written back to the destination operand, either a register or the memory.

Update PC

After executing an instruction, the CPU has to execute the next instruction. Before doing so, the CPU needs to update the value of PC to add the length of the instruction it just executed, then PC will point to the location of the next instruction.


Thus, the computer repeats the four steps above, executing instructions over and over again, forever.

YEMU: A simple cpu emulator

As an example, we will use the simple computer described in Section 1.1.3 of the ICS textbook to show how to implement an instruction's execution in C. This computer has four 8-bit registers, a 4-bit PC, and 16 bytes of memory. It supports both R-type and M-type instruction formats, with 4 instructions. The instruction manual is as follows.

                                                     4  2  0
            |                        |        | +----+--+--+
mov   rt,rs | R[rt] <- R[rs]         | R-type | |0000|rt|rs|
            |                        |        | +----+--+--+
            |                        |        | +----+--+--+
add   rt,rs | R[rt] <- R[rs] + R[rt] | R-type | |0001|rt|rs|
            |                        |        | +----+--+--+
            |                        |        | +----+--+--+
load  addr  | R[0] <- M[addr]        | M-type | |1110| addr|
            |                        |        | +----+--+--+
            |                        |        | +----+--+--+
store addr  | M[addr] <- R[0]        | M-type | |1111| addr|
            |                        |        | +----+--+--+

We can write the emulator YEMU for this simple computer in C according to the instruction manual

#include <stdint.h>
#include <stdio.h>

#define NREG 4
#define NMEM 16

// define the instruction format
typedef union {
  struct { uint8_t rs : 2, rt : 2, op : 4; } rtype;
  struct { uint8_t addr : 4      , op : 4; } mtype;
  uint8_t inst;
} inst_t;

#define DECODE_R(inst) uint8_t rt = (inst).rtype.rt, rs = (inst).rtype.rs
#define DECODE_M(inst) uint8_t addr = (inst).mtype.addr

uint8_t pc = 0;       // PC, There is no 4-bit data type in C, we use 8-bit type to represent it
uint8_t R[NREG] = {}; // registers
uint8_t M[NMEM] = {   // memory, which contains a program that calculates z = x + y
  0b11100110,  // load  6#     | R[0] <- M[y]
  0b00000100,  // mov   r1, r0 | R[1] <- R[0]
  0b11100101,  // load  5#     | R[0] <- M[x]
  0b00010001,  // add   r0, r1 | R[0] <- R[0] + R[1]
  0b11110111,  // store 7#     | M[z] <- R[0]
  0b00010000,  // x = 16
  0b00100001,  // y = 33
  0b00000000,  // z = 0
};

int halt = 0; // end marker

// Execute an instruction
void exec_once() {
  inst_t this;
  this.inst = M[pc]; // instruction fetch
  switch (this.rtype.op) {
  //  op decode    operand decode  execution
    case 0b0000: { DECODE_R(this); R[rt]   = R[rs];   break; }
    case 0b0001: { DECODE_R(this); R[rt]  += R[rs];   break; }
    case 0b1110: { DECODE_M(this); R[0]    = M[addr]; break; }
    case 0b1111: { DECODE_M(this); M[addr] = R[0];    break; }
    default:
      printf("Invalid instruction with opcode = %x, halting...\n", this.rtype.op);
      halt = 1;
      break;
  }
  pc ++; // Update PC
}

int main() {
  while (1) {
    exec_once();
    if (halt) break;
  }
  printf("The result of 16 + 33 is %d\n", M[7]);
  return 0;
}

Understanding how YEMU executes programs

YEMU can be thought of as a simplified version of NEMU, the principles are similar, so you need to understand how YEMU executes programs. Specifically, you need to do the following

  • Draw the state machine of an addition program executed on the YEMU
  • Understanding how the YEMU executes an instruction via RTFSC

Think about it, what is the connection between the two?

You may wonder what else can be done with this TRM, which can only do addition. For computers that use 2's complement representation, if you can add, you can subtract. If you add a conditional jump instruction jnz r, addr: when register r is not 0, the PC jumps to addr, the TRM will be very different. For example, loops can be implemented by combining jnz and dec, looping over inc can be used to add any number, looping over addition can be used to multiply, function calls can be seen as a special kind of jump, recursion is essentially a function call... This is great, we did not realize that this weak TRM has hidden the power to shake the earth! However, although the TRM can solve all computable problems with only a few instructions, it is unbearably inefficient. Therefore, Pioneer decided to add more efficient instructions to TRM.