D4 Implementing a mini RISC-V Processor using RTL

With AM in place, we can use RTL to implement a more capable processor and run more programs on it. However, as analysed in F Stage, sISA is too simple to support running more programs. Therefore, we first 'upgrade' the NPC implemented in RTL in E Stage to a minirv processor. Thanks to the completeness of minirv, NPC will then be able to run many more programs.

Modular RTL Design

Unlike sCPU, NPC will keep evolving: we will continue to improve it and add more features. It is therefore necessary to keep the NPC project maintainable and ready for future improvements. One effective way to keep code maintainable is modularisation.

Let us first review minirv's ISA specification:

  • The initial value of the PC is 0
  • The number of GPRs is the same as in RV32E (16)
  • Only the following 8 instructions are supported: add, addi, lui, lw, lbu, sw, sb, jalr
  • All other ISA details are the same as RV32I
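For reference, these eight instructions, plus ebreak (which NPC will use later as a trap), can be told apart by the standard RV32I opcode, funct3 and funct7 fields. The C++ sketch below is only a software model that may help when checking your IDU against waveforms. The enum and function names are hypothetical, and the RTL decoder remains yours to design.

```cpp
#include <cstdint>

// Hypothetical classification of minirv instructions using the standard
// RV32I fields: opcode = inst[6:0], funct3 = inst[14:12], funct7 = inst[31:25].
enum class Inst { ADD, ADDI, LUI, LW, LBU, SW, SB, JALR, EBREAK, INVALID };

Inst decode(uint32_t inst) {
  uint32_t opcode = inst & 0x7f;
  uint32_t funct3 = (inst >> 12) & 0x7;
  uint32_t funct7 = inst >> 25;
  switch (opcode) {
    case 0x33: return (funct3 == 0 && funct7 == 0) ? Inst::ADD : Inst::INVALID;
    case 0x13: return funct3 == 0 ? Inst::ADDI : Inst::INVALID;
    case 0x37: return Inst::LUI;
    case 0x03: return funct3 == 2 ? Inst::LW : funct3 == 4 ? Inst::LBU : Inst::INVALID;
    case 0x23: return funct3 == 2 ? Inst::SW : funct3 == 0 ? Inst::SB : Inst::INVALID;
    case 0x67: return funct3 == 0 ? Inst::JALR : Inst::INVALID;
    case 0x73: return inst == 0x00100073 ? Inst::EBREAK : Inst::INVALID;  // only ebreak
    default:   return Inst::INVALID;
  }
}
```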

From the perspective of instruction types, minirv instructions cover functions such as addition, bit concatenation, memory access, and jumps. Based on these functions, we can divide NPC into modules according to the processor's workflow:

  • IFU (Instruction Fetch Unit): Responsible for fetching an instruction from memory based on the current PC.
  • IDU (Instruction Decode Unit): Responsible for decoding the current instruction and preparing the data and control signals needed by the execution stage.
  • EXU (EXecution Unit): Responsible for controlling the ALU based on control signals and performing calculations on data.
  • LSU (Load-Store Unit): Responsible for controlling the memory based on control signals, reading data from memory or writing data to memory.
  • WBU (WriteBack Unit): Writes data to registers and updates the PC.
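Inside the IDU, minirv needs immediates in three RV32I formats: I-type (addi, lw, lbu, jalr), S-type (sw, sb) and U-type (lui). A C++ reference model of the extraction might look like the following. The helper names are hypothetical. In RTL the same bit selections become plain wiring plus sign extension.

```cpp
#include <cstdint>

// Sign-extends a 12-bit field to 32 bits without relying on signed shifts.
int32_t sext12(uint32_t field) {
  field &= 0xfff;
  return static_cast<int32_t>((field ^ 0x800u) - 0x800u);
}

// I-type: imm[11:0] = inst[31:20] (addi, lw, lbu, jalr).
int32_t imm_i(uint32_t inst) { return sext12(inst >> 20); }

// S-type: imm[11:5] = inst[31:25], imm[4:0] = inst[11:7] (sw, sb).
int32_t imm_s(uint32_t inst) {
  return sext12(((inst >> 25) << 5) | ((inst >> 7) & 0x1f));
}

// U-type: imm[31:12] = inst[31:12], low 12 bits are zero (lui).
uint32_t imm_u(uint32_t inst) { return inst & 0xfffff000u; }
```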

You need to work out the interfaces between modules yourself, and you are free to decide which components go into which module. The one exception is memory: to make testing easier, we implement memory in C++ rather than RTL. For now, consider the simplest approach: expose the memory access signals as ports of the top-level module, and let the C++ simulation code access memory through them:

while (???) {
  ...
  top->inst = pmem_read(top->pc);  // fetch the instruction at the current PC
  top->eval();                     // propagate the new input through the design
  ...
}

You can easily implement a simple memory using C++ code.
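For example, here is a minimal sketch. It assumes that memory starts at address 0 (matching minirv's initial PC), that it holds only 32-bit words, and that it is preloaded with two hypothetical addi instructions. The array size is arbitrary.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical instruction memory: word i holds the instruction at address 4*i.
static uint32_t mem[1024] = {
    0x00100093,  // addi x1, x0, 1
    0x00208113,  // addi x2, x1, 2
};

// Returns the 32-bit word at `addr`; `addr` is assumed to be 4-byte aligned.
uint32_t pmem_read(uint32_t addr) {
  uint32_t idx = addr / 4;
  if (idx >= sizeof(mem) / sizeof(mem[0])) {
    std::fprintf(stderr, "pmem_read: address 0x%08x out of range\n", addr);
    return 0;
  }
  return mem[idx];
}
```

Here an out-of-range read returns 0 and prints a message. Aborting the simulation instead would be an equally reasonable choice.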

Minirv Processor with Only Two Instructions

We start with the simplest instruction, addi. Since you already implemented the minirv processor in Logisim in F Stage, you should have a fairly complete architecture diagram of the processor, or at least be able to picture how instructions flow through the modules above. With that diagram or execution flow in mind, describing the relevant modules in RTL is straightforward.

Implement the addi instruction in NPC

Specifically, you need to consider the following:

  • Place the binary encodings of several addi instructions in memory. You can exploit the behaviour of register 0 to write instructions whose results are fully determined, for example addi x1, x0, 5, which sets x1 to 5 regardless of its previous value.
  • Since jump instructions are not implemented yet, NPC can only execute sequentially. You can stop the simulation after NPC executes several instructions.
  • You can check whether the addi instruction is executed correctly by viewing the waveform or printing the state of general-purpose registers in the RTL code.
  • In circuit terms, the general-purpose register file is essentially a small memory. To keep students who choose Verilog from writing poorly structured behavioural models, we provide the following incomplete code for you to complete (you don't need to change the contents of the always block):
module RegisterFile #(ADDR_WIDTH = 1, DATA_WIDTH = 1) (
  input clk,
  input [DATA_WIDTH-1:0] wdata,
  input [ADDR_WIDTH-1:0] waddr,
  input wen
);
  reg [DATA_WIDTH-1:0] rf [2**ADDR_WIDTH-1:0];
  always @(posedge clk) begin
    if (wen) rf[waddr] <= wdata;
  end
endmodule
  • You need to work out how to implement the special behaviour of register 0: in RISC-V, x0 always reads as 0, and writes to it are ignored.
  • Using NVBoard requires better device support in the RTL code, which we will revisit in B Stage. For now, there is no need to connect NPC to NVBoard.
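To produce encodings for the first point above, a small hypothetical helper can assemble addi from its I-type fields (imm[11:0], rs1, funct3 = 000, rd, opcode = 0010011). The sample program reads x0 so that every result is known in advance. It also writes to x0 to exercise its special behaviour. The expected values in the comments follow from the RV32I semantics of addi.

```cpp
#include <cstdint>
#include <vector>

// Encodes `addi rd, rs1, imm`; imm must fit in a signed 12-bit field.
uint32_t encode_addi(uint32_t rd, uint32_t rs1, int32_t imm) {
  return ((static_cast<uint32_t>(imm) & 0xfff) << 20) | ((rs1 & 0x1f) << 15) |
         (0x0u << 12) | ((rd & 0x1f) << 7) | 0x13u;
}

// Hypothetical test program whose results do not depend on initial register values.
std::vector<uint32_t> sample_program() {
  return {
      encode_addi(1, 0, 5),   // x1 = x0 + 5 = 5
      encode_addi(2, 1, 3),   // x2 = x1 + 3 = 8
      encode_addi(0, 0, 7),   // write to x0: must have no effect
      encode_addi(3, 0, 0),   // x3 = x0 + 0 = 0 (x0 must still read as 0)
      encode_addi(4, 2, -1),  // x4 = x2 - 1 = 7
  };
}
```

Copy these words into the C++ memory starting at address 0. After NPC executes all five instructions, x1 to x4 should hold 5, 8, 0 and 7.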

Don't know where to start?

You might encounter the following questions:

  • How to access memory correctly based on the PC value?
  • How to place addi instructions in memory?
  • How to end simulation after executing a certain number of instructions?
  • How should the ports of the general-purpose register module be designed?

When you set up the Verilator framework in the pre-learning stage, we already reminded you that every detail in the project matters. Whenever you feel stuck, it is probably a sign that you missed something in your earlier learning. Instead of asking your peers, review the previous lab materials and do your best to understand every detail; that is how you will find the answers to the questions above.

Implement the jalr instruction in NPC

After implementing addi and jalr, run the test program containing these two instructions (the one you previously ran on Logisim) on NPC, and check whether NPC's results match expectations.
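When checking jalr, keep the RV32I semantics in mind. The target is rs1 + imm with bit 0 cleared, and rd receives pc + 4. Both values are computed from the old register contents, so an instruction such as jalr x1, 0(x1) still jumps through the original x1. Here is a small C++ reference model with hypothetical names:

```cpp
#include <cstdint>

struct JalrResult {
  uint32_t next_pc;  // jump target
  uint32_t rd_val;   // value written to rd (the write is discarded if rd is x0)
};

// Reference model of jalr: both results use the value of rs1 *before* the write.
JalrResult exec_jalr(uint32_t pc, uint32_t rs1_val, int32_t imm) {
  uint32_t target = (rs1_val + static_cast<uint32_t>(imm)) & ~1u;  // clear bit 0
  return {target, pc + 4};
}
```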

Let the program decide when simulation ends

Previously, the simulation environment (C++ code) ended the simulation after a fixed number of instructions, or NPC kept running until it entered an expected infinite loop indicating that the program had finished. Neither approach generalises well, because both require knowing in advance how the program will execute, for example how many instructions it runs. Is there a way to end the simulation automatically when the program finishes?

In fact, NEMU has already provided a good solution: the trap instruction. NEMU implements a special nemutrap instruction, which indicates the end of the client program. Specifically, in RISC-V, NEMU chooses the ebreak instruction to act as the nemutrap instruction. In NPC, we can also implement similar functionality: if the program executes the ebreak instruction, notify the simulation environment to end the simulation.

Implementing this functionality is not difficult. First, you need to add support for the ebreak instruction in NPC. However, to allow NPC to notify the simulation environment when executing the ebreak instruction, you also need to implement an interaction mechanism between RTL code and C++ code. We will use the DPI-C mechanism in SystemVerilog to achieve this interaction.

Try the DPI-C mechanism

Read the Verilator manual to find the relevant content about the DPI-C mechanism, and try running the examples provided in the manual.

Implement ebreak via DPI-C

Use the DPI-C mechanism in the RTL code so that NPC notifies the simulation environment to end the simulation when it executes ebreak. Once this works, test it by placing an ebreak instruction at the location of the halt() function in the above program. If your implementation is correct, the simulation environment no longer needs to know when the program ends; it simply keeps simulating until the program executes ebreak.
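Here is a minimal sketch of the C++ side. It assumes the RTL declares a hypothetical DPI-C function ebreak_trap() and calls it when the decoded instruction is ebreak (encoding 0x00100073). The loop helper is also hypothetical. Its step argument stands for one clock cycle of the Verilated model, that is, toggling clk and calling eval().

```cpp
#include <cstdint>

// Assumed RTL side (SystemVerilog); signal names are illustrative:
//   import "DPI-C" function void ebreak_trap();
//   always @(posedge clk) if (inst == 32'h00100073) ebreak_trap();
static bool g_trap_hit = false;

// Called from the RTL through DPI-C when ebreak executes.
extern "C" void ebreak_trap() { g_trap_hit = true; }

// Advances the simulation one cycle at a time until ebreak is seen.
// `max_cycles` is a safety net for programs that never reach ebreak.
template <typename StepFn>
uint64_t run_until_trap(StepFn step, uint64_t max_cycles) {
  uint64_t cycles = 0;
  while (!g_trap_hit && cycles < max_cycles) {
    step();
    cycles++;
  }
  return cycles;
}
```

The cycle limit ensures that a buggy NPC that never reaches ebreak still terminates. Treating that case as a failure is a sensible choice.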

If you are using Chisel, you can use the BlackBox mechanism to instantiate Verilog code, and let that Verilog code interact with the simulation environment through DPI-C. Refer to the Chisel documentation for how to use BlackBox.

Implementing the complete minirv processor

You need to implement the remaining six minirv instructions: add, lui, lw, lbu, sw and sb. The first two are integer arithmetic instructions, very similar to the add and li instructions of sISA. You already implemented those in E Stage, so these two should not be difficult.
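Because add, addi and lui all reduce to a single addition, one possible EXU design lets the IDU choose the two adder operands. A single adder then serves all three instructions, with lui computed as 0 + U-immediate. This is a suggestion, not a requirement. Here is a C++ model of that operand selection, with hypothetical names:

```cpp
#include <cstdint>

enum class AluOp { ADD, ADDI, LUI };

// One shared adder; only the operand selection differs per instruction.
// For ADDI/LUI, `imm` is the already extended I-type or U-type immediate.
uint32_t alu_result(AluOp op, uint32_t rs1_val, uint32_t rs2_val, uint32_t imm) {
  uint32_t a = (op == AluOp::LUI) ? 0u : rs1_val;   // lui ignores rs1
  uint32_t b = (op == AluOp::ADD) ? rs2_val : imm;  // add uses rs2, others use imm
  return a + b;  // unsigned arithmetic wraps modulo 2^32, like the hardware adder
}
```

The same adder can also compute the rs1 + imm addresses of lw, lbu, sw and sb, which is another reason to share it.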

The remaining four memory access instructions require some additional considerations. Like instruction fetch, they access memory, but unlike instruction fetch, stores also write data into memory. Our earlier, simplistic approach of exposing the instruction fetch interface at the top level cannot handle these instructions correctly. The memory access signals depend on the instruction fetched in the current cycle, and the simulation environment, which only sees the top-level ports, cannot respond to them correctly. To address this, we can implement memory access through the DPI-C mechanism:

import "DPI-C" function int pmem_read(input int raddr);
import "DPI-C" function void pmem_write(
  input int waddr, input int wdata, input byte wmask);
reg [31:0] rdata;
always @(*) begin
  if (valid) begin // When there are read or write requests
    rdata = pmem_read(raddr);
    if (wen) begin // When there are write requests
      pmem_write(waddr, wdata, wmask);
    end
  end
  else begin
    rdata = 0;
  end
end
extern "C" int pmem_read(int raddr) {
  // Always read 4 bytes from address `raddr & ~0x3u` and return to `rdata`
}
extern "C" void pmem_write(int waddr, int wdata, char wmask) {
  // Always write `wdata` into 4 bytes aligned to the address `waddr & ~0x3u` according to the write mask `wmask`
  // Each bit in `wmask` acts as a byte-level mask for a byte in wdata
  // For example, `wmask = 0x3` means only write the lowest 2 bytes, leaving the other bytes in memory unchanged
}

These memory read and write functions model a 32-bit bus: they only support read and write operations aligned to 4 bytes. A read always returns a 4-byte-aligned word, and the RTL code must select the required bytes based on the read address. This setup keeps the changes minimal when we implement a bus later. You need to pass the correct arguments to these functions in the Verilog code and implement them in C++. For instruction fetch, remove the earlier approach of exposing signals at the top level and instead make one additional pmem_read() call.
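Here is a minimal sketch of both sides of this contract, modelled in C++. It assumes a hypothetical 64 KiB byte array starting at guest address 0 (the MEM_BASE constant) on a little-endian host. pmem_read() and pmem_write() follow the signatures above. lbu_select() and sb_lanes() are hypothetical helpers that model the byte-lane logic your RTL needs for lbu and sb.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical memory layout: MEM_SIZE bytes starting at guest address MEM_BASE.
constexpr uint32_t MEM_BASE = 0x0;
constexpr uint32_t MEM_SIZE = 0x10000;
static uint8_t pmem[MEM_SIZE];

static bool in_range(uint32_t aligned) { return aligned - MEM_BASE < MEM_SIZE; }

extern "C" int pmem_read(int raddr) {
  uint32_t addr = static_cast<uint32_t>(raddr) & ~0x3u;  // align down to 4 bytes
  if (!in_range(addr)) return 0;
  uint32_t word;
  std::memcpy(&word, &pmem[addr - MEM_BASE], 4);  // little-endian host assumed
  return static_cast<int>(word);
}

extern "C" void pmem_write(int waddr, int wdata, char wmask) {
  uint32_t addr = static_cast<uint32_t>(waddr) & ~0x3u;
  if (!in_range(addr)) return;
  for (int i = 0; i < 4; i++) {
    if (wmask & (1 << i)) {  // bit i of wmask enables byte lane i
      pmem[addr - MEM_BASE + i] =
          static_cast<uint8_t>(static_cast<uint32_t>(wdata) >> (8 * i));
    }
  }
}

// lbu: pick byte addr[1:0] of the aligned word and zero-extend it.
uint32_t lbu_select(uint32_t rdata, uint32_t addr) {
  return (rdata >> (8 * (addr & 3))) & 0xff;
}

// sb: move the low byte of rs2 to lane addr[1:0] and enable only that lane.
void sb_lanes(uint32_t addr, uint32_t rs2_val, uint32_t* wdata, uint8_t* wmask) {
  *wdata = (rs2_val & 0xff) << (8 * (addr & 3));
  *wmask = static_cast<uint8_t>(1u << (addr & 3));
}
```

For sw, the analogous choice is wdata = rs2 and wmask = 0xf.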

As with minirvEMU, initialising memory by hand is inefficient for larger programs. Instead, the simulation environment can read the program's path from the command line and load the program's contents into memory.
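Here is a minimal sketch of such a loader. It assumes the program is a raw binary image (not an ELF file) whose first byte belongs at the start of memory. load_image() is a hypothetical name.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Reads the raw binary image at `path` into `mem`, starting at offset 0.
// Returns the number of bytes loaded, or -1 on error.
long load_image(const char* path, std::vector<uint8_t>& mem) {
  FILE* fp = std::fopen(path, "rb");
  if (fp == nullptr) {
    std::fprintf(stderr, "cannot open image '%s'\n", path);
    return -1;
  }
  std::fseek(fp, 0, SEEK_END);
  long size = std::ftell(fp);
  std::fseek(fp, 0, SEEK_SET);
  if (size < 0 || static_cast<std::size_t>(size) > mem.size()) {
    std::fprintf(stderr, "image '%s' does not fit in memory\n", path);
    std::fclose(fp);
    return -1;
  }
  std::size_t n = std::fread(mem.data(), 1, static_cast<std::size_t>(size), fp);
  std::fclose(fp);
  return static_cast<long>(n);
}
```

In main(), you would check argc, allocate the memory vector, and call load_image(argv[1], mem) before the first eval(). If it returns -1, stop the simulation.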

Implement a complete minirv processor

Add the remaining 6 minirv instructions to NPC and update the loader accordingly. Then run the sum and mem programs, which you previously ran on Logisim, on NPC. To tell whether a program has completed successfully, you can place an ebreak instruction at the location of the halt() function in memory before starting the NPC simulation.
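The simulation environment can patch in the trap right after loading. The sketch below assumes a little-endian byte-array memory and assumes you already know halt()'s address, for example from the program's disassembly. place_ebreak() is a hypothetical name. 0x00100073 is the RV32I encoding of ebreak.

```cpp
#include <cstdint>
#include <vector>

constexpr uint32_t EBREAK = 0x00100073;  // RV32I encoding of ebreak

// Overwrites the 4-byte instruction at guest address `halt_addr` with ebreak.
// `mem_base` is the guest address of mem[0]. Returns false if out of range.
bool place_ebreak(std::vector<uint8_t>& mem, uint32_t mem_base, uint32_t halt_addr) {
  if (halt_addr < mem_base) return false;
  uint32_t off = halt_addr - mem_base;
  if (off > mem.size() || mem.size() - off < 4) return false;
  for (int i = 0; i < 4; i++) {
    mem[off + i] = static_cast<uint8_t>(EBREAK >> (8 * i));  // little-endian
  }
  return true;
}
```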

Building an AM runtime environment for minirv

With the AM project, you can already compile programs for riscv32-nemu with little effort. In the same way, we can compile programs for minirv-npc, which lets us test the correctness of the NPC implementation with many more programs.

Update AM

We added support for minirv-npc to AM on 2025/05/03 12:30:00. If you obtained the AM code before the above time, you can get the new version with the following command:

cd ysyx-workbench
cp -r abstract-machine/klib .  # Backup the klib implementation. If you have made changes to other parts of AM, please manually backup those as well.
rm -rf abstract-machine
bash init.sh abstract-machine
rm -rf abstract-machine/klib
mv klib abstract-machine       # Restore klib. If you have backed up other parts of AM, please manually restore those as well.

The AM project already provides the basic framework for minirv-npc. You only need to run the following command in the am-kernels/tests/cpu-tests/ directory

make ARCH=minirv-npc ALL=xxx

to compile the test named xxx for the minirv-npc runtime environment. However, for compatibility with device support added in the future, minirv-npc requires programs to start at 0x80000000. You need to change the initial value of the PC to comply with this runtime environment specification.
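Changing the reset value of the PC in RTL is only part of this change. If your C++ memory is a plain array, its accesses must also be translated. One way is to subtract the base address before indexing, and to report accesses outside the array instead of touching arbitrary host memory. Here is a sketch with hypothetical names and a hypothetical 128 MiB size:

```cpp
#include <cstdint>
#include <cstdio>

constexpr uint32_t MEM_BASE = 0x80000000u;         // programs start here in minirv-npc
constexpr uint32_t MEM_SIZE = 128u * 1024 * 1024;  // hypothetical memory size
static uint8_t pmem[MEM_SIZE];

// True if `addr` falls inside [MEM_BASE, MEM_BASE + MEM_SIZE).
// Addresses below MEM_BASE wrap to large values and fail the check.
bool in_pmem(uint32_t addr) { return addr - MEM_BASE < MEM_SIZE; }

// Maps a guest physical address to a pointer into the host array.
uint8_t* guest_to_host(uint32_t addr) {
  if (!in_pmem(addr)) {
    std::fprintf(stderr, "address 0x%08x is outside [0x%08x, 0x%08x)\n",
                 addr, MEM_BASE, MEM_BASE + MEM_SIZE);
    return nullptr;
  }
  return pmem + (addr - MEM_BASE);
}
```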

To get familiar with the workflow, we will first try running the dummy program on NPC.

Compile and run AM programs on NPC with one command

The AM Makefile does not provide a run target for minirv-npc. Try to add one, so that typing make ARCH=minirv-npc ALL=dummy run compiles the AM program and runs it on NPC. Note, however, that the halt() function of minirv-npc is currently an infinite loop, so the simulation will not end on its own. You can check whether NPC has reached halt() by viewing the waveform.

Implement the halt() function in minirv-npc

To end the program automatically, implement the halt() function of minirv-npc by adding an ebreak instruction to it. After that, when an AM program running on NPC finishes, it executes ebreak, which signals NPC's simulation environment to end the simulation.

Once implemented, you can run AM programs on NPC and automatically end the simulation with a single command.

Implement HIT GOOD/BAD TRAP for NPC

NEMU reports whether a program finished successfully. Try to implement similar functionality in NPC, so that from now on you can tell whether a program ended successfully on NPC.
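Here is a sketch of the C++ side. It assumes that your halt() in minirv-npc places the exit code in a0 (x10) before ebreak, as the riscv32-nemu runtime does. It also assumes the RTL passes a0 and the PC to a hypothetical DPI-C function npc_halt() when ebreak executes. Following NEMU's convention, an exit code of zero counts as success.

```cpp
#include <cstdint>
#include <cstdio>

// Assumed RTL side (SystemVerilog), called when ebreak executes:
//   import "DPI-C" function void npc_halt(input int a0, input int pc);
static bool g_halted = false;
static int g_exit_code = -1;

// A zero exit code in a0 means the program ended successfully.
bool good_trap(int a0) { return a0 == 0; }

extern "C" void npc_halt(int a0, int pc) {
  g_halted = true;
  g_exit_code = a0;
  std::printf("npc: %s at pc = 0x%08x\n",
              good_trap(a0) ? "HIT GOOD TRAP" : "HIT BAD TRAP",
              static_cast<uint32_t>(pc));
}
```

The simulation loop can stop once g_halted is set. Returning g_exit_code from main() is one way to let make notice failing programs.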

Thanks to the completeness of the minirv instruction set, programs that previously ran on riscv32-nemu can be recompiled to minirv-npc and run on NPC. You do not need to implement additional instructions on NPC to run them.

Run more programs on NPC

Run cpu-tests, riscv-tests, and riscv-arch-test on minirv-npc to test whether the NPC implementation is correct.