RTFSC

In fact, the TRM is so simple that the framework code already implements it. Let's take a look at what the digital circuits that make up the TRM look like in NEMU's C code. For ease of description, we will refer to the computer simulated by NEMU as the "guest computer" and the program running in NEMU as the "guest program".

A first look at the framework code

The framework code is extensive and contains a lot of code that will only be used in later phases. As we progress, we will gradually explain all of it. Therefore, when reading the code, you only need to care about the modules relevant to the current stage. Do not dwell on code that is not yet relevant; doing so will only cause unnecessary anxiety.

ics2023
├── abstract-machine   # abstract computer
├── am-kernels         # Applications developed based on abstract computers
├── fceux-am           # NES Simulator
├── init.sh            # Initialization Script
├── Makefile           # For project packaging and submission
├── nemu               # NEMU
└── README.md

For now, we only need to care about the NEMU sub-project; the other sub-projects will be introduced later. NEMU consists of four modules: monitor, CPU, memory, and device. We briefly introduced the functions of the CPU and memory in the previous subsection, and devices will be introduced in PA2, so we don't need to care about them now.

The monitor module was introduced to make it easier to monitor the guest computer. In addition to interacting with GNU/Linux (e.g., reading in guest programs), it also functions as a debugger, providing a convenient way to debug NEMU. Conceptually, the monitor is not a necessary part of a computer, but it is necessary infrastructure for NEMU. Without the monitor module, debugging NEMU would be very difficult.

The source files in the nemu/ directory of the code are organized as follows (not all files are listed).

nemu
├── configs                    # Some pre-provided configuration files
├── include                    # Store header files for global use
│   ├── common.h               # Common headers
│   ├── config                 # Configuration system-generated header file, used to maintain timestamps for configuration option updates
│   ├── cpu
│   │   ├── cpu.h
│   │   ├── decode.h           # Decoding related
│   │   ├── difftest.h
│   │   └── ifetch.h           # Fetch related
│   ├── debug.h                # Some macros for debugging
│   ├── device                 # Device-related
│   ├── difftest-def.h
│   ├── generated
│   │   └── autoconf.h         # Configuration system-generated header file, used to define macros based on configuration information.
│   ├── isa.h                  # ISA-related
│   ├── macro.h                # Some convenient macro definitions
│   ├── memory                 # Memory access related
│   └── utils.h
├── Kconfig                    # Rules for configuring information
├── Makefile                   # Makefile build scripts
├── README.md
├── resource                   # Some supplementary resources
├── scripts                    # Makefile build scripts
│   ├── build.mk
│   ├── config.mk
│   ├── git.mk                 # git version control related
│   └── native.mk
├── src                        # source file
│   ├── cpu
│   │   └── cpu-exec.c         # Main loop for instruction execution
│   ├── device                 # Device Related
│   ├── engine
│   │   └── interpreter        # Interpreter implementation
│   ├── filelist.mk
│   ├── isa                    # ISA-related implementations
│   │   ├── mips32
│   │   ├── riscv32
│   │   ├── riscv64
│   │   └── x86
│   ├── memory                 # Memory Access Implementation
│   ├── monitor
│   │   ├── monitor.c
│   │   └── sdb                # simple debugger
│   │       ├── expr.c         # Expression evaluation implementation
│   │       ├── sdb.c          # Command Processing for the Simple Debugger
│   │       └── watchpoint.c   # Watchpoint implementation
│   ├── nemu-main.c            # You know it....
│   └── utils                  # Some public features
│       ├── log.c              # Log file related
│       ├── rand.c
│       ├── state.c
│       └── timer.c
└── tools                      # Tools
    ├── fixdep                 # Dependency fixes, used in conjunction with the configuration system
    ├── gen-expr
    ├── kconfig                # Configuration system
    ├── kvm-diff
    ├── qemu-diff
    └── spike-diff

In order to support different ISAs, the framework code divides NEMU into two parts: the basic ISA-independent framework and the ISA-related concrete implementation. NEMU puts the ISA-related code in the nemu/src/isa/ directory, and provides the declaration of the ISA-related APIs through nemu/include/isa.h. In this way, the code outside of nemu/src/isa/ shows the basic framework of NEMU. This has two advantages.

  • Helps us recognize what different ISAs have in common: no matter what ISA the guest computer uses, it has the same basic framework.
  • Reflects the idea of abstraction: the framework code abstracts the differences between ISAs into APIs, which are called by the base framework so that you don't have to worry about the specifics of the ISA. If you plan to choose a different ISA in the future for the second round of the PA, you'll be able to clearly appreciate the benefits of abstraction: the base framework code doesn't need to be modified at all!

This page organizes the above APIs for future reference, and you don't need to fully understand how they work at this time. "Abstraction" is a very important concept in computer systems, and if you don't understand what it means now, don't worry, you'll come across it again and again in the rest of PA.

Once you have a general understanding of the above repo tree, you are ready to start reading code. Where to start is a no-brainer.

Need more words?

Well... If you think that's not enough of a hint, here's a good one: Recall from your programming class, where does a program begin to execute?

If you don't care to answer that question, calm down. It's a question worth exploring, and you'll revisit it in the future.

Configuration system and building the project

Before we actually start reading the code, let's briefly introduce the configuration system and project building in the NEMU project.

Configuration system kconfig

In a project of a certain size, the number of configurable options can be very large, and options may depend on each other, e.g., if you turn on option A, option B must take a certain value. Letting developers manage these options by hand is very error prone: for example, after modifying option A, a developer may forget to modify option B, which depends on it. The configuration system is designed to solve this problem.

The configuration system in NEMU is located in nemu/tools/kconfig. It is derived from the GNU/Linux project's kconfig, with a few simplifications. kconfig defines a simple language that developers can use to write "configuration files". In a configuration file, a developer can describe:

  • the properties of a configuration option, including type, default value, etc.
  • the relationship between different configuration options
  • the hierarchy of configuration options

In the NEMU project, the configuration file is named Kconfig, e.g. nemu/Kconfig. When you type make menuconfig, the following events take place:

  • Check whether the program nemu/tools/kconfig/build/mconf exists; if not, compile and generate mconf.
  • Check whether the program nemu/tools/kconfig/build/conf exists; if not, compile and generate conf.
  • Run the command mconf nemu/Kconfig. mconf parses the contents of nemu/Kconfig and displays the configuration options as a menu tree for the developer to select from.
  • When the developer exits the menu, mconf records the selected options in the nemu/.config file.
  • Run the command conf --syncconfig nemu/Kconfig. conf parses the contents of nemu/Kconfig and nemu/.config (the selected options) and combines the two to produce the following files:
    • Macro definitions that can be included in C code (nemu/include/generated/autoconf.h), where the macros have names of the form CONFIG_xxx.
    • Variable definitions that can be included in the Makefile (nemu/include/config/auto.conf).
    • Dependency rules related to the configuration files that can be included in the Makefile (nemu/include/config/auto.conf.cmd). We don't need to care about them in order to read the code.
    • A directory tree nemu/include/config/ that tracks changes to configuration options through timestamps. It is used together with another tool, nemu/tools/fixdep, to avoid unnecessary recompilation after configuration options are updated. We don't need to care about it in order to read the code.

So, for now, we only need to care about the following files generated by the configuration system:

  • nemu/include/generated/autoconf.h, used when reading C code.
  • nemu/include/config/auto.conf, used when reading the Makefile.

Project building and Makefile

The NEMU Makefile is a little more complex than usual. It has the following features.

Associate with configuration system

The Makefile is tied to the configuration system by including nemu/include/config/auto.conf, which brings in the variables generated by kconfig. Therefore the behavior of the Makefile may change after the configuration options are updated via menuconfig.

filelist

The filelist determines which source files will be compiled. In nemu/src and its subdirectories there are files called filelist.mk, which maintain the following four variables according to the menuconfig configuration:

  • SRCS-y - a candidate set of source files to be involved in compilation
  • SRCS-BLACKLIST-y - the set of blacklisted source files that will not participate in compilation
  • DIRS-y - a collection of directories participating in the compilation, all files in these directories are added to SRCS-y
  • DIRS-BLACKLIST-y - a collection of directories that do not participate in compilation, all files in this directory will be added to SRCS-BLACKLIST-y

The Makefile includes all the filelist.mk files in the project. Based on the above four variables, it selects the source files that are in SRCS-y but not in SRCS-BLACKLIST-y as the set of source files that will eventually be compiled.

The above four variables can also be tied to boolean options in the menuconfig result. For example, consider DIRS-BLACKLIST-$(CONFIG_TARGET_AM) += src/monitor/sdb. When we select the TARGET_AM boolean option in menuconfig, kconfig generates a line of the form CONFIG_TARGET_AM=y in nemu/include/config/auto.conf, and expanding the variable gives DIRS-BLACKLIST-y += src/monitor/sdb. When we do not select the TARGET_AM option, kconfig generates CONFIG_TARGET_AM=n or leaves CONFIG_TARGET_AM undefined, and expanding the variable gives DIRS-BLACKLIST-n += src/monitor/sdb or DIRS-BLACKLIST- += src/monitor/sdb. Neither of the latter two affects the value of DIRS-BLACKLIST-y, resulting in the following effect:

When TARGET_AM is selected in menuconfig, none of the files in the nemu/src/monitor/sdb directory will be compiled.

Compiling and linking

Makefile compilation rules are defined in nemu/scripts/build.mk:

$(OBJ_DIR)/%.o: %.c
  @echo + CC $<
  @mkdir -p $(dir $@)
  @$(CC) $(CFLAGS) -c -o $@ $<
  $(call call_fixdep, $(@:.o=.d), $@)

For the meaning of the symbols $@ and $<, please RTFM. The call to call_fixdep is used to generate more reasonable dependencies, but for the time being we are mainly concerned with the compile commands, so we can ignore call_fixdep for now.

We can start by looking at what commands are run during the make process, and then work backwards to understand the values of variables such as $(CFLAGS). To do this, we can type make -nB, which will force the make program to build the target in a way that "outputs commands but does not execute them". After running it, you can see a lot of things like

gcc -O2 -MMD -Wall -Werror \
  -I/home/user/ics2023/nemu/include \
  -I/home/user/ics2023/nemu/src/engine/interpreter \
  -I/home/user/ics2023/nemu/src/isa/riscv32/include \
  -O2    -D__GUEST_ISA__=riscv32  -c \
  -o /home/user/ics2023/nemu/build/obj-riscv32-nemu-interpreter/src/utils/timer.o \
  src/utils/timer.c

From this output, you can easily work out the values of the Makefile variables above:

$(CC) -> gcc
$@ -> /home/user/ics2023/nemu/build/obj-riscv32-nemu-interpreter/src/utils/timer.o
$< -> src/utils/timer.c
$(CFLAGS) -> the rest of it

So you can backtrack how the value of $(CFLAGS) was formed based on the above output and the Makefile. Since the commands for compiling each file are similar, once you understand the compilation of one source file, you can extrapolate to the compilation of other source files. Similarly, you can understand the final linking command in the same way as above.

Preparing the first guest program

As we already know, NEMU is a program that executes a guest program, but the guest program doesn't exist on the guest computer at first. We need to read the guest program into the guest computer, which is the responsibility of the monitor. So when NEMU starts running, it first calls the init_monitor() function (defined in nemu/src/monitor/monitor.c) to do some initialization work related to the monitor.

kconfig generated macros and conditional compilation

As we mentioned above, kconfig defines macros of the form CONFIG_xxx in nemu/include/generated/autoconf.h according to the selected configuration options. We can use these macros in C code to decide whether certain code is compiled. For example, when the CONFIG_DEVICE macro is not defined, the device-related code does not need to be compiled.

In order to write more compact code, nemu/include/macro.h defines macros that are specifically designed to test macro definitions. For example, IFDEF(CONFIG_DEVICE, init_device()); means that the init_device() function is called only if CONFIG_DEVICE is defined. MUXDEF(CONFIG_TRACE, "ON", "OFF") means that if CONFIG_TRACE is defined, the result of preprocessing is "ON" ("OFF" disappears after preprocessing); otherwise the result of preprocessing is "OFF".

These macros are amazing, do you know how they work?
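One way macros of this kind can be built is sketched below. This is only an illustration under assumed DEMO_ names, not a copy of nemu/include/macro.h, whose definitions may differ in detail. The idea is to paste the option name onto a prefix: if the option is defined as 1, the pasted name is itself a macro that expands to something containing a comma, and that extra comma shifts which argument ends up in second position. This sketch only recognizes options defined as 1.

```c
#include <assert.h>

/* Two-step concatenation so that arguments are macro-expanded before pasting. */
#define DEMO_CONCAT_(x, y) x ## y
#define DEMO_CONCAT(x, y)  DEMO_CONCAT_(x, y)

/* Pick the second argument. The trailing empty argument at the call site
 * keeps the variadic part present, as C99 requires. */
#define DEMO_SECOND(a, b, ...) b
#define DEMO_MUX_COMMA(maybe_comma, a, b) DEMO_SECOND(maybe_comma a, b, )

/* If `macro` expands to 1, the pasted name DEMO_PROP_1 expands to "X,",
 * which pushes `a` into second position. Otherwise the pasted name is not
 * a macro, no comma appears, and `b` stays in second position. */
#define DEMO_PROP_1 X,
#define DEMO_MUXDEF(macro, a, b) \
  DEMO_MUX_COMMA(DEMO_CONCAT(DEMO_PROP_, macro), a, b)

#define DEMO_KEEP(...)   __VA_ARGS__
#define DEMO_IGNORE(...)
/* Keep the trailing code only when `macro` is defined as 1. */
#define DEMO_IFDEF(macro, ...) \
  DEMO_MUXDEF(macro, DEMO_KEEP, DEMO_IGNORE)(__VA_ARGS__)

/* Hypothetical options: one selected (defined as 1), one left undefined. */
#define DEMO_CONFIG_TRACE 1
/* DEMO_CONFIG_DEVICE is intentionally not defined. */

/* The choice is made by the preprocessor, so it is visible at compile time. */
static_assert(sizeof(DEMO_MUXDEF(DEMO_CONFIG_TRACE, "ON", "OFF")) == sizeof("ON"),
              "a defined option selects the first alternative");

const char *demo_trace_state(void) {
  return DEMO_MUXDEF(DEMO_CONFIG_TRACE, "ON", "OFF");
}

const char *demo_device_state(void) {
  return DEMO_MUXDEF(DEMO_CONFIG_DEVICE, "ON", "OFF");
}

int demo_count_enabled(void) {
  int n = 0;
  DEMO_IFDEF(DEMO_CONFIG_TRACE, n++);   /* kept */
  DEMO_IFDEF(DEMO_CONFIG_DEVICE, n++);  /* removed by the preprocessor */
  return n;
}
```

Running the preprocessor alone (gcc -E) on a file like this is a convenient way to watch each expansion step.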

Why are they all functions?

Read the code of the init_monitor() function, and you will see that it consists entirely of function calls. The code would behave the same if the function bodies were expanded inline in init_monitor(). By comparison, what is the advantage of using functions here?

Let's shed some light on these initializations. parse_args(), init_rand(), init_log() and init_mem() are not very esoteric, just RTFSC.

Argument parsing

parse_args() calls a function you may not be familiar with, getopt_long(), which the framework code uses to parse the arguments. See man 3 getopt_long for details on its behavior.
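For orientation, here is a minimal sketch of how getopt_long() is typically used. The option names, the demo_args structure and the demo_parse_args() function are assumptions made for this illustration; they are not the framework's parse_args().

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical result structure; the field names are illustrative only. */
struct demo_args {
  int batch;             /* set by -b / --batch */
  const char *log_file;  /* set by -l FILE / --log FILE */
  const char *image;     /* first non-option argument, if any */
};

/* Parse argv into *out. Returns 0 on success, -1 on an unknown option. */
int demo_parse_args(int argc, char *argv[], struct demo_args *out) {
  static const struct option table[] = {
    {"batch", no_argument,       NULL, 'b'},
    {"log",   required_argument, NULL, 'l'},
    {NULL,    0,                 NULL, 0},
  };
  assert(argv != NULL && out != NULL);
  *out = (struct demo_args){0};
  optind = 1;  /* restart scanning so the function can be called repeatedly */

  int opt;
  while ((opt = getopt_long(argc, argv, "bl:", table, NULL)) != -1) {
    switch (opt) {
      case 'b': out->batch = 1; break;
      case 'l': out->log_file = optarg; break;
      default:  return -1;  /* getopt_long has already printed a message */
    }
  }
  /* After the loop, optind indexes the first non-option argument. */
  if (optind < argc) out->image = argv[optind];
  return 0;
}
```

The short-option string "bl:" and the long-option table describe the same options; the colon after l says that the option takes an argument, which getopt_long() hands back through optarg.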

Argument processing

Another question is, where do the args come from?

Next monitor calls the init_isa() function (defined in nemu/src/isa/$ISA/init.c) to do some ISA-related initialization.

The first job is to read a built-in guest program into memory. In order to understand this task, we need to clarify three questions.

  1. What is a guest program? We know that programs are made up of instructions, which vary from one ISA to another (imagine saying "hello" in different languages), so the program itself must be ISA-related. Therefore, we put the built-in guest program in nemu/src/isa/$ISA/init.c. The built-in guest program is very simple: it contains only a few instructions, and it doesn't even do anything meaningful.
  2. What is memory? We can think of memory as a contiguous piece of storage space, and since memory is byte-addressed (i.e., one memory slot holds one byte of data), it is natural to use an array of type uint8_t in C to model memory. NEMU provides 128MB of physical memory for the guest computer by default (see pmem defined in nemu/src/memory/paddr.c).
  3. Where in memory should the guest program be placed? In order for the guest computer's CPU to execute the guest program, we need a way to let the CPU know where the guest program is located. We do this in the simplest way possible: by convention. Specifically, we have the monitor read the guest program directly into a fixed memory location, RESET_VECTOR. The value of RESET_VECTOR is defined in nemu/include/memory/paddr.h.
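The three answers above can be put together in a small sketch: a byte array models memory, and a built-in image is copied to an agreed-upon offset. The sizes, the offset, the image bytes and all DEMO_/demo_ names are assumptions chosen for this illustration, not the values used by the framework.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MEM_SIZE     1024u   /* tiny stand-in for the 128MB default */
#define DEMO_RESET_OFFSET 0x100u  /* hypothetical agreed-upon load offset */

/* Byte-addressed memory: one array element per guest byte. */
static uint8_t demo_pmem[DEMO_MEM_SIZE];

/* Placeholder bytes standing in for a built-in guest program. */
static const uint8_t demo_img[] = { 0x13, 0x00, 0x00, 0x00 };

/* Copy the built-in image to the agreed-upon offset and
 * return a pointer to its first byte. */
uint8_t *demo_load_builtin(void) {
  assert(DEMO_RESET_OFFSET + sizeof(demo_img) <= DEMO_MEM_SIZE);
  memcpy(demo_pmem + DEMO_RESET_OFFSET, demo_img, sizeof(demo_img));
  return demo_pmem + DEMO_RESET_OFFSET;
}
```

Because both the loader and the CPU agree on the same fixed location, no other mechanism is needed to tell the CPU where execution starts.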

BIOS and computer booting

We know that memory is a kind of RAM, a volatile storage medium, which means that when the computer first starts up, the data in memory is meaningless. The BIOS, on the other hand, is solidified in ROM/Flash, which are non-volatile storage media, so the contents of the BIOS are not lost when the power is cut.

Therefore, in a real computer system, after the computer starts up it first hands control to the BIOS. The BIOS performs a series of initialization tasks and then reads a meaningful program from disk into memory to execute. Simulating this process requires a lot of details that are beyond the scope of this course, so in PA we simplify it by adopting the convention that the CPU starts executing directly from the agreed-upon memory location.

A first look at operating system booting sequence

When you are using Windows, the boot process is usually accompanied by a boot animation, and then somehow you get to the login screen, which is obviously not enough to satisfy the CSer's curiosity. In fact, in GNU/Linux, you can easily find out what the operating system is doing behind the scenes. By typing sudo dmesg, you can output the operating system's boot logs, and see what the operating system is doing at a glance.

However, your current knowledge may not allow you to understand what is going on. But don't let that discourage you, as later in the PA you will be running a small operating system, Nanos-lite, on NEMU. Although Nanos-lite is a drop in the ocean compared to GNU/Linux, you will still be able to fully understand some of the key steps in the operating system booting sequence, and the door to the operating system will be open for you.

The second task of init_isa() is to initialize the registers, which is done by the restart() function. Registers are a highly structured part of the CPU, and it is natural to use the corresponding structures in C to describe the register structure of the CPU. The register structure varies from ISA to ISA, so we define the register structure CPU_state in nemu/src/isa/$ISA/include/isa-def.h, and define a global variable cpu in nemu/src/cpu/cpu-exec.c. An important part of initializing the registers is setting the initial value of cpu.pc, which we need to set to the memory location where we just loaded the guest program, so that the CPU can start executing the guest program from the memory location we agreed upon. For mips32 and riscv32, their register 0 always holds 0, so we need to initialize it as well.
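As a rough picture of what such a structure and its initialization can look like for a 32-bit RISC-style guest, consider the sketch below. The type demo_cpu_state and the function demo_restart() are hypothetical names for this illustration; the real CPU_state for each ISA is in the isa-def.h file mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file for a 32-bit RISC-style guest:
 * 32 general-purpose registers plus the program counter. */
typedef struct {
  uint32_t gpr[32];
  uint32_t pc;
} demo_cpu_state;

static demo_cpu_state demo_cpu;

/* Put the CPU in its initial state: pc points at the load address,
 * and register 0 holds the constant 0. */
void demo_restart(uint32_t reset_vector) {
  assert(reset_vector % 4 == 0);  /* this sketch assumes 4-byte aligned instructions */
  demo_cpu.pc = reset_vector;
  demo_cpu.gpr[0] = 0;
}
```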

Starting address of physical memory

On x86, physical memory is addressed from 0, but this is not the case for some ISAs: for example, the physical memory of mips32 and riscv32 starts at address 0x80000000. So for mips32 and riscv32, CONFIG_MBASE is defined as 0x80000000. When the CPU accesses memory, we map the memory address it wants to access to the corresponding offset in pmem, which is done by the guest_to_host() function in nemu/src/memory/paddr.c. For example, if the mips32 CPU intends to access memory address 0x80000000, we make it end up accessing pmem[0], so that it can correctly access the first instruction of the guest program. This mechanism has a special name, address mapping, and we'll encounter it again in subsequent PAs.
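The mapping itself is just a subtraction of the base address. The sketch below shows the idea with a small array and made-up DEMO_/demo_ names; it is an illustration of the technique, not the framework's guest_to_host().

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MBASE 0x80000000u  /* guest physical address of the first byte */
#define DEMO_MSIZE 4096u        /* small stand-in for the real memory size */

static uint8_t demo_pmem[DEMO_MSIZE];

/* 1 if the guest physical address falls inside the simulated memory.
 * Unsigned wraparound makes addresses below DEMO_MBASE fail the test too. */
static int demo_in_pmem(uint32_t paddr) {
  return paddr - DEMO_MBASE < DEMO_MSIZE;
}

/* Guest physical address -> pointer into the host array. */
uint8_t *demo_guest_to_host(uint32_t paddr) {
  assert(demo_in_pmem(paddr));
  return demo_pmem + (paddr - DEMO_MBASE);
}

/* Pointer into the host array -> guest physical address. */
uint32_t demo_host_to_guest(const uint8_t *haddr) {
  assert(haddr >= demo_pmem && haddr < demo_pmem + DEMO_MSIZE);
  return (uint32_t)(haddr - demo_pmem) + DEMO_MBASE;
}
```

With a base of 0 (as on x86) the same functions degenerate to plain array indexing.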

For x86, we take the implementation of the register structure as a homework assignment. To check that your implementation is correct, we also call the reg_test() function (defined in nemu/src/isa/x86/reg.c) in init_isa(). The details are described in the mandatory questions below.

After Monitor reads in the guest program and initializes the registers, the memory layout is as follows:

pmem:

CONFIG_MBASE      RESET_VECTOR
      |                 |
      v                 v
      -----------------------------------------------
      |                 |                  |
      |                 |    guest prog    |
      |                 |                  |
      -----------------------------------------------
                        ^
                        |
                       pc

NEMU then returns to the init_monitor() function and proceeds to call the load_img() function (defined in nemu/src/monitor/monitor.c). This function reads a meaningful guest program from a disk image into memory, overwriting the built-in guest program that was just loaded. The disk image file is an optional parameter, specified in the command used to run NEMU. If this parameter is not given, NEMU will run the built-in guest program.

The rest of the monitor's initialization will be described in later labs, so you don't need to worry about the details for now. Finally, the monitor calls the welcome() function to output the welcome message. Now you can compile and run NEMU in the nemu/ directory:

make run

Implementing Register Structures for x86

If you chose x86, note that the framework code does not correctly implement the structure x86_CPU_state, which is used to emulate the x86 registers. You need to implement it now (the structure is defined in nemu/src/isa/x86/include/isa-def.h). The reg_test() function called in init_isa() generates some random data to test your implementation of the register structure. If the implementation is incorrect, an assertion failure will be triggered. If it is correct, NEMU will not trigger an assertion failure and will output the welcome message mentioned above. If you have chosen an ISA other than x86, you can ignore this question.

The x86 register structure is as follows:

 31                23                15                7               0
+-----------------+-----------------+-----------------+-----------------+
|                                  EAX       AH       AX      AL        |
|-----------------+-----------------+-----------------+-----------------|
|                                  EDX       DH       DX      DL        |
|-----------------+-----------------+-----------------+-----------------|
|                                  ECX       CH       CX      CL        |
|-----------------+-----------------+-----------------+-----------------|
|                                  EBX       BH       BX      BL        |
|-----------------+-----------------+-----------------+-----------------|
|                                  EBP                BP                |
|-----------------+-----------------+-----------------+-----------------|
|                                  ESI                SI                |
|-----------------+-----------------+-----------------+-----------------|
|                                  EDI                DI                |
|-----------------+-----------------+-----------------+-----------------|
|                                  ESP                SP                |
+-----------------+-----------------+-----------------+-----------------+

where

  • EAX, EDX, ECX, EBX, EBP, ESI, EDI, ESP are 32-bit registers.
  • AX, DX, CX, BX, BP, SI, DI, SP are 16-bit registers.
  • AL, DL, CL, BL, AH, DH, CH, BH are 8-bit registers.

But they are not physically independent of each other, for example, the lower 16 bits of EAX are AX, and AX is divided into AH and AL. This structure is sometimes convenient when dealing with data of different lengths. For more details on x86 registers, see RTFM.

Hint: use anonymous unions.

What is an anonymous union?

It's normal for you to have this question, but you should realize that it's time to STFW.
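As a starting point for that search, here is a small illustration of the language feature itself on an unrelated data type. The demo_endpoint type is made up for this example and is deliberately not the register structure; how to apply the idea to x86_CPU_state is left to you.

```c
#include <assert.h>
#include <stdint.h>

/* An IPv4 address that can be viewed as one 32-bit value or as four bytes.
 * The union has no name, so its members are accessed directly through the
 * enclosing struct (standard since C11, and a GNU extension before that). */
typedef struct {
  union {
    uint32_t addr;
    uint8_t  octet[4];
  };
  uint16_t port;
} demo_endpoint;

static_assert(sizeof(uint32_t) == 4 * sizeof(uint8_t),
              "both views cover the same 4 bytes");

/* Returns 1 if the two views share storage and a write through one
 * is visible through the other, else 0. */
int demo_views_alias(void) {
  demo_endpoint e;
  e.addr = 0;
  e.port = 80;
  e.octet[0] = 192;  /* modifies one byte of e.addr as well */
  return (void *)&e.addr == (void *)&e.octet[0] && e.addr != 0 && e.port == 80;
}
```

Which byte of addr is changed by writing octet[0] depends on the byte order of the host machine.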

How does reg_test() test your implementation?

Read the code for reg_test(), and think about what the assert() condition in the code is written against.

After running NEMU you should see the welcome message together with the ISA of your choice. Make sure that the ISA shown in the output matches the ISA you chose. However, you will also see the following error message.

[src/monitor/monitor.c:20 welcome] Exercise: Please remove me in the source code and compile NEMU again.
riscv32-nemu-interpreter: src/monitor/monitor.c:21: welcome: Assertion `0' failed.

In fact, we've already covered this error at the end of PA0. As an exercise, you need to backtrack to the code that reported the error based on the error message, and then remove the corresponding code. After removing it and recompiling NEMU, you will see that the error no longer occurs.

Running the first guest program

After Monitor's initialization, the main() function will continue to call the engine_start() function (defined in nemu/src/engine/interpreter/init.c). The code enters the Simple Debugger's main loop sdb_mainloop() (defined in nemu/src/monitor/sdb/sdb.c), and outputs the NEMU command prompt.

(nemu)

The simple debugger is the core feature of the monitor. We can enter commands at the command prompt to monitor and debug the running state of the guest computer. The framework code already implements a few simple commands, which work very much like their GDB counterparts.

After typing c at the command prompt, NEMU enters the main loop cpu_exec() (defined in nemu/src/cpu/cpu-exec.c). cpu_exec() in turn calls execute(), which simulates the way the CPU works: executing instructions over and over again. Specifically, the code calls the exec_once() function in a for loop, which does what we described in the previous chapter: it tells the CPU to execute an instruction pointed to by the current PC, and then update the PC.

For how long?

In the cmd_c() function, cpu_exec() is called with the argument -1. Do you know what this means?

Potential Threats (thinking recommended for 2nd round)

"The call to cpu_exec() was passed with the argument -1", is this undefined behavior? Check the C99 manual to confirm your idea.

Different ISAs have different instruction formats and meanings, so the code that executes the instructions is naturally ISA-related. This code is located in nemu/src/isa/$ISA/inst.c. There are a lot of details about instruction execution that you don't need to worry about at the moment; we will explain them in PA2.

Since we ran NEMU without a guest program image, NEMU will run the built-in guest program mentioned above. NEMU keeps executing instructions until one of the following conditions is met, and then exits the instruction execution loop.

  • The required number of cycles has been reached.
  • The guest program executes the nemu_trap instruction. This is a fictitious special instruction that was added to NEMU to allow the guest program to indicate the end of execution. NEMU has selected a number of instructions for debugging purposes in the ISA manual and given them the special meaning of nemu_trap. For example, in the riscv32 manual, NEMU chose the ebreak instruction to act as nemu_trap. The nemu_trap instruction also receives an argument indicating the end state of the guest program in order to indicate whether the guest program has ended successfully or not. After the guest program executes this instruction, NEMU will set the end state of NEMU according to this end state parameter, and output different end messages according to different states, mainly including
    • HIT GOOD TRAP - the guest program ends execution correctly.
    • HIT BAD TRAP - the guest program ended execution incorrectly.
    • ABORT - the guest program terminated unexpectedly and did not finish its execution.
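The execution loop and its two exit conditions can be sketched with a toy machine. Everything here is an assumption made for illustration: the one-byte "instructions", the demo_machine structure and the function names are not NEMU's, and the ABORT state is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum demo_state { DEMO_RUNNING, DEMO_STOP, DEMO_END_GOOD, DEMO_END_BAD };

/* Made-up one-byte "instructions" for a toy guest. */
enum { DEMO_OP_NOP = 0, DEMO_OP_TRAP_GOOD = 1, DEMO_OP_TRAP_BAD = 2 };

struct demo_machine {
  const uint8_t *mem;     /* guest memory holding the program */
  size_t mem_size;
  size_t pc;              /* index of the next instruction */
  uint64_t executed;      /* number of instructions executed so far */
  enum demo_state state;
};

/* Fetch the instruction at pc, execute it, and advance pc. */
static void demo_exec_once(struct demo_machine *m) {
  assert(m->pc < m->mem_size);
  uint8_t op = m->mem[m->pc];
  m->pc += 1;
  m->executed += 1;
  if (op == DEMO_OP_TRAP_GOOD) m->state = DEMO_END_GOOD;
  else if (op == DEMO_OP_TRAP_BAD) m->state = DEMO_END_BAD;
}

/* Execute at most n instructions, stopping early on a trap. */
void demo_cpu_exec(struct demo_machine *m, uint64_t n) {
  m->state = DEMO_RUNNING;
  for (; n > 0; n--) {
    demo_exec_once(m);
    if (m->state != DEMO_RUNNING) return;
  }
  m->state = DEMO_STOP;  /* requested count reached without a trap */
}
```

In this sketch the trap instruction is the only way for the guest program to tell the simulator that it has finished and whether it finished successfully.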

When the guest program ends successfully, NEMU outputs something like the following (the pc value varies from ISA to ISA):

nemu: HIT GOOD TRAP at pc = 0x8000000c

This message means the guest program has successfully ended its run. At the end of the cpu_exec() function, NEMU prints the number of instructions executed and the time spent, and calculates the frequency of instruction execution. However, since the built-in guest program is so small and finishes so quickly, it is not possible to compute a meaningful frequency at this time. In the future, when running more complex programs, the frequency output here can be used as a rough measure of NEMU's performance.

After exiting cpu_exec(), NEMU will return to sdb_mainloop(), waiting for the user to enter commands. But in order to run the program again, you need to type q to exit NEMU, and then run it again.

Who indicates the end of a program?

You were told in your programming class that a program exits when it gets to the point where the main() function returns, and you believed it. But have you ever wondered why program execution ends at the return of the main() function? If someone tells you that the teacher in your programming class is wrong, do you have a way to prove/disprove it? If you are interested in this, please search the internet.

Let's just finish what we started (recommended to think about it in 2nd round)

What is the beginning of a program on GNU/Linux? What is the end of a program on GNU/Linux? What is the answer for a program running in NEMU?

Related questions: Why do we need nemu_trap in NEMU? Why do we need a monitor?

Finally, let's talk about some noteworthy aspects of the code.

  • Three macros useful for debugging (defined in nemu/include/debug.h)
    • Log() is an enhanced version of printf(), designed for debugging output. In addition to the message, it prints the source file, line number, and function where Log() was called. When there is a lot of debugging output, this makes it easy to locate the relevant place in the code.
    • Assert() is an updated version of assert() that outputs some information before the assertion fails when the test condition is false.
    • panic() is used to output information and end the program, equivalent to an unconditional assertion fail.

Examples of using these three macros are given in the code, and if you don't know how to use them, RTFSC.
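To give a feel for how macros of this kind can be written, here is a minimal sketch. The DEMO_ macros and demo_format_log() are assumptions for this illustration and do not reproduce the definitions in nemu/include/debug.h.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format "[file:line func] message" into buf.
 * Returns the snprintf-style length of the full message. */
int demo_format_log(char *buf, size_t size, const char *file, int line,
                    const char *func, const char *fmt, ...) {
  assert(buf != NULL && size > 0);
  int n = snprintf(buf, size, "[%s:%d %s] ", file, line, func);
  if (n < 0 || (size_t)n >= size) return n;
  va_list ap;
  va_start(ap, fmt);
  int m = vsnprintf(buf + n, size - (size_t)n, fmt, ap);
  va_end(ap);
  return m < 0 ? m : n + m;
}

/* Log-like macro: prefix the message with the call site, then print it. */
#define DEMO_LOG(...) \
  do { \
    char demo_buf_[256]; \
    demo_format_log(demo_buf_, sizeof(demo_buf_), \
                    __FILE__, __LINE__, __func__, __VA_ARGS__); \
    fprintf(stderr, "%s\n", demo_buf_); \
  } while (0)

/* Assert-like macro: print a message before aborting when cond is false. */
#define DEMO_ASSERT(cond, ...) \
  do { \
    if (!(cond)) { DEMO_LOG(__VA_ARGS__); abort(); } \
  } while (0)

/* panic-like macro: an unconditional assertion failure. */
#define DEMO_PANIC(...) DEMO_ASSERT(0, __VA_ARGS__)
```

The call-site information has to come from a macro rather than a function, because __FILE__, __LINE__ and __func__ are evaluated where they appear in the source.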

  • Memory is simulated by the large array pmem defined in nemu/src/memory/paddr.c. While the guest program is running, the simulated memory is always accessed through vaddr_read() and vaddr_write() (defined in nemu/src/memory/vaddr.c). vaddr and paddr stand for virtual address and physical address respectively. These concepts will be used in the future, so there's no need to go into them now, but keeping the interfaces consistent from the start will avoid unnecessary trouble later.

Understanding Framework Code

If you don't know what counts as "understanding the framework code", you can try the following tasks first. If you find that you don't know how to do them, come back and read this page more carefully. Understanding the framework code is a spiral process, with different focuses at different stages. You don't need to get frustrated because you don't understand some details, and you shouldn't try to understand all the code at once.

RTFSC != Staring at the code

It is likely that this is the first time you have worked on a project with so many source files, and looking at the code can be very confusing: you don't know in which file a function is defined, you don't understand what a function does, you don't know exactly how a piece of code behaves... At the same time, you are probably looking at the code in the same primitive way you used to look at it: with your eyes. You keep looking at it for a while, and you realize that you're not getting anywhere, so you start to get frustrated...

For a project with only one or two source files and a few hundred lines of code, reading the code directly can be effective. But with more source files and more code, you will soon find this approach very inefficient. The short-term memory of the human brain is very limited: even the static code cannot be remembered completely, let alone the dynamic behavior of the program, which is a huge state machine. Your brain can only simulate a very small portion of this state machine at a time.

Is there a tool that can help you simulate this huge state machine? This is where one of the tools we mentioned in PA0 comes in handy, GDB. In GDB, we can make the program execute one instruction at a time by single-stepping it, which is equivalent to letting the state machine move forward one step at a time, so we can observe the state of the program at any point in time! And the state machine's progress is the true order of execution of the program, so you can understand the behavior of the program as you run it. This is good for pointer-related code, especially function pointers, because you can't tell from the static code which function the pointer is pointing to when the program is running.

GDB also comes with a simple interface called TUI. After running GDB in a window with enough height, you can switch to the TUI by typing layout split, which allows you to see the behavior of the program from both the source code and the command perspective. However, in order to see the source code, you will need to add GDB debugging information to the NEMU build, as described in the box below. If you want to know more about TUI, STFW.

To help you RTFSC more efficiently, you'd better get to know more GDB commands and tricks through RTFM and STFW, such as:

  • Step into functions you are interested in
  • Step over functions you're not interested in (e.g. library functions)
  • Run to the end of the function
  • Print the value of a variable or register
  • Scan memory
  • View the call stack
  • Set breakpoints
  • Set watchpoints

If you haven't used GDB before, and then skipped GDB-related stuff in PA0, now you're going to suffer the consequences of laziness.

Adding GDB debugging information for NEMU compilation

menuconfig already provides the appropriate option; you just need to turn it on.

Build Options
  [*] Enable debug information

Then clean the build output and recompile. Try reading the code to understand what effect turning on this menuconfig option has on how NEMU is compiled.

Exit gracefully

To test if you understand the framework code, we'll set up an exercise for you: if you type q to exit directly after running NEMU, you'll see some error messages output from the terminal. Please analyze what causes this error message and try to fix it in NEMU.

It's that simple

In fact, the implementation of TRM is embedded in the above introduction.

  • The memory is a large array defined in nemu/src/memory/paddr.c.
  • The PC and general-purpose registers are defined in a structure in nemu/src/isa/$ISA/include/isa-def.h.
  • The adder is defined in... Well, this part of the framework code is a bit complicated, but it doesn't affect our understanding of TRM, so let's go over it in PA2!
  • The way TRM works is reflected in cpu_exec() and exec_once().

In NEMU, we only need some basic knowledge of C to understand the way the simplest computers work, thanks to the pioneers.