E4 From C Code to Binary Program
The online site Compiler Explorer is great for developing programs with a single source file, but it becomes inconvenient when developing programs with multiple source files, or when needing to coordinate multiple programs to complete more complex tasks. After installing the Linux system, we will migrate our development environment to Linux and utilize some useful tools available in Linux to aid in advanced learning of the C language, laying a foundation for setting up the development environment for subsequent tasks.
Advanced learning of C Language in Linux:
In Linux, complete exercises 0-22, 24-25, and 27-33 from Learn C the Hard Way. You need to copy the sample code into Linux to compile and run it, combining the text RTFSC to understand the sample code and complete the corresponding additional questions.
Note that the exercises mentioned above are part of the Entrance Interview. If you get stuck on certain questions, you can STFW or ask in the Pre-Study group chat.
Although the C language topics in this tutorial are similar to those in the previous “One-stop Learning for Linux C Programming” tutorial, it also includes introductions to various tools like Makefile and Valgrind, as well as some programming concepts worth learning. These will be very helpful for everyone in future project development work and are definitely worth practicing on Linux. In fact, if your foundational knowledge of C is solid enough, you should not spend too much time reading the code; rather, if you encounter many difficulties while reading the code, you should appreciate this opportunity to help you identify gaps in your understanding.
After becoming familiar with C language development on Linux, you already know that programs can only run after the C code is compiled into a binary executable file. So how does the compiler transform C code into an executable file? To answer this question, we will use some tools available in Linux to further understand the steps involved in this process.
Preprocessing
Preprocessing is a step that occurs before the actual compilation, and its essence is text processing. Preprocessing mainly includes the following tasks:
- Including header files
- Macro substitution
- Removing comments
- Joining lines split by the line continuation character (a \ at the end of a line)
- Handling conditional compilation (#ifdef / #else / #endif)
- Handling the stringification operator #
- Handling the identifier concatenation operator ##
For example, consider the following C code:
// a.c
#include <stdio.h>
#define MSG "Hello \
World!\n"
#define _str(x) #x
#define _concat(a, b) a##b
int main() {
  printf(MSG /* "hi!\n" */);
#ifdef __riscv
  printf("Hello RISC-V!\n");
#endif
  _concat(pr, intf)(_str(RISC-V));
  return 0;
}
You can preprocess the above C code by using this command:
gcc -E a.c
Observe the preprocessing results
Try running the above gcc command, then compare the preprocessing output with the source file and note the differences.
A question worth discussing is: how does gcc find header files? To answer this question, we can refer to the tool's logs and the relevant manuals.
How to find header files
- Try executing gcc -E a.c --verbose > /dev/null, and look for header file-related content in the output.
- Search for the description of the -I option in man gcc and read it.
After understanding this, try creating your own stdio.h file, then use the -I option to make gcc include your stdio.h instead of the one in the standard library. Use the -E option to check whether the preprocessing results meet your expectations.
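For instance, a minimal sketch of such a replacement header might look like the following (the directory name myinclude and the macro name are arbitrary choices for this experiment). Save it as myinclude/stdio.h, then run gcc -E a.c -I./myinclude and check whose stdio.h shows up in the preprocessed output.
// myinclude/stdio.h -- a deliberately fake header used only for this experiment
#define FAKE_STDIO_H 1
// declare printf ourselves so that a.c can still be preprocessed and compiled
int printf(const char *fmt, ...);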
Observe the preprocessing results (2)
Try installing a gcc targeted at the RISC-V architecture and use it for preprocessing:
apt-get install g++-riscv64-linux-gnu
riscv64-linux-gnu-gcc -E a.c
Check the preprocessing results at this point. What new changes do you notice?
The macro __riscv in the above C code is special: it is predefined by riscv64-linux-gnu-gcc. Therefore, even though it is not defined anywhere in the C code, riscv64-linux-gnu-gcc still treats it as defined.
You can view all predefined macros with the following command:
echo | gcc -dM -E - | sort
This command makes gcc preprocess an empty file and then print all the macros defined during this process, sorting the output.
Compare the predefined macros of gcc and riscv64-linux-gnu-gcc
Try to compare the predefined macros of gcc and riscv64-linux-gnu-gcc to understand the differences between the two during preprocessing. You only need a basic understanding of these differences; there is no need to delve deeply into the specific meaning of each macro.
Hint:
- Using diff or related commands can help you quickly identify the differences between two files.
- If you want to understand the meaning of some macros, you can refer to the gcc manual.
Compilation
In a broad sense, compilation is the process of converting one language into another. For a C compiler, compilation is the process of converting the C language into a target language, where the target language is tied to the ISA and typically refers to the assembly language of the target ISA. For example, for a computer with an x86 architecture, gcc converts C into x86 assembly language, while riscv64-linux-gnu-gcc converts C into riscv64 assembly language.
This process involves many details, and we will not delve into how each step is implemented. However, we will use appropriate tools to understand what each step does, thereby establishing a basic understanding of the compilation process.
To do this, you need to install clang, which is functionally equivalent to gcc and is also a C compiler.
apt-get install clang
At the same time, we will observe the steps in the compilation process through the following program:
// a.c
#include <stdio.h>
int main() { // compute 10 + 20
  int x = 10, y = 20;
  int z = x + y;
  printf("z = %d\n", z);
  return 0;
}
Understand the compilation process
Try referring to man clang and reading the section that introduces the compilation stages to gain a general understanding of the compilation process.
Lexical Analysis
The purpose of lexical analysis is to recognize and record each token in the source file, including identifiers, keywords, constants, strings, operators, braces, semicolons, etc. If an illegal token (e.g., @) is encountered, an error is reported.
You can view the results of lexical analysis using the following command:
clang -fsyntax-only -Xclang -dump-tokens a.c
You will see that the results of lexical analysis also record the position of each token in the format filename:line:column.
In fact, a C language source file is essentially a text file, so it can also be viewed as a string, and the lexical analysis tool can be considered as a string matching program.
Lexical Analysis and Syntax Highlighting
You may have used the syntax highlighting feature of your editor. In fact, this feature is not difficult to implement: following the definitions in the ANSI C standard, you can write a simple lexical analysis tool that recognizes C keywords and outputs them in different colors. If you want to output to the terminal, you can use ANSI escape codes for color.
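As a taste, here is a minimal sketch (not a real highlighter): it prints one hard-coded keyword in color using ANSI escape codes; an actual tool would first tokenize the input and then wrap every recognized keyword this way.
// print the keyword "int" in green on an ANSI-capable terminal
#include <stdio.h>
int main() {
  // \033[32m switches the foreground color to green, \033[0m resets it
  printf("\033[32mint\033[0m x = 10;\n");
  return 0;
}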
Syntax Analysis
The task of syntax analysis is to organize the recognized tokens into a tree structure according to the syntax of the C language, thereby clarifying the hierarchical structure of the source program, from files and functions to statements, expressions, and variables. If a syntax error is encountered (such as a missing semicolon), an error is reported. The result of syntax analysis is usually presented in the form of an Abstract Syntax Tree (AST).
You can view the results of syntax analysis using the following command:
clang -fsyntax-only -Xclang -ast-dump a.c
Semantic Analysis
The task of semantic analysis is to determine the type of each expression in the AST according to the semantics of the C language. In this process, compatible types undergo type conversion according to the C language standard (such as arithmetic promotion); cases that do not conform to the semantics are reported as errors. Some situations that are syntactically correct but semantically incorrect include undefined references, mismatched operand types for operators (e.g., struct mytype a; int b = a + 1;), and mismatches in the type and number of arguments in function calls.
For clang, the types of expressions are already shown when the AST is dumped. In fact, most compilers do not strictly separate syntax analysis and semantic analysis.
An important application of semantic analysis is static program analysis, which refers to analyzing source code without executing the program. It essentially analyzes the semantic information in the AST. The perspectives of analysis include code style and conventions, potential software defects, security vulnerabilities, performance issues, etc.
For example, consider the following program:
// a.c
#include <stdlib.h>
int main() {
  int *p = malloc(sizeof(*p) * 10);
  free(p);
  *p = 0;
  return 0;
}
This program conforms to the syntax of the C language, and each statement, viewed in isolation, also conforms to the semantics of the C language. Compiling it directly with gcc a.c does not produce any errors, and the compiled program even runs successfully. However, if you add the compilation option -Wall, gcc performs more checks on the code and issues a warning indicating that there is a use-after-free issue, that is, the memory is accessed after it has been freed. Tools that report potential issues in code through static program analysis are called lint tools.
Similarly, you can use clang’s lint tool to analyze the above program:
clang a.c --analyze -Xanalyzer -analyzer-output=text
Emphasize the Importance of Lint Tools
Some beginner programmers may feel that having the compiler report more warnings adds extra workload to programming. In fact, using lint tools is almost cost-free for developers and can help them discover many potential issues. Once these issues enter the runtime phase of a program, developers will pay a higher price to debug them. Especially during the operation of large projects, problems caused by such code can be very subtle, and software may crash suddenly after running for a long time, making debugging very difficult. Therefore, large projects typically make full use of lint tools to improve project quality as much as possible.
Intermediate Code Generation
Intermediate code is a type of ISA defined by the compiler for compilation scenarios, also known as Intermediate Representation (IR) or Intermediate Language.
You can view the intermediate code generated by clang with the following commands:
clang -S -emit-llvm a.c
cat a.ll
Reviewing the state machine model, the main task of compilation is to translate the state machine of a C program into the state machine of the ISA, meaning translating the variables of the C program into registers or memory, and translating the statements of the C program into sequences of instructions. Since intermediate code can also be seen as a type of ISA, we can understand the work of intermediate code generation from a state machine perspective:
- translating the variables of the C program into variables of the intermediate code, such as %1, %2, %3, etc.;
- translating the statements of the C program into instructions of the intermediate code, including alloca, store, load, add, call, ret, etc.
Of course, the translation process relies on the semantic information attached to the AST to ensure that the behavior of the resulting intermediate code is equivalent to that of the input C program.
So why not translate directly to the target language, which is the processor-related ISA?
On one hand, there are many processor-related ISAs. If we translated directly to a processor-related ISA and needed to optimize the program, we would have to implement each optimization technique separately for every ISA, increasing the maintenance cost of the compiler. If we first translate to intermediate code and then to the processor-related ISA, each optimization technique only needs to be implemented once, on the intermediate code.
On the other hand, intermediate code can serve as a bridge between various source languages (such as C, Fortran, Haskell, etc.) and various target languages (such as x86, ARM, RISC-V, etc.). If there are M source languages and N target languages, translating every source language directly into every target language would require implementing M × N translation modules. By introducing intermediate code and dividing the compilation process into a frontend and a backend with the intermediate code as the boundary, we only need to implement M + N translation modules, where frontend modules translate the source languages into intermediate code, and backend modules translate the intermediate code into the target languages.
              frontend                             backend
            +----------+                        +------------+
      C  -> |  Clang   | -+                 +-> |  llvm-x86  | -> x86
            +----------+  |                 |   +------------+
            +----------+  +-> +----------+ -+   +------------+
Fortran  -> | llvm-gcc | ---> | llvm-opt | ---> |  llvm-arm  | -> ARM
            +----------+  +-> +----------+ -+   +------------+
            +----------+  |                 |   +------------+
Haskell  -> |   GHC    | -+                 +-> | llvm-riscv | -> RISC-V
            +----------+ LLVM IR        LLVM IR +------------+
Different compilers may use different intermediate code. For example, the intermediate code used by clang is called LLVM IR, while the intermediate code used by gcc is called GIMPLE. However, we do not need to understand the specific details of the intermediate code; we only need a general understanding of the intermediate code generation process from the perspective of the state machine model.
Compilation Optimization
Compilation optimization is an important step in the modern software build process. Through compilation optimization, developers can focus their efforts on the development of the program’s business logic without having to overly consider the program’s performance during the development phase. The compiler can usually provide a decent performance baseline.
Definition of the Correctness of Compilation Optimization
We can understand compilation optimization from the perspective of program behavior: if two programs are “equivalent” in some sense, then a “simple” program can replace a “complex” one. The behavior that follows the C language standard and executes statements one by one is referred to as “strict execution”. Using “strict execution” as a baseline, the C language standard provides a rigorous definition of what is meant by “equivalent”. Specifically, the optimized program should maintain consistency in terms of “observable behavior” (C99 Standard Manual Section 5.1.2.3, Point 6), which includes the following:
- Accesses to objects declared with the keyword volatile must be strictly executed.
- Data written to files at the program's end must be consistent with that of strict execution.
- The input and output of interactive devices (stdio.h) must be consistent with those of strict execution.
“Observable behavior” describes the effect of a C program on the outside world from an external perspective. For example, the second point requires that external operations without real-time requirements appear “consistent” at the end, while the third point requires that real-time external operations appear “consistent” during execution. The first point constrains the internal behavior of the C program, which is related to the “memory-mapped I/O” mentioned in the F phase; however, we do not intend to elaborate on this here, as we will continue to discuss related content in the D phase.
Therefore, as long as the optimization continues to meet the consistency of observable program behavior, such optimization is considered “correct”. Under this condition, if the optimized program has fewer variables or statements, one can expect better performance.
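To make this concrete, here is a small hand-written sketch (not actual compiler output) contrasting code that an optimizer may delete with code it must keep:
// a sketch: the first loop has no observable behavior, so an optimizer may remove it
// entirely; the second loop writes a volatile object, so every store must be performed
void example(void) {
  int sum = 0;
  for (int i = 0; i < 1000; i++) {
    sum += i;            // 'sum' is never read afterwards: the whole loop is removable
  }

  volatile int progress = 0;
  for (int i = 0; i < 1000; i++) {
    progress = i;        // accesses to a volatile object must be strictly executed
  }
}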
Examples of Compilation Optimization Techniques
Real-world projects commonly use compilation optimization techniques, and in the future, you will run various compiled and optimized programs on processors you design yourself. Therefore, understanding some common optimization techniques and knowing why compilers generate corresponding instruction sequences will be beneficial for tasks such as debugging and architecture optimization in the future.
Below are some common compilation optimization techniques. Note that optimization techniques are not directly applied to C code in the compilation process, but for the sake of understanding, we present the semantics before and after optimization using C code.
- Constant Propagation - If the value of a variable is a constant, that value can be substituted into its references. If the substitution results in a constant expression, the value of that expression can be computed directly. In the following example, since the value of a is constant, the expression a + 2 is also constant, making b's value constant as well; further, the expression b * 3 is also constant. The compiler can compute these constant expressions directly and replace them with their results, eliminating the need to generate instructions (such as addition and multiplication instructions) to compute them at runtime.
// Before | After
int a = 1; | int a = 1;
int b = a + 2; | int b = 3;
printf("%d\n", b * 3); | printf("%d\n", 9);
- Dead Code Elimination - Code that is unreachable, or variables that are no longer used, can be removed. In the following example, since the DEBUG macro is defined as 0, the code inside the if block will never execute and can be removed. After removing the code in the if block, the variable a is unused and can be removed as well.
// Before | After
#define DEBUG 0 | #define DEBUG 0
int fun(int x) { | int fun(int x) {
int a = x + 3; | return x / 2;
if (DEBUG) { | }
printf("a = %d\n", a); |
} |
return x / 2; |
} |
- Elimination of Redundant Operations - Assignments that are overwritten before being read can be removed. In the following example, the assignment a = 3 is overwritten by a = f(), so the former can be removed; similarly, the assignments a = f() and a = 7 can also be removed.
// Before | After
int a; | int a;
a = 3; | f();
a = f(); | a = 10;
a = 7; |
a = 10; |
Can it be further optimized?
The optimization result of the previous example retains the function call f(). Is it possible to further remove f()? Why?
- Code Strength Reduction - Replace expensive operations with cheaper ones. In the following example, i * 4 and i << 2 are semantically equivalent when the behavior is defined. However, on most computers, multiplication instructions cost more to execute than shift instructions, so replacing the former with the latter can improve the program's performance.
// Before | After
int x = a[i * 4]; | int x = a[i << 2];
- Extract Common Subexpressions - For a subexpression that is computed multiple times, an intermediate variable can be used to store its result, so that subsequent code can reference the result directly without recomputing it. In the following example, a * b appears twice; by introducing an intermediate variable temp to hold the result of a * b, one multiplication is saved, improving the program's performance.
// Before | After
int x = a * b - 1; | int temp = a * b;
int y = a * b * 2; | int x = temp - 1;
| int y = temp * 2;
- Loop-Invariant Code Motion - Code that produces the same result in every iteration of a loop can be moved outside the loop and computed once. In the following example, the expression a + 2 yields the same result in each iteration, so it can be moved out of the loop and computed a single time, avoiding recomputation in every iteration.
// Before | After
int a = f1(); | int x = f1() + 2;
for (i = 0; i < 10; i ++) { | for (i = 0; i < 10; i ++) {
int x = a + 2; | int y = f2(x);
int y = f2(x); | sum += y + i;
sum += y + i; | }
} |
Can it be further optimized? (2)
In the optimization result of the previous example, f2(x) is still inside the loop. Can f2(x) be moved outside the loop? Why?
- Inline Function - Small functions can be expanded at the call site to save the overhead of a function call. In the following example, the call to f1(x, 3) in f2() can be expanded directly into the computation x + 3, eliminating the call and return process and thus improving the program's performance.
// Before | After
int f1(int x, int y) { | int f1(int x, int y) {
return x + y; | return x + y;
} | }
int f2(int x) { | int f2(int x) {
return f1(x, 3); | return x + 3;
} | }
Can it be further optimized? (3)
In the previous example, assuming f1(x, 3) is the only call to f1() within this source file, can we apply dead code elimination to remove f1() from the optimized result? Why?
In addition to the techniques mentioned above, there are many other compilation optimization techniques, such as induction variable analysis, loop unrolling, software pipelining, automatic parallelization, aliasing, and pointer analysis, which we will not discuss here. Interested individuals can refer to relevant materials.
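As one more illustration, here is a hand-written sketch of loop unrolling in the same before/after style as above (the unroll factor of 4 and the trip count of 100 are arbitrary; real compilers also handle trip counts that are not a multiple of the unroll factor):
// Before                      | After (unrolled by 4)
for (i = 0; i < 100; i ++) {   | for (i = 0; i < 100; i += 4) {
  sum += a[i];                 |   sum += a[i];
}                              |   sum += a[i + 1];
                               |   sum += a[i + 2];
                               |   sum += a[i + 3];
                               | }
Unrolling reduces the number of loop-control comparisons and branches, at the cost of a larger code size.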
Enable Compilation Optimization
You can enable more compilation optimizations by passing the -O1 option on the clang command line:
clang -S -emit-llvm -O1 a.c
cat a.ll
Compare the results of compilation optimization
Try comparing the intermediate code generated before and after adding -O1. What differences do you notice in the intermediate code after adding -O1?
Compare the results of compilation optimization (2)
Try adding the keyword volatile before int z = x + y; and regenerate the intermediate code with -O1. How does it compare to the intermediate code generated previously?
Compilers usually provide different optimization levels, allowing developers to trade off among metrics such as program performance, code size, and compilation time. For example, in gcc, the optimization levels for program performance are, from high to low: -Ofast > -O3 > -O2 > -O1 > -Og > -O0 (the default).
The higher the optimization level, the better the performance of the generated program, but the longer the compilation time.
Most software projects that use gcc or clang are built with the -O2 optimization level, which lets the software run with good performance. With -O3, gcc additionally tries to achieve higher program performance by generating more code. -Ofast is more aggressive: it may even employ optimization strategies that violate the language standard to gain performance. -Og only adopts optimization strategies that are friendly to debugging: compared with -O0, it improves performance while keeping the program's original structure (loops, function calls, etc.), so the generated instruction sequence still corresponds well to the C code, which makes debugging easier for developers.
In addition to program performance, gcc also provides optimization levels aimed at reducing code size: -Oz > -Os > -O1 > -O0 (the default). For clang, the optimization levels mentioned above are also applicable.
An optimization level typically enables many optimization techniques. For gcc, you can view the techniques enabled by a specific optimization level using -Q --help=optimizers.
For example:
gcc -Q --help=optimizers -O1
This will show which optimization techniques are enabled by -O1.
For clang, you can see the sub-steps (passes) of the compilation process using -ftime-report:
clang -S -emit-llvm -O1 a.c -ftime-report
Of course, you are not required to understand the specific details of every optimization technique. If you are interested, you can refer to the relevant manuals.
Target Code Generation
The task of target code generation is to translate the optimized intermediate code into target code, which corresponds to the processor-specific ISA.
Similarly, we can understand the work of target code generation from a state machine perspective:
- Translate the variables in the intermediate code into variables of the processor ISA, i.e., translate %1, %2, %3, etc., into registers or memory addresses.
- Translate the instructions in the intermediate code into instructions of the processor ISA, i.e., translate instructions like alloca, store, load, add, call, ret, etc., into the corresponding processor ISA instructions.
You can view the target code generated by clang using the following commands:
clang -S a.c
cat a.s
By default, the above clang command generates assembly code for the ISA of the local environment, such as x86. You can also provide the compilation option --target=xxx to clang to generate assembly code for another ISA.
For example, using --target=riscv64-linux-gnu instructs clang to generate riscv64 assembly code:
clang -S a.c --target=riscv64-linux-gnu
This process of generating assembly code for an ISA that differs from the local environment is known as "cross-compilation".
Using riscv64 as an example
Due to the lack of a runtime environment for riscv32 in the system, it is not possible to generate riscv32 code with the system’s default configuration, and subsequent steps cannot generate an executable file for riscv32.
For the sake of example, we will use riscv64 to demonstrate the RISC-V related compilation process. Compared to riscv32, riscv64 simply adds several additional instructions. However, to understand the general correspondence between C code and assembly code, you do not need to delve deeply into the specific semantics of each instruction.
Understand the association between C code and riscv instruction sequences
Read the riscv64 assembly code obtained from cross-compiling with clang, and try to identify which segment of assembly code was compiled from which segment of C code.
Understand the association between C code and riscv instruction sequences (2)
Add -O1 and recompile to obtain the riscv64 assembly code. What differences do you notice in the generated assembly code? How does it relate to the C code?
Target code generation, as the final step of the compilation process, can also be performed with gcc:
gcc -S a.c # compile to local ISA
riscv64-linux-gnu-gcc -S a.c # cross-compile to riscv64
During the process of generating target code, the compiler will also perform optimizations related to the target ISA. For example, when translating variables in the intermediate code into ISA registers or memory, the compiler will try to place frequently used variables into registers and less frequently used variables into memory.
In modern computers, the efficiency of a processor accessing registers is significantly higher than that of accessing memory, so placing commonly used variables in registers can enhance overall program performance.
For instruction translation, the compiler will also attempt to generate sequences of instructions with fewer instructions, which can similarly improve program performance. These strategies are supported by certain theoretical foundations, which we will not delve into here. Interested readers may consider referring to materials related to compiler principles.
Generation and Execution of Binary Files
Assembly
The result of compilation is assembly code. We already know that assembly language is a symbolic representation of instructions; thus, assembly code is essentially readable text. However, the processor’s circuitry cannot understand text, so the assembly code needs to be converted into binary encoding of instructions. This is the job of the assembling step, and the tool that performs this task is called an assembler.
The assembler works in a straightforward manner; overall, it consults the ISA manual and translates each textual instruction in the assembly code into the corresponding binary encoding, producing an object file. You can use the following commands to have clang generate the object file after assembly:
clang -c a.c
ls a.o
However, the contents included in the object file are no longer text, so opening it with a text editor will yield unreadable content. To view the content of the object file, we need some binary file parsing tools to help us decode the binary content into readable text.
For example, we can use the objdump tool from the binutils (Binary Utilities) package to parse the object file:
objdump -d a.o
The process of re-parsing assembly code from the object file is called “disassembly”. The above command will yield the x86 disassembly code, which is similar to the assembly file generated by the compiler.
To obtain the riscv64 object file, we need to perform cross-compilation:
clang -c a.c --target=riscv64-linux-gnu
riscv64-linux-gnu-objdump -d a.o
Similarly, we can also use gcc to generate the object file:
gcc -c a.c
riscv64-linux-gnu-gcc -c a.c
or use the LLVM toolchain for disassembly, which can automatically identify the ISA architecture corresponding to the object file, making it more convenient:
llvm-objdump -d a.o # works for both x86 and RISC-V object files
View the disassembly results of the riscv64 object file
Based on the above commands, check the disassembly results of the riscv64 object file, and compare them with the assembly file generated by the compiler.
Linking
The linking process merges multiple object files into the final executable file. You can use the following commands to have clang generate the linked executable file:
clang a.c
ls a.out
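The single file a.c does not really show the "merging" part, so here is a hypothetical two-file sketch (the file names add.c and main2.c are made up for illustration). Compile each file into an object file with clang -c add.c main2.c, then link the two object files with clang add.o main2.o; the call to add() in main2.o is only resolved at link time.
// add.c -- compiled separately into add.o
int add(int x, int y) { return x + y; }

// main2.c -- compiled separately into main2.o
#include <stdio.h>
int add(int x, int y);   // declared here, defined in add.c
int main() {
  printf("add(10, 20) = %d\n", add(10, 20));
  return 0;
}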
The executable file can also be disassembled using objdump.
However, you will find that the linked executable file contains significantly more content than the object file (a.o). To further understand where this content comes from, we can look at the log of the clang command's execution:
clang a.c --verbose
At the end of the log, you can see the commands related to linking, which also include several object files with names like crt*.o. Here, crt stands for “C runtime”, which represents the runtime environment for C programs. In other words, the linking process combines the object file obtained from compiling and assembling a.c with the existing object files of the C runtime environment to generate the final executable file. Evidently, the executable file needs to include these runtime-environment object files, which provide the necessary support for its execution.
Similarly, we can also use gcc to generate the executable file:
gcc a.c
We can also generate the riscv64 executable file through cross-compilation:
clang a.c --target=riscv64-linux-gnu
riscv64-linux-gnu-gcc a.c
View the disassembly results of the riscv64 executable file
Try to generate the riscv64 executable file, view its disassembly, and compare it with the object file before linking.
The linking process involves many details, which need not be delved into at this time; we will elaborate on this in the D phase.
Execution
For x86, after compiling the executable file, execute it using the following command:
./a.out
You are already familiar with the process of the processor executing instructions: fetching, decoding, executing, and updating the PC. As long as the program’s instruction sequence is placed in memory and the PC points to the first instruction, the processor will automatically execute this program.
Compare the performance differences before and after compilation optimization
We previously introduced various compilation optimization options, and now you can experience the power of these options. Using the previously discussed program for summing a series, you can measure the runtime of the program under different compilation optimization levels to understand the impact of different optimization levels on program performance.
However, since the previous series-summation program stops at 10, to measure performance differences conveniently you need to raise the program's endpoint so that it runs longer. If the endpoint is large, you may also need to change the type of the sum variable to long long. You can measure the execution time of a command with the time command; for example, time ls reports the execution time of the ls command.
After adjusting the endpoint of the series, compile the program and measure its runtime under -O0, -O1, and -O2.
If you are interested, you can also disassemble to view the corresponding assembly code and try to understand based on the assembly code: why you achieve the corresponding performance improvement? What compilation optimization techniques might the compiler have applied? However, to answer these questions, you might need to RTFM or STFW to learn about the functions of some assembly instructions.
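If you no longer have the earlier program at hand, a minimal sketch along the following lines will do (it may differ from the exact program you wrote before; the endpoint 100000000 is an arbitrary choice, and the final printf keeps the result observable so the whole computation cannot simply be discarded):
// sum.c -- sum the series 1 + 2 + ... + N and print the result
#include <stdio.h>
#define N 100000000LL
int main() {
  long long sum = 0;
  for (long long i = 1; i <= N; i++) {
    sum += i;
  }
  printf("sum = %lld\n", sum);
  return 0;
}
Compile it with each level, for example gcc -O0 sum.c followed by time ./a.out, then repeat with -O1 and -O2 and compare the reported times.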
However, the generated executable file resides on external storage (a disk or SSD); how do we place it in memory? Recalling the processor we implemented in Logisim, we either connected the instruction sequence directly to the circuit as constants or used the GUI's Load Image operation to load the program's instruction sequence into the ROM component. Both methods essentially involve us manually completing the task of “placing the program's instruction sequence in memory”, but when running ./a.out we did not perform this task manually.
When we execute ./a.out, who exactly completes the task of “placing the program's instruction sequence in memory” so that the processor can fetch the program's instructions from memory and execute them? In fact, there is a special program in modern operating systems called a “loader”, whose job is to read (load) other executable files from external storage into memory and jump to the corresponding program entry point to execute instructions.
However, the processor can only execute instructions that comply with its ISA specifications. For example, most students use computers equipped with x86 processors, so they can only execute x86 executable files and cannot execute RISC-V executable files; if an x86 processor is forced to execute a sequence of RISC-V instructions, it may misinterpret RISC-V instructions as other x86 instructions during the decoding process, or even recognize them as illegal instructions, causing the processor to fail to execute according to the original semantics of RISC-V instructions, resulting in unexpected program execution results. Therefore, the loader checks which ISA the program’s instruction sequence belongs to during the loading process, and if it does not match the current processor’s ISA, the loader will stop loading and report an error.
Loading the program is a necessary step before the program runs; therefore, the loader is part of the runtime environment. The aforementioned crtxxx.o object files contain some of the loader's functionality. The runtime environment also includes other functions; for instance, during the execution of ./a.out, the runtime environment's role is also reflected in the following:
- Before the program starts executing, it performs various initialization tasks. When learning C previously, you might have thought that a program starts executing from main(). However, if that were the case, how would the program arguments we type on the command line be passed to main()'s argc and argv? In fact, this task is also completed by the runtime environment: it calls main() only after completing a series of preparatory tasks such as loading the program and preparing the arguments (see the small example after this list).
- During the program's execution, it provides support for library functions such as printf(). In the example code above, printf() was called directly and no code for printf() was written, yet when ./a.out was executed, the output was successfully produced through printf(). We therefore have reason to suspect that the crtxxx.o object files provide a way to reach the code of printf(), directly or indirectly. In fact, library functions are indeed part of the runtime environment.
- After the program finishes executing, it provides the functionality for program exit. When learning C previously, you might have thought that the program exits directly after returning from main(). However, once you understand that the runtime environment does a lot of preparatory work before executing main(), you can easily guess that after main() returns, control goes back to the runtime environment, which then carries out some cleanup tasks before the program exits.
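As a small experiment for the first point, the following sketch (the file name args.c is arbitrary) prints whatever the runtime environment hands to main(); compile it and run it as ./a.out hello world to watch the command-line arguments arrive in argv.
// args.c -- print the arguments that the runtime environment passes to main()
#include <stdio.h>
int main(int argc, char *argv[]) {
  for (int i = 0; i < argc; i++) {
    printf("argv[%d] = %s\n", i, argv[i]);
  }
  return 0;
}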
In general, just having the program itself is not sufficient to run it. From a broader perspective, all functionalities that support program execution fall under the category of the runtime environment.
Manual Behavior and Coding Standards
Just as the ISA manual defines the semantics of instructions, the semantics of the C language is also defined by its corresponding manual. After its introduction in the 1970s, the C language gained widespread use due to its efficiency, flexibility, and portability. However, various variants of and extensions to C across different compilers led to compatibility issues. To address this challenge, ANSI established a committee in 1983 to standardize the C language. Over the years, the C language standard has evolved through versions including C90, C99, C11, C17, and C23. These versions are typically named after the year of their release; for example, the C11 standard was published around 2011. In OSOC, we will take C99 as an example to understand some definitions and concepts in the C language standard, thus establishing a more comprehensive understanding of C program behavior.
Semantics of Program Execution
Reviewing the state machine model of computer systems, the process of executing a C program is the process of changing the program’s state through the statements of the C program. This understanding is based on the state machine model. Now we can look at how the C language standard defines “program execution”, which will help us further understand the details of “program execution”.
Section 5.1.2.3 of the C99 manual defines “program execution”, and we will explain these definitions one by one:
1 The semantic descriptions in this International Standard describe the behavior of
an abstract machine in which issues of optimization are irrelevant.
In the manual, the semantic description of program execution is related to an abstract machine, and topics of optimization are not involved. Here, the concept of an abstract machine is similar to the model machine discussed earlier when introducing ISA; it discusses only the functions and behaviors it possesses without addressing its specific implementation.
2 Accessing a volatile object, modifying an object, modifying a file, or calling a
function that does any of those operations are all side effects, which are changes
in the state of the execution environment. Evaluation of an expression in general
includes both value computations and initiation of side effects. Value computation
for an lvalue expression includes determining the identity of the designated object.
Accessing a volatile object, modifying an object, modifying a file, or calling a function that performs any of these operations are all referred to as “side effects”, meaning changes in the state of the execution environment. The evaluation of an expression generally includes both value computation and the initiation of side effects. The value computation of an lvalue expression also includes determining the identity of the designated object.
Regarding “access” and “modify”, these concepts are defined in Chapter 3 of the manual, Terms, definitions, and symbols. Specifically, “access” refers to “reading” or “modifying” the value of an object, where “modify” includes the case where the new value being stored is the same as the old value.
3 Sequenced before is an asymmetric, transitive, pair-wise relation between
evaluations executed by a single thread, which induces a partial order among those
evaluations. Given any two evaluations A and B, if A is sequenced before B, then the
execution of A shall precede the execution of B. (Conversely, if A is sequenced
before B, then B is sequenced after A.) If A is not sequenced before or after B,
then A and B are unsequenced. Evaluations A and B are indeterminately sequenced when
A is sequenced either before or after B, but it is unspecified which. The presence
of a sequence point between the evaluation of expressions A and B implies that every
value computation and side effect associated with A is sequenced before every value
computation and side effect associated with B. (A summary of the sequence points is
given in annex C.)
- “Sequenced before” is an asymmetric, transitive, pair-wise relation between evaluations executed by a single thread, which induces a partial order among those evaluations.
- Given any two evaluations A and B, if A is sequenced before B, then the execution of A occurs before that of B.
- Conversely, if A is sequenced before B, then B is sequenced after A.
- If A is neither sequenced before nor after B, then A and B are unsequenced.
- If A is sequenced either before or after B, but it is unspecified which, then A and B are indeterminately sequenced.
- If there is a sequence point between the evaluations of expressions A and B, then every value computation and side effect associated with A is sequenced before every value computation and side effect associated with B.
- Annex C provides a summary of sequence points.
This entry strictly defines the order relations between different evaluation operations through the concept of sequence points, thereby rigorously defining the semantics of the entire program execution. For example, consider the following program:
a = 1;
b = a + 2;
According to the summary of sequence points in Annex C, there is a sequence point at the end of the full expression in an expression statement. By the definition of sequence points, this means that all value computations and side effects associated with the expression a = 1 must occur and take effect before b = a + 2 is evaluated. Therefore, when b = a + 2 is evaluated, the value of a has already been modified to 1, allowing the program to read 1 from the variable a at that moment.
You might feel this is of little use. Let’s look at the following example and see what the output of this program is:
#include <stdio.h>
int f() { printf("in f()\n"); return 1; }
int g() { printf("in g()\n"); return 2; }
int h() { printf("in h()\n"); return 3; }
int main() {
  int result = f() + g() * h();
  return 0;
}
In fact, this program can print the function-call messages in any order because, within the same expression statement, the function calls are indeterminately sequenced, allowing them to be called in any order. If the behavior of the program depends on a specific calling order, the execution result may not meet expectations.
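If a program does need the calls to happen in a particular order, the same trick shown later for unspecified behavior applies: perform each call in its own statement first. A sketch:
// force f(), g() and h() to be called in exactly this order
int a = f();
int b = g();
int c = h();
int result = a + b * c;   // same value as before, but the call order is now fixed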
4 In the abstract machine, all expressions are evaluated as specified by the
semantics. An actual implementation need not evaluate part of an expression if it
can deduce that its value is not used and that no needed side effects are produced
(including any caused by calling a function or accessing a volatile object).
In the abstract machine, all expressions are evaluated according to the corresponding semantics. In a specific implementation, if it can be deduced that the value of part of an expression is not used and that no needed side effects are produced (including those caused by calling a function or accessing a volatile object), that part of the expression need not be evaluated.
This entry actually indicates the optimization space in the expression evaluation process.
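For instance, in the following sketch (the function square() is made up for illustration), an implementation that can see that square() has no side effects is allowed to skip the call entirely, because its value is never used; if tmp were a volatile object, the access to it would have to remain.
// the value of 'tmp' is never used and square() has no side effects,
// so an implementation may skip evaluating square(100) altogether
static int square(int x) { return x * x; }
int main() {
  int tmp = square(100);
  return 0;
}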
5 When the processing of the abstract machine is interrupted by receipt of a signal,
the values of objects that are neither lock-free atomic objects nor of type volatile
sig_atomic_t are unspecified, as is the state of the floating-point environment. The
value of any object modified by the handler that is neither a lock-free atomic
object nor of type volatile sig_atomic_t becomes indeterminate when the handler
exits, as does the state of the floating-point environment if it is modified by the
handler and not restored to its original state.
This entry is related to signal mechanisms and is beyond the current learning scope, so it will not be elaborated here.
6 The least requirements on a conforming implementation are:
— Accesses to volatile objects are evaluated strictly according to the rules of the
abstract machine.
— At program termination, all data written into files shall be identical to the
result that execution of the program according to the abstract semantics would
have produced.
— The input and output dynamics of interactive devices shall take place as specified
in 7.21.3. The intent of these requirements is that unbuffered or line-buffered
output appear as soon as possible, to ensure that prompting messages actually
appear prior to a program waiting for input.
This is the observable behavior of the program.
A conforming implementation must meet at least three requirements, which aligns with the “consistency of observable behavior of programs” discussed earlier regarding compiler optimization.
7 What constitutes an interactive device is implementation-defined.
What exactly constitutes an interactive device is defined by the implementation.
8 More stringent correspondences between abstract and actual semantics may be
defined by each implementation.
Each implementation may further define a stricter correspondence between abstract semantics and actual semantics.
Implementation of Standard Specifications
The essence of standard specifications is a set of definitions and conventions, usually presented through manuals. To implement the standard specifications, they must be realized in some form, referred to as the "implementation" of standard specifications. For example, the ISA is essentially a standard specification, and processors that implement this specification through digital circuits are one form of ISA implementation. Clearly, the behavior of the processor must conform to the ISA standard specification.
Similarly, the C language standard also has corresponding implementations. Section 3.12 of the C99 manual defines “what is an implementation”:
particular set of software, running in a particular translation environment
under particular control options, that performs translation of programs for,
and supports execution of functions in a particular execution environment
In other words, the implementation of the C language standard is a series of specific software that runs in a particular translation environment, used for translating programs and supporting the execution of corresponding functions in a specific execution environment. If we use concepts we are familiar with to understand, the implementation of the C language standard is the compiler (used for program translation) and the runtime environment (used to support program execution). Similarly, both the compiler and runtime environment must conform to the C language standard.
Standard specifications define many details, including various “should” or “should not” behaviors, which are clearly defined behaviors, and specific implementations must follow these standard specifications. However, not all behaviors can be clearly defined by the standard specifications. Understanding how specific implementations handle these behaviors will help us understand how computer systems work and guide us in writing more standardized code to avoid these behaviors.
Taking the C99 standard as an example, the kinds of behavior that cannot be fully defined fall into the following categories; their concepts are all defined in Chapter 3 of the manual, Terms, definitions, and symbols.
Unspecified Behavior
The C99 manual defines Unspecified Behavior as follows:
use of an unspecified value, or other behavior where this International
Standard provides two or more possibilities and imposes no further requirements
on which is chosen in any instance
In other words, for this type of behavior, the C language standard provides multiple choices for the result but does not specify which one to choose; the specific implementation can choose one from them.
The C99 manual provides the following example:
An example of unspecified behavior is the order in which the arguments to a
function are evaluated
That is, the order of evaluation of function call parameters is unspecified. We can verify this with the following program:
// a.c
#include <stdio.h>
void f(int x, int y) {
  printf("x = %d, y = %d\n", x, y);
}
int main() {
  int i = 1;
  f(i ++, i ++);
  return 0;
}
On yzh's system, compiling and running the program with gcc and with clang yields the following results:
$ gcc a.c && ./a.out
x = 2, y = 1
$ clang a.c && ./a.out
x = 1, y = 2
As we can see, even with the same program code, compiling with different compilers yields different results.
Experience Unspecified Behavior
Try compiling and running the above program on your system and observe the results.
If you try to compile and run the above program on your system, you may get results different from yzh. If this is the case, it indicates that even with the same compiler, using different versions to compile the program can yield different results, but this still conforms to the C language standard. In an extreme case, a compiler could randomly decide the order of evaluation for function call parameters, which would still conform to the C language standard!
if (rand() & 1) { evaluate from left to right; }
else { evaluate from right to left; }
Since neither the C language standard nor its implementation is wrong, the problem can only lie within the program. In fact, the behavior of the aforementioned program relies on the unspecified behavior in the C language standard, resulting in different performances across different compilers. To avoid such outcomes, we should write code that is not affected by unspecified behavior. For example, rewriting the above code as follows:
int i = 1;
int x = i ++;
int y = i ++;
f(x, y);
This way, regardless of the order in which the arguments of f(x, y) are evaluated, the program always outputs x = 1, y = 2, thus avoiding the effects of unspecified behavior.
Try understanding parameter evaluation order from the perspective of sequence points
Consult the C99 manual regarding parameter evaluation, and see how the manual describes the order of parameter evaluation through the perspective of sequence points.
Implementation-defined Behavior
The C99 manual defines Implementation-defined Behavior as follows:
unspecified behavior where each implementation documents how the choice is made
This type of behavior is a special kind of unspecified behavior, but specific implementations must document how the choice is made. Unlike general unspecified behavior, once a specific implementation documents its choices for implementation-defined behavior, it cannot be modified. Specific implementations must not only conform to the C language standard but also to their own documentation. Therefore, programs containing such behavior will yield the same results when compiled and executed multiple times in a specific environment (including the compiler and runtime environment).
A common example is the length of integer types. In fact, the C language standard has never defined how long each integer type is. Section 5.2.4.2 of the C99 manual states the following regarding integer type lengths:
An implementation is required to document all the limits specified in this
subclause, which are specified in the headers <limits.h> and <float.h>.
Additional limits are specified in <stdint.h>.
Regarding the lengths of integer types, the C99 manual’s subsection 5.2.4.2.1 states:
The values given below shall be replaced by constant expressions suitable for
use in #if preprocessing directives. Moreover, except for CHAR_BIT and
MB_LEN_MAX, the following shall be replaced by expressions that have the same
type as would an expression that is an object of the corresponding type
converted according to the integer promotions. Their implementation-defined
values shall be equal or greater in magnitude (absolute value) to those shown,
with the same sign.
Here, we list some of the values mentioned above:
Value | Description
---|---
SCHAR_MIN | Minimum value of signed char
SCHAR_MAX | Maximum value of signed char
UCHAR_MAX | Maximum value of unsigned char
SHRT_MIN | Minimum value of short int
SHRT_MAX | Maximum value of short int
USHRT_MAX | Maximum value of unsigned short int
INT_MIN | Minimum value of int
INT_MAX | Maximum value of int
UINT_MAX | Maximum value of unsigned int
LONG_MIN | Minimum value of long int
LONG_MAX | Maximum value of long int
ULONG_MAX | Maximum value of unsigned long int
LLONG_MIN | Minimum value of long long int
LLONG_MAX | Maximum value of long long int
ULLONG_MAX | Maximum value of unsigned long long int
As we can see, the C language standard only defines the minimum range of values for each integer type, and specific implementations can define a larger range for each integer type but cannot define a range smaller than the minimum defined by the C language standard.
Why does the C language standard specify this? We can understand the objectives of the C language standard from the summary section of the C99 manual:
This International Standard specifies the form and establishes the
interpretation of programs expressed in the programming language C. Its purpose
is to promote portability, reliability, maintainability, and efficient
execution of C language programs on a variety of computing systems.
To maximize compatibility across various computer systems, the C language standard must consider:
- Supporting older computer systems, so many provisions cannot be overly rigid. You may have noticed that in the C language standard, the maximum permitted value of the signed char minimum (SCHAR_MIN) is -127, even though the minimum number representable in 8-bit two's complement is -128. This is because the C language standard considers that some older computers used sign-magnitude or one's complement representation, where the minimum number representable by an 8-bit signed number is -127. If the C language standard had defined the maximum permitted value of the signed char minimum as -128, it would not be compatible with those computers.
How long is one byte?
You might answer without thinking, “8 bits”. In fact, however, section 3.6 of the C language standard states that the number of bits in 1 byte is "implementation-defined":
A byte is composed of a contiguous sequence of bits, the number of which is implementation-defined
This is clearly to be compatible with older computer systems: historically, the length of 1 byte has been defined as different numbers of bits on different computers, ranging from 1 bit to 48 bits. Therefore, the C language standard cannot directly define 1 byte as 8 bits, leaving this to specific implementations to define.
You may have also noticed that the text quoted from the C language standard uses ranges of values to describe the length of integer types indirectly, rather than describing them directly as, say, 4 bytes; this also avoids relying on the ambiguous concept of a byte, which varies across implementations.
- Supporting future computer systems, so many provisions must leave room for future developments. The C language standard only defines the minimum range of values for each integer type, which allows for future flexibility. For example, regarding the int type: in some environments from the 1990s (like Turbo C, used in some old textbooks), int was 16 bits; in 32-bit systems (like VC 6.0 on Windows or GCC on Linux), int is 32 bits; and in modern 64-bit systems, int is still 32 bits.
Check the range of integer types in Linux
The Linux runtime environment is also part of a specific implementation of the C language standard. You can check /usr/include/limits.h to see the ranges of the integer types in this specific implementation.
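Besides reading the header, you can also let a small program report the ranges directly; here is a minimal sketch using the macros from limits.h:
// print the ranges of some integer types in the current implementation
#include <stdio.h>
#include <limits.h>
int main() {
  printf("CHAR_BIT   = %d\n", CHAR_BIT);
  printf("int:       [%d, %d]\n", INT_MIN, INT_MAX);
  printf("long:      [%ld, %ld]\n", LONG_MIN, LONG_MAX);
  printf("long long: [%lld, %lld]\n", LLONG_MIN, LLONG_MAX);
  return 0;
}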
In summary, programs containing implementation-defined behavior can yield the same results when compiled and executed repeatedly in a specific environment. However, when porting the program to another environment, one must consider the differences in implementation-defined behavior between the new and old environments. For instance, if one were to port a 32-bit program to a 16-bit environment (though such needs are rare today), one consideration would be that data of type int might overflow; one solution is to change it to type long, because the C language standard specifies that the range of long is at least -2147483647 to +2147483647, which can accommodate the range of int in the 32-bit environment.
Locale-specific Behavior
The C99 manual defines Locale-specific Behavior as follows:
behavior that depends on local conventions of nationality, culture, and
language that each implementation documents
This means that the results of this type of behavior depend on local customs regarding nationality, culture, and language, and specific implementations must document the results of these behaviors. This is a special kind of implementation-defined behavior.
An example is the extended character set. What characters are included in the extended character set is a locale-specific behavior. Consider the following program:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>
#define Main main
#define Return return
char* string_concat(char *str1, char *str2) {
  char *new_str = malloc(strlen(str1) + strlen(str2) + 1);
  assert(new_str);
  strcpy(new_str, str1);
  strcat(new_str, str2);
  Return new_str;
}
int Main() {
  char *info = string_concat("One Student One Chip", "very simple");
  printf("%s\n", info);
  free(info);
  Return 0;
}
The above program uses strings and identifiers that include Chinese characters. In specific implementations that support Chinese characters, this program can compile and run successfully; however, in implementations that only support the basic character set (i.e., ASCII), the above program will fail to compile.
Typically, when developing internationalized software (i18n), one must consider locale-specific behaviors. Besides character sets, locale-specific behaviors also include decimal point characters, currency symbols, formats for time and date, etc. However, since OSOC does not involve such software, you do not need to explore this deeply.
Undefined Behavior
The C99 manual defines Undefined Behavior as follows:
behavior, upon use of a nonportable or erroneous program construct or of
erroneous data, for which this International Standard imposes no requirements
This is a type of erroneous behavior where the program or data does not conform to the standard, but the C language standard imposes no constraints on the result of such behavior. In other words, regardless of the outcome, it conforms to the C language standard. In simple terms, “anything can happen”.
The C99 manual lists some possible outcomes:
Possible undefined behavior ranges from ignoring the situation completely with
unpredictable results, to behaving during translation or program execution in a
documented manner characteristic of the environment (with or without the
issuance of a diagnostic message), to terminating a translation or execution
(with the issuance of a diagnostic message).
These outcomes include:
- Errors and exits during compilation or execution
- Handling according to the specific implementation’s documentation, possibly without warnings
- Completely unpredictable results
An example of undefined behavior is a buffer overflow. Consider the following program:
// a.c
#include <stdio.h>
int main() {
  int a[10] = {0};
  printf("a[10] = %d\n", a[10]);
  return 0;
}
Experience Undefined Behavior
Try compiling and running the above program on your system multiple times and observe the results.
In the above program, when calling printf()
, a[10]
is referenced, accessing an element that has exceeded the bounds of the array. According to the C language standard, the value of a[10]
is undefined. Therefore, whether the compiler throws an error, the program runs with an error, or the program outputs 0
or garbage values, all of these behaviors conform to the C language standard.
Unlike implementation-defined behavior, a program containing undefined behavior is not even guaranteed to yield the same results when compiled and run multiple times in the same environment. If your program produces different results across multiple runs, after excluding external random factors, you may suspect that your program has undefined behavior.
Why RTFM
The reason we quoted the original text from the C language standard above is to let everyone see the precise definitions of some concepts in the C language standard. You are unlikely to find similar concepts and their definitions in other materials. For instance, even if you have attended a C language course or learned C from other materials, you might be hearing about concepts like “undefined behavior” and “sequence points” for the first time.
This indicates that the C language textbooks and learning materials you have encountered are not the entirety of the C language. In fact, the C language standard precisely defines every detail of the C language, and we have reason to believe that, regarding understanding the C language, the creators of the C language standard have deeper insights than authors of books and blogs.
Of course, different individuals have different goals in learning the C language. If your goal is simply to write some basic programs in C or to pass exams, most textbooks and learning materials will suffice. On that basis, if you wish to delve deeper into learning C, or understand the fundamental principles of how computer systems work, or even build your own computer system, reading the C language standard manual will be your best choice.
The goal of OSOC is the latter, because processor chips differ from other chips in that their design goal is to run software programs. Therefore, we view processor chip design from the perspective of computer systems: not only do you need to learn to design a processor chip, but you also need to learn how to run programs on this processor chip to check if the programs run correctly and efficiently.
As your learning deepens, you will gradually encounter some questions that books and blogs cannot clarify. At that point, you need to realize that you have begun to enter the realm of professionals. Learning to read manuals is a passport to becoming a professional.
However, the content in manuals is extensive, and our suggestion for you to read the manuals is not to expect you to grasp all of it in a short time, but more importantly, to cultivate the awareness of reading manuals: when you want to thoroughly understand a problem, you should think of reading related content in the manual; even if you can find relevant answers online, you should consider how the manual describes it, unless those answers explicitly quote the manual (such answers are excellent!). If you can establish this awareness and put it into practice, your understanding of C language will already be in the top 1% of the industry.
Reference list of the aforementioned behaviors
Annex J of the C99 manual lists all the contents of the aforementioned four types of behaviors in the C language standard, which can be consulted if necessary.
Application Binary Interface (ABI)
We have learned that the specific implementation of the C language standard consists of a series of software, mainly compilers and runtime environments, where the compiler is responsible for generating the binary executable files of programs, and the runtime environment is responsible for supporting program execution. On one hand, the binary executable files of programs relate to the ISA, as programs are executed on the processors corresponding to the ISA; on the other hand, the runtime environment contains library functions and some functionalities of the operating system, which the program needs to interact with before, during, and after execution.
Thus, concepts like programs, compilers, operating systems, library functions, and ISA are interconnected as a whole within a computer system. This connection is generally expressed through agreements and specifications, collectively called the Application Binary Interface (ABI): the ABI is the binary-level interface specification between a program and the concepts mentioned above.
Due to its need to be compatible with various computer systems, the C language standard cannot precisely define the results of many behaviors. However, for a specific computer system, many conditions are fixed. As a set of agreements at the binary level for this computer system, ABI can be viewed as a document of a concrete implementation of the C language standard. Therefore, many choices regarding implementation-defined behavior in the C language standard will be documented within the ABI.
For example, for a specific ISA, the bit width of bytes and general registers is determined. Based on this, ABI can determine the range of values for integer types as defined in the C language standard, and the understanding of integer type ranges by the entire computer system will follow the ABI, thereby collectively supporting the execution of programs based on a consistent understanding.
As a specification, ABI includes:
- Instruction sets of processors, register structures, stack organization, memory access types, etc.
- Sizes, layouts, and alignment of basic data types that can be directly accessed by the processor
- Calling conventions that specify how function parameters are passed and how return values are obtained
- How applications initiate system calls to the operating system
- Formats of object files, supported runtime libraries, etc.
We see again that the execution results of programs on a specific computer system are related to source code, compilers, runtime environments, ISA, hardware, etc.; and ABI is also an important reflection of the collaboration between software and hardware in a computer system, jointly supporting program execution. Currently, you only need to have a basic understanding of ABI. We will expand on the specific contents of ABI with RISC-V as an example in Phase D.
RTFM
Of course, not only the C language standard manual, but we also recommend that everyone read other manuals in their leisure time, including the RISC-V manual, Verilog manual, and RISC-V ABI manual, as they can provide the most comprehensive and authoritative help for understanding related details. If you find reading them very challenging, don’t worry; as you accumulate more foundational knowledge with deeper learning, going back to read them will become smoother.
Instruction Set Simulator - A Program That Executes Programs
We have introduced the steps from C code to binary programs. To further understand the essence of program execution, we attempt to implement the process of program execution using the C language. Such a program used to execute other programs is called a "simulator". In the industrial design process of processors, simulators play a very important role. In the future, you will also use simulators to help you complete some important tasks in the processor design process, but for now, let’s focus on how to implement a simple simulator.
You have already tried to design a CPU in Logisim, where the process involves using a digital circuit state machine to implement the ISA state machine. To implement a simulator in C, we need to consider how to use the state machine of the C program to realize the ISA state machine. Therefore, we will first review the C program and ISA from the perspective of state machines:
|  | C Program | ISA |
|---|---|---|
| State | Variables | PC, GPR, and memory |
| Event | Execute Statement | Execute Instruction |
| State Transition Rule | Statement Semantics | Instruction Semantics |
To use the C program’s state machine to implement the ISA state machine, we need to develop a C program that includes the following functionalities:
- Use the C program’s state to represent the ISA’s state, which means using the C program’s variables to represent the ISA’s PC, GPR, and memory.
- Use the C program’s state transition rules to implement the ISA’s state transition rules, which means using C language statements to implement the semantics of instructions.
A simulator that only implements instruction behavior from the ISA perspective is called an instruction set simulator. Similarly, there are architecture simulators and circuit simulators, but we will not touch on those for now.
Basic Implementation of sEMU
Let’s illustrate how to implement an instruction set simulator using sISA as an example, referring to the corresponding instruction set simulator as sEMU (simple EMUlator). For details about sISA, please refer to previous lectures.
sEMU needs to use C program variables to represent the ISA’s PC, GPR, and memory, which is not difficult:
#include <stdint.h>
uint8_t PC = 0;
uint8_t R[4];
uint8_t M[16];
Here, although the width of PC in the ISA layer is 4 bits, there is no 4-bit basic data type in C, so we use uint8_t
here. Moreover, since the PC width is only 4 bits, it indicates that sISA programs can contain at most 16 instructions, and therefore the size of the array M
used to implement memory only needs to be set to 16
.
sEMU needs to use C language statements to implement the semantics of instructions, which is also not difficult. From the perspective of the instruction cycle, we need to write a function inst_cycle()
to achieve the following functionalities:
- Fetch - Directly index memory
M
based onPC
to fetch an instruction. - Decode - Use C language bitwise operations to extract the
opcode
field of the instruction and check which instruction it belongs to; then extract the operand fields based on the instruction format and obtain the corresponding operands. - Execute - If the executed instruction is not
bner0
, write the result back to the destination register; otherwise, decide whether to jump based on the condition. - Update PC - If not jumping, increment
PC
by 1.
After implementing inst_cycle()
, we just need to keep calling it in sEMU:
while (1) { inst_cycle(); }
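For reference, here is a minimal, hypothetical sketch of inst_cycle(), relying on the PC/R/M declarations above. The opcode values, field positions, and branch-target encoding below are placeholders only; the real ones must be taken from the sISA specification in the previous lectures:
void inst_cycle() {
  uint8_t inst = M[PC];                     // Fetch: index memory directly with PC
  uint8_t opcode = inst >> 6;               // Decode: hypothetical 2-bit opcode in the high bits
  uint8_t rd     = (inst >> 4) & 0x3;       // hypothetical destination-register field
  uint8_t rs     = (inst >> 2) & 0x3;       // hypothetical source-register field
  uint8_t target = inst & 0xf;              // hypothetical 4-bit branch target
  int jump = 0;
  switch (opcode) {                         // Execute: C statements implement instruction semantics
    case 0x0: R[rd] = R[rd] + R[rs]; break; // e.g. an addition-like instruction (placeholder)
    case 0x3: if (R[rs] != 0) { PC = target; jump = 1; } break; // e.g. a bner0-like branch (placeholder)
    default: break;
  }
  if (!jump) PC = (PC + 1) & 0xf;           // Update PC: increment by 1 when not jumping
}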
The above has considered all the details of sISA, but to run a program in sEMU, we need to implement not only the ISA-level functionalities but also consider how to provide the corresponding runtime environment for the program. However, since the functionality of sISA is limited, it can only support the execution of a few programs. Therefore, we will discuss what support the runtime environment should provide, assuming it only runs the summation program. We will consider it from three aspects:
- Before program execution begins: One question to consider is how to load the program. However, since sEMU only intends to run the summation program, we can directly initialize M:

  uint8_t M[16] = { ... };

  Since the summation program does not currently involve parameters during its execution, we can temporarily ignore the transmission of program parameters in the runtime environment.
- During program execution: Since the summation program does not currently involve calling library functions, we can temporarily ignore the support for library functions in the runtime environment.
- After program execution ends: Since the summation program indicates program completion through an infinite loop, which has already been implemented through the
bner0
instruction at the instruction level, there is no need to provide additional functionality for program exit.
Implement sEMU
Based on the above ideas, implement sEMU in C code and run the previous summation program. Since the summation program itself will not end, you can modify the loop condition in the while
statement to exit after a certain number of iterations, and then check if the summation result meets expectations.
Reinforcing the Runtime Environment
It can be seen that this runtime environment is much simpler than the one discussed earlier for ./a.out
. In fact, the C language standard defines two types of runtime environments:
one is called a “freestanding environment”, which typically has no operating system support, and programs usually run directly on bare-metal; the runtime environment discussed earlier for sEMU belongs to the freestanding environment.
The other is called a “hosted environment”, which refers to the situation where an operating system is running, and programs typically run on an operating system, supported by relevant components of the operating system; the runtime environment of ./a.out
falls under the hosted environment.
You can understand that if the ISA of the program and the ISA of the processor are inconsistent, the program cannot run. However, even if the ISAs are consistent, the program may still fail to execute successfully; runtime environment considerations are also necessary. Suppose you implemented an x86 instruction set simulator similar to sEMU, but you still cannot run ./a.out
because ./a.out
assumes it runs in the Linux hosted environment and needs the runtime environment to provide a series of functional supports. However, the runtime environment provided by your instruction set simulator belongs to the freestanding environment, which does not meet the needs of ./a.out
. Another example: if you copy a Windows program to Linux, it will also fail to run successfully; although both Windows and Linux provide hosted environments, the specific support they provide differs.
From another perspective, in order to make the program’s functionalities richer, we can not only add more instructions but also provide a more powerful runtime environment for the program. Below we attempt to add more functionalities to sEMU.
First, we will add output functionality to sEMU, allowing programs running on it to output an integer. Specifically, you need to add a new instruction out rs
to sISA, which will output R[rs]
to the terminal upon execution. You can decide the encoding of this instruction and then implement it in sEMU.
One issue to consider is how to implement the “output to terminal” functionality, as the concept of “terminal” does not belong to the scope of sISA; precisely, it belongs to the scope of the Linux hosted environment. Note that while sISA programs run in the runtime environment provided by sEMU, sEMU itself also runs in the runtime environment provided by Linux. Therefore, we can utilize the functionalities of the Linux runtime environment to achieve the effect of “output to terminal”. Naturally, this functionality is provided by printf()
, which means sEMU provides runtime environment support for programs running on it by calling printf()
, thus implementing the agreed functionality of the out
instruction. From the perspective of the runtime environment, this functionality belongs to the support during the program execution process.
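As a sketch, the execute step for the new instruction could delegate to a small helper like the one below (the helper name exec_out is just for illustration); it relies on the R array defined earlier and on printf() from the Linux runtime environment in which sEMU itself runs:
#include <stdio.h>
#include <stdint.h>

// Implements the agreed semantics of "out rs": print R[rs] to the terminal.
void exec_out(uint8_t rs) {
  printf("%d\n", R[rs]);
}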
Implement output functionality
Based on the above, implement the out
instruction in sEMU and modify the summation program so that after calculating the result, it outputs the calculation result using the out
instruction. If your implementation is correct, you should see the terminal display 55
.
Next, we will consider implementing parameter passing for the program. The previous summation program could only calculate 1+2+...+10
. We want to take the last term of the sequence as a parameter, i.e., we want the user to input the parameter n
from the command line, allowing the summation program to calculate 1+2+...+n
. To pass n
to the program, we need to let the runtime environment place n
in a designated location before the program begins execution, so that the program can read n
from that location to perform the calculation. A simple way to do this is to place this parameter in a register, for example, r0
.
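A minimal sketch of how sEMU's main() could place the command-line parameter into r0 before the simulation starts; the atoi()-based parsing below is just for illustration and does no input validation:
#include <stdlib.h>

int main(int argc, char *argv[]) {
  if (argc > 1) {
    R[0] = (uint8_t)atoi(argv[1]);   // the runtime environment puts n into r0 before execution begins
  }
  while (1) { inst_cycle(); }        // then run the sISA program as before
  return 0;
}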
Implement parameterized summation
Modify the summation program to obtain the last term of the sequence from r0
. Then modify sEMU to place the user input parameter in r0
. After implementation, conduct simple tests of your implementation, for example, typing ./semu 10
, the program should output 55
; typing ./semu 15
, the program should output 120
. We assume that the user’s input will not cause the calculation process to overflow, so you do not need to consider how to handle result overflow.
After the above enhancements, we can summarize the runtime environment that sEMU provides to the program.
We consider it from three aspects:
- Before the program execution begins, the program’s instruction sequence will be stored in memory. Additionally, the program parameters will be stored in the
r0
register. - During program execution, the value of a specified register can be output as an integer via the
out
instruction. - After program execution ends, the program will enter an infinite loop.
Implementation of minirvEMU
You already know how to implement sEMU; similarly, you can implement an instruction set simulator for the previous minirv, which we will call minirvEMU. For details about minirv, please refer to previous lectures.
Implement minirvEMU for two instructions
Design a minirvEMU that supports addi
and jalr
instructions and allows it to run the previously executed two-instruction test program in Logisim.
Adding the remaining 6 instructions to minirvEMU is not difficult, but running the sum
and mem
programs requires additional considerations. Both of these programs have many instructions, making it inconvenient to manually initialize M
.
To automatically initialize M
, we can have minirvEMU read the program to be executed into M
from a file. For example, for the sum
program, we can use the following command to convert the previous .hex
file into a binary .bin
file:
tail -n +2 sum.hex | sed -e 's/.*: //' -e 's/ /\n/g' | sed -e 's/\(..\)\(..\)\(..\)\(..\)/\4 \3 \2 \1/' | xxd -r -p > sum.bin
After obtaining the .bin
file, you can use file operations in C to read it into M
and execute it.
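For example, reading the .bin file into M can be done with standard C file operations; the function name load_image and its brief error handling below are just for illustration:
#include <stdio.h>
#include <stdint.h>

// Read at most `size` bytes of the program image into the memory array M.
size_t load_image(const char *path, uint8_t *M, size_t size) {
  FILE *fp = fopen(path, "rb");
  if (fp == NULL) { perror(path); return 0; }
  size_t n = fread(M, 1, size, fp);   // the .bin file already stores bytes in memory order
  fclose(fp);
  return n;                           // number of bytes actually loaded
}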
Implement complete minirvEMU
Add the remaining 6 instructions to minirvEMU and update the program loading method, then run the previously executed sum
and mem
programs in Logisim. To determine if the program has successfully ended, you can refer to the previous judgment method used in Logisim.
Previously, we manually judged whether the programs running in Logisim ended successfully. One reason is that Logisim, as a GUI program, is not very convenient for adding personalized functionalities. In contrast, as a C program, we can attempt to have minirvEMU automatically determine whether the program has successfully ended.
Specifically, we can add an ebreak
instruction and define its functionality as program termination. To implement automatic judgment of the program termination status, we let minirvEMU stop simulation after executing ebreak
.
To ensure the correctness of the program termination status, we expand the functionalities of the runtime environment, agreeing to write an integer representing the termination status into the a0
register before the program ends, and then execute the ebreak
instruction. If this integer is 0
, it indicates that the program ended correctly; if it is not 0
, it indicates that an error occurred.
Based on this agreement, we need to modify the instruction sequence at program termination to comply with this agreement; on the other hand, after minirvEMU executes ebreak
, we can check the value of the a0
register to see if the program ended successfully. For example, if a0
is 0
, output the message HIT GOOD TRAP
, indicating the program ended successfully; if a0
is not 0
, output the message HIT BAD TRAP
, indicating an error occurred.
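A minimal sketch of this check, assuming minirvEMU keeps its GPRs in an array R[32] (a0 is x10 in RISC-V) and uses a hypothetical flag running to control the simulation loop:
#include <stdio.h>
#include <stdint.h>

uint32_t R[32];          // minirvEMU's general-purpose registers (assumed layout)
int running = 1;         // hypothetical flag polled by the simulation loop

// Called after executing ebreak: report the termination status stored in a0.
void handle_ebreak(void) {
  if (R[10] == 0) printf("HIT GOOD TRAP\n");   // a0 == 0: the program ended successfully
  else            printf("HIT BAD TRAP\n");    // a0 != 0: the program reported an error
  running = 0;                                 // stop simulating after ebreak
}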
To modify the program to comply with the above runtime environment agreement, we can have minirvEMU manually write the encoding of the ebreak
instruction into the correct position after reading the program into M
: regarding the specific encoding of the ebreak
instruction, please refer to the relevant documentation; and for the “correct position”, it refers to a location near the halt
function in the program.
For example, you can write the ebreak
instruction as follows:
M[? + 0] = ?;
M[? + 1] = ?;
// ......
You need to find the correct memory position and the encoding of the ebreak
instruction, then replace the above ?
.
Implement automatic judgment of program termination
Based on the above runtime environment agreement, add and implement the ebreak
instruction in minirvEMU, then modify the instruction sequence of the program to execute the ebreak
instruction upon termination. If your implementation is correct, you will see the program automatically terminate and output termination information via minirvEMU.
Of course, this process still involves quite a few manual operations, but currently, the number of programs we need to run is not many, and the overhead of these manual operations is still acceptable. As the number of programs that need to be run increases, we will still need to find a way to achieve fully automated judgment methods. We will continue to discuss this issue in the C phase.
Programs Supporting GUI Input and Output
You have previously run a program that displays images using the minirv processor in Logisim. Now, can you run this program in minirvEMU as well? However, the C programs you developed before can only perform input and output through the terminal. So, how are the GUI programs we commonly use developed? To establish a simple understanding, let’s first let you experience a classic game: Super Mario.
Run Super Mario
Read the introductory part of the lecture in PA1 titled “Before starting the joyful PA journey -> What is NEMU?” and try to run Super Mario according to the instructions given in the lecture. Complete checks for the screen, keys, and sound.
You will surely be curious about how Super Mario runs specifically. In fact, Super Mario itself is a sequence of instructions, but these instructions belong to an instruction set called 6502. This instruction set emerged in the 1970s and is now rarely used, along with its corresponding processors. However, you did not run Super Mario on a real 6502 processor; so how does Super Mario actually run?
What happened is that a simulator program for the 6502 instruction set, called fceux
, was run first. This program behaves in a unique way; it can simulate the execution of 6502 instructions through software, thereby simulating the execution process of the entire Super Mario game!
Experience GUI Input and Output
But first, let’s consider the content related to GUI programs, and we’ll return later to discuss simulators. We introduce a runtime environment called abstract machine
(AM for short), which provides some APIs (Application Programming Interface) in the form of library functions to implement some commonly used input and output functions in GUIs.
The fceux
program achieves the operations and displays of Super Mario by calling these APIs. However, the behavior of fceux
is still too complicated for everyone, so let’s start with some simple example programs to understand the input and output functions used in fceux
.
Experience clock functionality
Run the clock test program using the following command:
cd am-kernels/tests/am-tests
make ARCH=native mainargs=t run
You will find that the program outputs a line of text every second on the terminal. Additionally, the program will pop up a new window that is completely black, but currently, it has no functionality; you don’t need to worry about it for now.
After experiencing the above functionality, try reading am-kernels/tests/am-tests/src/tests/rtc.c
to understand how this functionality is implemented. Among them, the code io_read(AM_TIMER_UPTIME).us
will retrieve the time elapsed since the program started, in microseconds(us).
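For reference, a minimal sketch of the "do something once per second" pattern built on this call; it assumes ioe_init() has already been performed, as in the test program:
#include <am.h>
#include <stdint.h>

void tick_every_second() {
  uint64_t last = io_read(AM_TIMER_UPTIME).us;   // microseconds since the program started
  while (1) {
    uint64_t now = io_read(AM_TIMER_UPTIME).us;
    if (now - last >= 1000000) {                 // one second has elapsed
      last = now;
      // do the once-per-second work here, e.g. print a line of text
    }
  }
}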
Experience key functionality
Run the key test program using the following command:
cd am-kernels/tests/am-tests
make ARCH=native mainargs=k run
You will find that the program pops up a new window that is completely black. When you press a key in the new window, you will see the corresponding key information output in the terminal, including the key name, keyboard code, and key status. After experiencing the above functionality, try reading am-kernels/tests/am-tests/src/tests/keyboard.c
to understand how this functionality is implemented (you can currently ignore the uart
related functionality in the code). Among them, the code io_read(AM_INPUT_KEYBRD)
will obtain a key event ev
, where ev.keycode
indicates the key code and ev.keydown
indicates whether the key is pressed or released. You can check the key code values in abstract-machine/am/include/amdev.h
, which all start with AM_KEY_
, such as the code for the A
key being AM_KEY_A
. Specifically, AM_KEY_NONE
indicates no key event.
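As a sketch, a key-polling loop built on this call might look like the following; AM_INPUT_KEYBRD_T is assumed to be the struct type that io_read() returns for this register (check abstract-machine/am/include/amdev.h for the exact definition):
#include <am.h>

void poll_keyboard() {
  while (1) {
    AM_INPUT_KEYBRD_T ev = io_read(AM_INPUT_KEYBRD);
    if (ev.keycode == AM_KEY_NONE) continue;              // no key event at the moment
    if (ev.keycode == AM_KEY_ESCAPE && ev.keydown) break; // leave the loop when ESC is pressed
    // otherwise handle ev.keycode / ev.keydown here
  }
}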
Experience display functionality
Run the display test program using the following command:
cd am-kernels/tests/am-tests
make ARCH=native mainargs=v run
You will find that the program pops up a new window and plays an animation.
After experiencing the above functionality, try reading am-kernels/tests/am-tests/src/tests/video.c
to understand how this functionality is implemented. Among them, the code io_write(AM_GPU_FBDRAW, x, y, pixels, w, h, true)
indicates drawing a rectangular image of w*h
at the screen coordinates (x, y)
. The image pixels are stored in pixels
in row-major order, with each pixel described by a 32-bit integer in the 00RRGGBB
format.
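For reference, a small sketch that fills a w*h block at (x, y) with a single color using this call; the 400*300 buffer size is only an assumption matching the screensaver resolution used below:
#include <am.h>
#include <stdint.h>

void fill_rect(int x, int y, int w, int h, uint32_t color) {
  static uint32_t pixels[400 * 300];                  // row-major pixel buffer, 00RRGGBB format
  for (int i = 0; i < w * h; i++) pixels[i] = color;
  io_write(AM_GPU_FBDRAW, x, y, pixels, w, h, true);  // draw the block and synchronize the screen
}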
Design a Simple Screensaver
After understanding how to use GUI input and output in code, we will next implement a simple screensaver to learn how to develop a simple GUI program on AM.
First, let’s try to make the popped-up window display a certain color. We provide the following framework code:
#include <am.h>
#include <klib-macros.h>
void draw(uint32_t color) {
  // add code here
}

int main() {
  ioe_init(); // initialization for GUI
  while (1) {
    draw(0x000000ff);
  }
  return 0;
}
To use the features provided by AM, we need to write a Makefile
that conforms to the AM specification, for example:
NAME = screensaver
SRCS = screensaver.c
include $(AM_HOME)/Makefile
If the filename of your saved C code is not screensaver.c
, you need to modify the above Makefile
according to the actual filename. Then compile and run using the following command:
make ARCH=native run
You should see a completely black new window popped up by the program, because you have not correctly implemented the void draw(uint32_t color)
function yet.
We will introduce detailed information about the Makefile
and AM in Phase D; for now, you only need to know how to use AM’s features to develop, compile, and run the program.
Implement display of a single color
Implement the draw()
function described above, which will fill the popped-up window with the color indicated by the parameter color
, with a resolution of 400x300
.
After implementation, recompile and run. If your implementation is correct, you should see the popped-up window filled with blue.
The desired effect of the screensaver we need to implement is to create a gradient between multiple colors on the screen. The gradient algorithm needs to linearly interpolate colors between the current color and the target color, and display them sequentially to achieve the gradient effect. Assuming the current color is C0, and it changes to the target color C1 after n steps, where each color component value is in [0, 255], then the color to be displayed at step i (0 <= i <= n) should be:

Ci = C0 + (C1 - C0) * i / n, computed independently for each color component.
After one round of gradient ends, choose the next target color for the next round of gradient. The target colors can be randomly selected from the following colors:
0x000000, 0xff0000, 0x00ff00, 0x0000ff, 0xffff00, 0xff00ff, 0x00ffff, 0xffffff
We only need to control the interval of displaying the screen color using clock-related code to achieve the gradient effect.
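A minimal sketch of one interpolation step for a color in the 00RRGGBB format, using only integer arithmetic (the function names are illustrative):
#include <stdint.h>

static int lerp_component(int a, int b, int i, int n) {
  return a + (b - a) * i / n;                  // linear interpolation of one color component
}

uint32_t lerp_color(uint32_t c0, uint32_t c1, int i, int n) {
  int r = lerp_component((c0 >> 16) & 0xff, (c1 >> 16) & 0xff, i, n);
  int g = lerp_component((c0 >>  8) & 0xff, (c1 >>  8) & 0xff, i, n);
  int b = lerp_component( c0        & 0xff,  c1        & 0xff, i, n);
  return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;   // re-pack as 00RRGGBB
}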
Implement color gradient effect
Implement the color gradient effect as described above. You can decide how long each gradient round lasts and how many steps there are in one round of gradient.
Finally, let’s make use of the key functionality. Specifically, we hope to achieve the following key functionalities:
- If the
ESC
key (key codeAM_KEY_ESCAPE
) is pressed, the program will exit. - If any other key is pressed, the duration of one gradient round is shortened (the gradient speeds up); after the key is released, the gradient returns to its normal speed.
Add key effects
Add key effects to the screensaver program as described above.
How screensavers protect screens
In the past, people used CRT displays, which work by an electron gun firing high-speed electrons at the phosphor layer of the screen to emit light and display images. However, if the screen displays the same image for a long time, the electron beam will bombard the same spot on the phosphor layer for too long, causing accelerated aging of that spot, or even penetrating the picture tube, damaging the screen. Screensavers prevent damage to the screen by constantly changing the displayed image on the screen, thereby changing the position and intensity of the electron beam’s bombardment.
However, nowadays, people mostly use LCD displays (liquid crystal displays), which operate differently from CRT displays, and past screensavers can no longer serve their protective purpose. The above mandatory tasks are merely a simple exercise and do not consider their protective functionality. In fact, a static screen usually indicates that the user has not operated the computer for a long time; in this case, the best practice is to turn off the display.
After implementing the screensaver, you should have a basic understanding of the GUI program development process. However, we have not further explained how io_read() and io_write() actually implement input and output: why does calling io_read(AM_INPUT_KEYBRD) obtain key information? Why does calling io_write(AM_GPU_FBDRAW, x, y, pixels, w, h, true) draw pixels on the screen? In real computer systems, this requires coordination among multiple layers of abstraction, including runtime environments, instruction sets, processors, buses, device controllers, and I/O devices. As your studies progress, you will learn about each detail in this process in Phase B.
Develop Your Own Mini-Game
If you are interested, you can also use the above APIs to develop your own mini-game! A mini-game is essentially just a loop:
while (1) {
  wait_for_a_new_frame();
  handle_user_keys();
  update_game_logic();
  draw_a_new_screen();
}
The screensaver you just developed is fundamentally also such a loop, but its game logic is very simple, consisting only of handling a gradient algorithm. Moreover, am-kernels
also provides some examples of mini-games:
am-kernels/kernels/typing-game
- Typing gameam-kernels/kernels/snake
- Snake game
Optional: Develop your own Mini-Game
Try to conceive and design a mini-game. On one hand, this mini-game can run on the processor you design; on the other hand, we are also collecting fun and interesting mini-games for long-term use. If your designed game is excellent enough, it may be included in our maintained game collection for other students to enjoy, and even showcased on real chips.
Some useful resources:
- Displaying characters:
am-kernels/kernels/typing-game/font.c
provides bitmap for characters; please refer to the relevant code to understand how to use it.- Displaying images: You can use the command
convert abc.jpg abc.txt
to convert an image into a text file of enumerated pixels, where each pixel’s color code is listed in order. You can further process this text file with scripts to convert it into a C array.If you plan to submit your designed game to us, please contact the teaching assistant. However, to ensure your game can run in other students’ environments, please additionally ensure the following content:
- Portability:
- Do not use floating-point numbers in the code.
- When needing to perform calculations related to decimals, scale them to integers first, for example, to handle three decimal places, scale it by 1000 before processing.
- When calculating elementary functions like
sin
, approximate them using power series expansion.- When needing a 64-bit integer type, use
uint64_t
orint64_t
.- When using integer types to store pointers, use
uintptr_t
orintptr_t
.- Playability: After entering the game, if the player does not operate for some time, the game will fail.
- Exit Capability: Pressing the
ESC
key callshalt(0)
to exit the program.
Add Graphic Display Functionality to minirvEMU
Since you have designed a screensaver, adding graphic display functionality to minirvEMU should not be difficult. Specifically, you need to define an additional array to store the pixel information the program will write and determine in the implementation of the sw
instruction whether the address falls within [0x20000000, 0x[number_3})
. If so, record this write operation in the array above. After the program finishes running, use the APIs provided by AM to display these pixels on the screen.
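A hypothetical sketch of this idea; VMEM_BASE, the buffer size, and the helper name try_vmem_store are placeholders, and the real upper bound of the address range and the resolution follow the vga program:
#include <stdint.h>

#define VMEM_BASE 0x20000000u
static uint32_t vmem[400 * 300];                 // assumed resolution; adjust to match the vga program

// Called from the sw implementation: returns 1 if the store hit the display range and was recorded.
int try_vmem_store(uint32_t addr, uint32_t data) {
  if (addr >= VMEM_BASE && addr < VMEM_BASE + sizeof(vmem)) {
    vmem[(addr - VMEM_BASE) / 4] = data;         // record the pixel write
    return 1;                                    // handled: skip the normal write to M
  }
  return 0;                                      // not a display address: write M as usual
}
After the program finishes, vmem can then be handed to io_write(AM_GPU_FBDRAW, ...) in the same way as in the screensaver above.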
Add graphic display functionality
Following the above ideas, add graphic display functionality to minirvEMU, and then run the previously executed vga
program in Logisim.
Some hints:
- To use the features provided by AM, you can refer to the relevant code of the screensaver described above.
- To prevent minirvEMU from exiting immediately after displaying the image, let it enter a loop.
Although the current minirvEMU only runs a program that displays an image, it is not hard to imagine that if it supports more instructions and provides more complete input and output functionality such as clock and keys, you will have a more powerful simulator, similar to fceux
, that could run Super Mario or even games you develop yourself! By Phase D, you will achieve this goal!
Moreover, as a runtime environment, AM allows us to compile programs that can end via the ebreak
instruction without having to manually write the ebreak
instruction at the correct position in minirvEMU. We will elaborate on the details of AM in Phase D.
Finally, you should feel that creating a simulator in C to execute minirv instructions is much simpler than developing a processor in Logisim: we only need to consider how to implement the behavior of instructions in C, without having to consider circuit implementations or wiring details.
In fact, implementing a real processor requires consideration of more factors, including timing, area, and power consumption; you will gain a deeper understanding of this in Phase B. However, from a learning perspective, it is precisely because implementing a simulator does not require consideration of these factors that it becomes a very good way to understand instruction sets and program behavior.
In the processor design process in the industry, simulators also play an important role in functional validation, program analysis, and debugging, and some more detailed architecture simulators are important components in exploring processor design space. As you delve deeper into your studies, you will gradually appreciate the significance of simulators.