User Programs and System Calls

After understanding the execution process of the trap instruction through yield test, we can now comprehend how a simple operating system works.

The Simplest Operating System

Seeing is believing, let's first demonstrate how simple an operating system can actually be. The operating system used in PA is called Nanos-lite, which is a trimmed version of Nanos operating system from Nanjing University. It is an operating system tailor-made for PA. By writing the code for Nanos-lite, you will realize how an operating system uses the mechanisms provided by hardware (i.e., ISA and AM), to support the running of programs, which aligns with the ultimate goal of PA. As for the full version of Nanos, you will get to see its true face in the operating system course next semester.

We have prepared the framework code for Nanos-lite, which can be obtained by executing the following commands:

cd ics2023
bash init.sh nanos-lite

Nanos-lite includes all the modules used in subsequent PAs, but most of the modules' specific functionalities are not implemented. Since the functionality of the hardware (NEMU) is gradually added, Nanos-lite must also match this process, you will control the functionalities of Nanos-lite through some macros related to the experimental progress in nanos-lite/include/common.h. As the experimental progress advances, we will gradually explain all the modules, and the work done by Nanos-lite will increase. Therefore, when reading the code of Nanos-lite, you only need to focus on the modules related to the current progress, and not get entangled in code unrelated to the current progress.

nanos-lite
├── include
│   ├── common.h
│   ├── debug.h
│   ├── fs.h
│   ├── memory.h
│   └── proc.h
├── Makefile
├── README.md
├── resources
│   └── logo.txt    # Project-N logo text
└── src
    ├── device.c    # Device abstraction
    ├── fs.c        # File system
    ├── irq.c       # Interrupt and exception handling
    ├── loader.c    # Loader
    ├── main.c
    ├── mm.c        # Memory management
    ├── proc.c      # Process scheduling
    ├── ramdisk.c   # ramdisk driver
    ├── resources.S # ramdisk content and Project-N logo
    └── syscall.c   # System call handling

It should be noted that Nanos-lite runs on top of AM, and AM's API is available in Nanos-lite. Although the operating system is a special concept for us, in the eyes of AM, it is just a normal C program that calls the AM API, no different from Super Mario. At the same time, you will once again appreciate the benefits of AM: the implementation of Nanos-lite can be architecture-independent, which means that no matter which ISA you previously chose, you can easily run Nanos-lite, even you can debug your written Nanos-lite on native just like developing klib.

Additionally, although it will not cause significant misunderstandings, after introducing Nanos-lite, we will still use the concept of "user process" instead of "user program" in some places. If you cannot understand what a process is right now, you just need to think of a process as a "program that is running." Still can't feel the difference between the two? Here's a simple example, if you open Notepad three times, there will be three Notepad processes running on the computer, but there is only one Notepad program on the disk. The process is an important concept in operating systems, and detailed knowledge about processes will be introduced in the operating systems course.

Initially, all macros related to the experimental progress in nanos-lite/include/common.h are undefined, at this time, the functionality of Nanos-lite is very simple. Let's briefly review the current behavior of Nanos-lite:

Print the Project-N logo, and output hello message and compilation time through Log(). It should be noted that the Log() macro defined in Nanos-lite is not the Log() macro defined in NEMU. Nanos-lite and NEMU are two independent projects, their code will not affect each other, you need to pay attention to this when reading the code. In Nanos-lite, the Log() macro outputs through the printf() you wrote in klib, and will eventually call TRM's putch().
Call init_device() to perform some initialization operations on devices. Currently, init_device() directly calls ioe_init().
Initialize ramdisk. Generally, programs should be stored in permanent storage media (such as disks). However, simulating a disk in NEMU is a somewhat complex task, so Nanos-lite uses a segment of memory as a disk for now. This kind of disk has a special name, called ramdisk.
init_fs() and init_proc(), which are used to initialize the file system and create processes, currently, they do not perform any meaningful operations and can be ignored.
Call panic() to end the running of Nanos-lite.

Since Nanos-lite is essentially an AM program, we can compile/run Nanos-lite in the same way. Execute the following in the nanos-lite/ directory:

make ARCH=$ISA-nemu run

That's it. Additionally, as mentioned earlier, you can also compile Nanos-lite to native and run it to aid in your debugging.

The operating system is a C program

You might find it hard to believe, but the .c and .h files of the framework code undoubtedly contain this ironclad fact, and even the compilation method doesn't show anything special. It's the same with GNU/Linux: if you read its source code, you'll find that GNU/Linux is just a huge C program.

So, compared to a typical C program, what is special about an operating system? After completing PA3, you will have some idea.

The operating system provided by the framework code really does nothing! Don't worry, you need to define the macro HAS_CTE in nanos-lite/include/common.h, after that, Nanos-lite will perform the following additional operations:

Call the init_irq() function during initialization, which will initialize CTE through the cte_init() function
Call yield() before panic() to trigger a trap

Implement Correct Event Dispatching for Nanos-lite

Nanos-lite's event handling callback function does not handle any events by default. You need to identify the EVENT_YIELD trap event in it, and then output a sentence, no other actions are required for now.

Run Nanos-lite again, if your implementation is correct, you will see the information output after the trap event is identified, and the panic() set at the end of the main() function will still be triggered.

After confirming that Nanos-lite can correctly trigger the trap operation, the user program can switch the execution flow to the entry point specified by the operating system. Now let's solve the second problem of implementing a batch system: how to load the user program.

Loading the First User Program

In the operating system, loading user programs is the responsibility of the loader module. We know that a program includes code and data, both of which are stored in the executable file. The loading process involves placing the code and data from the executable file into the correct memory locations, then jumping to the program's entry point, at which the program begins execution. More specifically, to implement the loader() function, we need to address the following questions:

Where is the executable file?
Where in the executable file are the code and data located?
How much code and data are there?
Where are the "correct memory locations"?

To answer the first question, we need to first explain where user programs come from. User programs run on top of the operating system, and due to differences in the runtime environment, we cannot run programs compiled for AM directly on the operating system. For this reason, we have prepared a new subproject, Navy-apps, specifically for compiling user programs for the operating system. The framework code for Navy can be obtained by executing the following command:

cd ics2023
bash init.sh navy-apps

The structure of the Navy subproject is as follows, for more explanations, you can read README.md:

navy-apps
├── apps            # User program
│   ├── am-kernels
│   ├── busybox
│   ├── fceux
│   ├── lua
│   ├── menu
│   ├── nplayer
│   ├── nslider
│   ├── nterm
│   ├── nwm
│   ├── onscripter
│   ├── oslab0
│   └── pal         # Chinese palatin
├── fsimg           # Root FS
├── libs            # Runtime lib
│   ├── libc        # Newlib libC
│   ├── libam
│   ├── libbdf
│   ├── libbmp
│   ├── libfixedptc
│   ├── libminiSDL
│   ├── libndl
│   ├── libos       # User level abstraction of system call
│   ├── libSDL_image
│   ├── libSDL_mixer
│   ├── libSDL_ttf
│   └── libvorbis
├── Makefile
├── README.md
├── scripts
└── tests           # Some tests

Navy's Makefile organization is very similar to abstract-machine, so you should find it easy to understand. In navy-apps/libs/libc, there is a project named Newlibopen in new window, which is a C library specifically provided for embedded systems, with functions that have very low runtime environment requirements. This is very friendly for Nanos-lite, as we do not need to implement additional features in Nanos-lite to accommodate the C library. The entry point of the user program is in the _start() function located at navy-apps/libs/libos/src/crt0/start.S, where crt stands for C RunTime, and 0 indicates the very beginning. The _start() function will call the call_main() function in navy-apps/libs/libos/src/crt0/crt0.c, then it calls the user program's main() function, and after returning from main(), it calls exit() to end execution.

The code of the C library is "always" correct

Some students have thought there were bugs in the C library code while debugging, and after modifying the C library code, the program indeed ran successfully. In fact, the essential tasks of PA do not require modifying the C library code. Modifying the C library code to make the program run successfully indicates that the bug is still in your code. Modifying the C library is just a way to circumvent a bug you have already discovered; it has not been resolved but has returned to a latent state, You are likely to encounter it again in the future, and the effort to resolve it may be greater, and it will be difficult to determine if they are the same bug when you encounter them.

In short, when you decide to take a shortcut, you need to calmly analyze the pros and cons before making a decision.

The first user program we want to run on Nanos-lite is navy-apps/tests/dummy/dummy.c. To avoid conflicts with the content of Nanos-lite, we currently require that the user program be linked near the memory location 0x3000000 (x86) or 0x83000000 (mips32 or riscv32), Navy has already set the appropriate options (see the LDFLAGS variable in navy-apps/scripts/$ISA.mk). To compile dummy, execute in the navy-apps/tests/dummy/ directory:

make ISA=$ISA

The first time compiling in Navy, it will fetch and compile projects like Newlib from GitHub, and many warnings may appear during the compilation process, which can currently be ignored. After successful compilation, manually copy and rename navy-apps/tests/dummy/build/dummy-$ISA to nanos-lite/build/ramdisk.img, then execute in the nanos-lite/ directory:

make ARCH=$ISA-nemu

This will generate the executable file of Nanos-lite, during which the ramdisk image file nanos-lite/build/ramdisk.img is included as part of Nanos-lite (implemented in nanos-lite/src/resources.S). The current ramdisk is very simple; it contains only one file, which is the user program dummy we are about to load, this actually answers the first question above: the executable file is located at offset 0 in the ramdisk, accessing it will get the first byte of the user program.

How to identify different executable file formats?

If you try to execute a Windows executable file in GNU/Linux, it will report a "format error". Think about how GNU/Linux knows it's a "format error".

ELF files provide two perspectives for organizing an executable file: one is the section perspective for the linking process, which provides information for linking and relocation (such as symbol tables); the other is the segment perspective for execution, which provides information for loading the executable file. Using the readelf command, we can also see the mapping relationship between sections and segments: a segment may consist of zero or more sections, but a section may not be included in any segment.

We are now concerned with how to load programs, so we focus on the segment perspective. ELF uses a program header table to manage segments, and an entry in the program header table describes all attributes of a segment, including type, virtual address, flags, alignment, as well as file offset and segment size. Based on this information, we can know which bytes of the executable file need to be loaded, and we can also see that loading an executable file does not mean loading all its contents, only those contents related to runtime need to be loaded, such as debug information and symbol tables do not need to be loaded. We can determine whether a segment needs to be loaded by checking if the Type attribute of the segment is PT_LOAD.

Redundant attributes?

When using readelf to view the information of an ELF file, you will see that a segment contains two size attributes, which are FileSiz and MemSiz. Why is that? Upon closer observation, you will find that FileSiz usually does not exceed the corresponding MemSiz. Why is that?

We explain how to load a segment based on its attributes with the following diagram:

      +-------+---------------+-----------------------+
      |       |...............|                       |
      |       |...............|                       |  ELF file
      |       |...............|                       |
      +-------+---------------+-----------------------+
      0       ^               |              
              |<------+------>|       
              |       |       |             
              |       |                            
              |       +----------------------------+       
              |                                    |       
   Type       |   Offset    VirtAddr    PhysAddr   |FileSiz  MemSiz   Flg  Align
   LOAD       +-- 0x001000  0x03000000  0x03000000 +0x1d600  0x27240  RWE  0x1000
                               |                       |       |     
                               |   +-------------------+       |     
                               |   |                           |     
                               |   |     |           |         |       
                               |   |     |           |         |      
                               |   |     +-----------+ ---     |     
                               |   |     |00000000000|  ^      |   
                               |   | --- |00000000000|  |      |    
                               |   |  ^  |...........|  |      |  
                               |   |  |  |...........|  +------+
                               |   +--+  |...........|  |      
                               |      |  |...........|  |     
                               |      v  |...........|  v    
                               +-------> +-----------+ ---  
                                         |           |     
                                         |           |    
                                            Memory

You need to find out the Offset, VirtAddr, FileSiz, and MemSiz parameters for each segment that needs to be loaded. The relative file offset Offset indicates that the content of the respective segment starts from the Offset byte of the ELF file, with a size in the file of FileSiz, it needs to be allocated to the virtual memory location starting at VirtAddr, occupying a size in memory of MemSiz. That is, the memory used by this segment is the continuous interval [VirtAddr, VirtAddr + MemSiz), then the content of the segment is read from the ELF file into this memory interval, and the physical interval corresponding to [VirtAddr + FileSiz, VirtAddr + MemSiz) is cleared to zero.

Why clear to zero?

Why is it necessary to clear the physical interval corresponding to [VirtAddr + FileSiz, VirtAddr + MemSiz) to zero?

Regarding where programs come from, you can refer to an article: COMPILER, ASSEMBLER, LINKER AND LOADER: A BRIEF STORYopen in new window. If you wish to consult more information related to ELF files, please refer to

man 5 elf

Since the ELF file is in ramdisk, the framework code provides some ramdisk-related functions (defined in nanos-lite/src/ramdisk.c), you can use them to implement the functionality of the loader:

// Read `len` bytes from `offset` in ramdisk into `buf`
size_t ramdisk_read(void *buf, size_t offset, size_t len);

// Write `len` bytes from `buf` to `offset` in ramdisk
size_t ramdisk_write(const void *buf, size_t offset, size_t len);

// Return the size of the ramdisk, in bytes
size_t get_ramdisk_size();

In fact, the work of the loader presents us with the most primitive state of the program: a bitstream! Loading a program is essentially placing this inconspicuous bitstream in the correct position, but this also reflects the epoch-making idea of "stored program": when the operating system hands over control to it, the computer interprets it as instructions and executes them one by one. The loader allows the computer's lifecycle to break through the boundaries of the program: the end of a program does not mean the computer stops working, the computer will fulfill the mission of executing programs throughout its life.

Implement loader

You need to implement the loader functionality in Nanos-lite, which loads the user program into the correct memory location and then executes it. The loader() function is defined in nanos-lite/src/loader.c. The pcb parameter is currently not used and can be ignored, and since there is currently only one file in the ramdisk, the filename parameter can also be ignored. The filename will become useful after the file system is implemented in the next phase.

After implementation, call naive_uload(NULL, NULL) in init_proc(), which will invoke your implemented loader to load the first user program and then jump to execute it. If your implementation is correct, you will see an unhandled event number 4 triggered in Nanos-lite when executing the dummy program. This indicates that the loader has successfully loaded dummy and successfully jumped to execute it. We will explain the unhandled events later.

Check the magic number of the ELF file

We know that the beginning of ELF files contains a special magic number to prevent the loader from loading a non-ELF format file, we can check the magic number in the loader:

assert(*(uint32_t *)elf->e_ident == 0xBadC0de);

You need to replace the above 0xBadC0de with the correct magic number.

Don't underestimate this seemingly silly assert(), when it catches something you did by mistake one day, you will be thankful.

Detect the ISA type of the ELF file

You might accidentally make the native Nanos-lite load and run a dummy for x86/mips32/riscv32. From the perspective of ISA specifications, this behavior is clearly UB (undefined behavior), and usually results in some hard-to-understand errors. To avoid this, you can detect the ISA type of the ELF file in the loader. We can filter out the expected ISA type based on some macros defined in AM:

#if defined(__ISA_AM_NATIVE__)
# define EXPECT_TYPE EM_X86_64
#elif defined(__ISA_X86__)
# define EXPECT_TYPE ...  // see /usr/include/elf.h to get the right type
...
#else
# error Unsupported ISA
#endif

Then compare it with a field in the ELF information. If the ISA type of the ELF file to be loaded does not match the expected one, report an error. If you don't know where the macros in AM are defined, RTFSC (Read The Fine Source Code). If you don't know which field in the ELF to compare with, RTFM (Read The Fine Manual).

Compile Nanos-lite to native

You can test whether your Nanos-lite implementation is correct on native.

Since native is a 64-bit environment, some of the ELF data structures will differ from those in 32-bit, but overall, the process of loading ELF is the same. To mask the differences in data structures, macros Elf_Ehdr and Elf_Phdr are defined in nanos-lite/src/loader.c, and you can use them in your loader implementation.

Additionally, to compile a dummy that can run on AM native's Nanos-lite, you need to compile in Navy using make ISA=am_native. The subsequent operations are similar to those run on $ISA-nemu, namely:

Compile dummy using ISA=xxx
Use the compiled dummy ELF file as the ramdisk for nanos-lite
Compile and run nanos-lite using ARCH=xxx

Navy also has an ISA called native, which is different from the ARCH mechanism named native in AM, and is not currently used.

Operating System Runtime Environment

After loading the program, let's talk about program execution. Reviewing PA2, we know that program execution requires the support of a runtime environment. Since the operating system wants to load and run programs, it naturally has the responsibility to provide the functionality of the runtime environment.

In PA2, we divided the runtime environment into two parts based on whether the specific implementation is related to the ISA. But for programs running on the operating system, they no longer need to interact directly with the hardware.

So from the operating system's perspective, how should it view these runtime environments?

Note that some functions of the runtime environment require the use of resources, such as requesting memory that requires physical memory, and updating the screen that requires a frame buffer. In PA2, our computer system is monopolized by a single program, and it can do whatever it wants, and if it's broken, it's just the problem of that one program.

But in modern computer systems, there may be multiple programs running concurrently or even using the system's resources simultaneously. If each program directly uses these resources without knowing about the usage of others, the entire system will quickly become a mess: for example, I overwrite your screen, and you overwrite my memory space...

So there needs to be a role to manage the system's resources in a unified way: programs can no longer use resources arbitrarily, and they need to apply to the resource manager when using them. Since the operating system is at a high privilege level and enjoys supreme authority, it naturally also needs to fulfill the corresponding obligations: as the resource manager, the operating system needs to manage all the resources in the system, and also needs to provide corresponding services for user programs.

These services need to be presented through a unified interface, and user programs can only request services through this interface.

This interface is the system call. This is the mission that the operating system has been endowed with since its birth: as we mentioned earlier, one of the main tasks of GM-NAA I/O is to load new programs, and its other main function is to provide a public interface for program input and output. The public interface provided by GM-NAA I/O can be considered as the primordial form of system calls.

Necessity of System Calls

For a batch processing system, are system calls necessary? If the AM API is directly exposed to programs in the batch processing system, will there be any problems?

System calls divide the entire runtime environment into two parts: the operating system kernel area and the user area. Functionalities that access system resources are implemented in the kernel area, while the user area retains some functionalities that do not require the use of system resources (such as strcpy()), as well as the system call interface for requesting system resource-related services.

In this model, user programs can only "compute" obediently in the user area, and any tasks beyond pure computing capabilities must be requested through system calls to the operating system. If a user program attempts any illegal operation, the CPU will throw an exception signal to the operating system, causing the execution of the illegal operation to "fail" and be handled by the operating system. Yes, this is the hardware protection mechanism we introduced earlier, which the operating system needs to rely on to block the malicious behavior of programs.

Although the operating system needs to serve user programs, this does not mean that the operating system needs to expose all information to user programs. Some information is unnecessary and should never be known to user processes, such as data structures related to memory management. If a malicious program obtains this information, it may provide the basis for malicious attacks. Therefore, there is usually no system call to access these private data of the operating system.

System Call

So, what exactly is the process of triggering a system call?

Experiences from real life can give us some insights: when we go to a bank to conduct transactions, we need to tell the staff what service we want, what our account number is, and how much the transaction amount is, essentially letting the staff know what we specifically want to do. It's a similar situation when a user program executes a system call, it needs a way to describe its needs, and then communicate them to the operating system.

Speaking of "communicating with the operating system", you should immediately think of this being implemented through a trap instruction. In GNU/Linux, user programs trigger system calls through trap instructions, and Nanos-lite follows this convention. The yield() in CTE is also implemented through a trap instruction, although they trigger different events, from context saving to event dispatching, their processes are very similar. Since we trigger system calls through trap instructions, for user programs, the most convenient means to describe their needs to the operating system is to use general-purpose registers, because after the execution of the trap instruction, the execution flow will immediately switch to a pre-set entry point, and the general-purpose registers will be saved as part of the context. The system call handler function only needs to retrieve the necessary information from the context, to know what service request the user program has issued.

Navy has already prepared the system call interface for user programs. The _syscall_() function defined in navy-apps/libs/libos/src/syscall.c already encompasses the aforementioned process:

intptr_t _syscall_(intptr_t type, intptr_t a0, intptr_t a1, intptr_t a2) {
  // ...
  asm volatile (SYSCALL : "=r" (ret) : "r"(_gpr1), "r"(_gpr2), "r"(_gpr3), "r"(_gpr4));
  return ret;
}

The code first places the system call parameters into registers sequentially, then executes a trap instruction. Since both registers and trap instructions are ISA-related, different macros are defined here according to different ISAs to abstract them. CTE packages this trap operation into a system call event EVENT_SYSCALL and hands it over to Nanos-lite for further processing.

Identify System Calls

Currently, dummy has already triggered a system call directly through _syscall_(). You need to enable Nanos-lite to recognize the system call event EVENT_SYSCALL.

Processors usually provide only one trap instruction, at this time both EVENT_SYSCALL and EVENT_YIELD are implemented through the same trap instruction, thus CTE needs an additional way to distinguish them. If the trap instruction itself can carry parameters, different parameters can indicate different events, for example, x86 and mips32 can use this method; If the trap instruction itself cannot carry parameters, then it needs to be distinguished by other states, one way is through the value of a certain register, riscv32 uses this method.

You may need to modify multiple parts of the code, and when you are puzzled by the inability to implement the code correctly, please check every detail in this process. We have emphasized many times, understanding the details is very important.

After receiving the system call event, Nanos-lite will invoke the system call handler function do_syscall() for processing. do_syscall() first retrieves the previously set system call parameters from the context c using the macro GPR1, and dispatches based on the first parameter - the system call number. However, currently, Nanos-lite has not implemented any system calls, thus triggering a panic.

Adding a system call is simpler than you might think, as all the information is already prepared. We just need to add the corresponding system call number during the dispatch process, write the corresponding system call handler function sys_xxx(), and then call it. Looking back at the dummy program, it triggered a SYS_yield system call. We agree that this system call should directly call CTE's yield(), and then return 0.

The last thing in handling a system call is setting the return value of the system call. For different ISAs, the return value of the system call is stored in different registers, and the macro GPRx is used to abstract this, so we can set the return value of the system call through GPRx.

After passing through CTE, the execution flow returns from do_syscall() all the way back to the user program's _syscall_() function. The code finally retrieves the return value of the system call from the corresponding register and returns it to the caller of _syscall_(), informing them of the status of the system call execution (such as whether it was successful, etc.).

Implement SYS_yield System Call

You need to:

Implement the correct GPR? macros in the corresponding header files under abstract-machine/am/include/arch/, to correctly retrieve the system call parameter registers from the context c.
Add the SYS_yield system call.
Set the return value of the system call.

Rerun the dummy program, if your implementation is correct, you will see that the dummy program triggers another system call with number 0. Check nanos-lite/src/syscall.h, you will find it is a SYS_exit system call. This indicates that the previous SYS_yield has successfully returned, and SYS_exit is triggered because dummy has finished executing and is ready to exit.

Implement SYS_exit System Call

You need to implement the SYS_exit system call, which will receive an exit status parameter. For ease of testing, we are currently using this parameter to directly call halt(). After successful implementation, rerun the dummy program, and you will see the message HIT GOOD TRAP.

Passing System Call Numbers in RISC-V

If you choose RISC-V, you will find that it does not pass the system call number through a0. In fact, we referred to the convention of system call parameter passing in RISC-V Linux: that is, the system call number is also passed through this register in RISC-V Linux. Why do you think RISC-V Linux does not use a0 to pass the system call number?

Linux System Calls

You can consult the following commands for information on Linux system calls:

man syscall - Consult the system call conventions for different architectures, including parameter passing and return values
man syscalls - Consult the system calls that have been implemented in Linux. Oh, there are so many, but we only need a few system calls in PA.

System Call Trace - strace

We already know that user programs running on an operating system can only do two things: one is to perform local computations, and the other is to request the operating system to complete tasks that local computations cannot achieve through system calls. This means that if we can observe the trace of system calls (syscall trace), we can gain a deeper understanding of the program's behavior. There is a tool called strace on Linux (which can be installed via apt-get), that can record the trace of system calls made by user programs. We strongly recommend you install and try using it. For example, you can use strace ls to understand the behavior of ls, and you can even see how ls is loaded; if you are interested in strace itself, you can study how strace is implemented by using strace strace ls.

In fact, we can also implement a simple strace in Nanos-lite: Nanos-lite can obtain all information about system calls, including names, parameters, and return values. This is why we choose to implement strace in Nanos-lite: system calls carry high-level program semantics, but in NEMU, we can only see the low-level state machine.

Implement strace

Implementing strace in Nanos-lite is a simple task.

TRM Above the Operating System

We have already implemented two simple system calls, so what else can user programs do on the current Nanos-lite? You might recall how we classified the needs of programs in PA2, which is AM! At the most basic level, TRM shows us what conditions are needed to satisfy the basic computational capabilities of a program:

The machine provides basic arithmetic instructions
Can output characters
Has a heap area for dynamic memory allocation
Can terminate execution

Basic arithmetic instructions still rely on the machine to provide, which is the instruction system you have already implemented in PA2. As for terminating execution, the SYS_exit system call has also been provided. To provide user programs with the ability to output characters and dynamically allocate memory, we need to implement more system calls.

Standard Output

In GNU/Linux, output is implemented through the SYS_write system call. According to the function declaration of write (refer to man 2 write), after identifying the system call number as SYS_write in do_syscall(), you need to check the value of fd. If fd is 1 or 2 (representing stdout and stderr, respectively), then output the len bytes starting from buf to the serial port (using putch()). Finally, you must set the correct return value, otherwise, the caller of the system call will think that write did not execute successfully and will retry. As for the return value of the write system call, please consult man 2 write. Also, do not forget to call the system call interface function in _write() in navy-apps/libs/libos/src/syscall.c.

After implementing the SYS_write system call, we have removed the biggest obstacle to "using printf()", because after printf() formats the string, it will ultimately output through the write() system call. This work has already been prepared for us by the Newlib library in Navy.

Run Hello World on Nanos-lite

Navy provides a hello test program (navy-apps/tests/hello), which first outputs a sentence through write(), and then continuously outputs through printf().

You need to implement the write() system call, and then switch the user program running on Nanos-lite to the hello program to run.

Heap Management

You should have already used the malloc()/free() library functions in programming courses, which are used to allocate/release a block of memory in the user program's heap area. The usage of the heap area is managed by libc, but the size of the heap area needs to be changed through a system call to the operating system. This is because the heap area is essentially a memory area, and when it is necessary to adjust the size of the heap area, it is actually adjusting the memory area available to the user program. In fact, the memory area available to a user program needs to be allocated and managed by the operating system. Imagine if a malicious program could use other programs' memory areas without the consent of the operating system, it would cause disastrous consequences. Of course, currently, Nanos-lite is just a single-task operating system, and the concept of multiple programs does not exist. In PA4, you will have a deeper understanding of this issue.

Adjusting the size of the heap is implemented through the sbrk() library function, whose prototype is

void* sbrk(intptr_t increment);

It is used to increase the program break of the user program by increment bytes, where increment can be negative. The so-called program break is the end position of the data segment of the user program. We know that an executable file contains code segments and data segments. During linking, ld will by default add a symbol named _end to indicate the end position of the program's data segment. When the user program starts running, the program break will be at the position indicated by _end, meaning that the size of the heap is 0 at this time. When malloc() is called for the first time, it will query the current program break position of the user program through sbrk(0), and then it can dynamically adjust the position of the program break of the user program through subsequent sbrk() calls. The interval between the current program break and its initial value can serve as the heap area of the user program, managed by malloc()/free(). Note that user programs should not directly use sbrk(), as it will disrupt the management records of malloc()/free() for the heap.

In Navy's Newlib, sbrk() ultimately calls _sbrk(), which is defined in navy-apps/libs/libos/src/syscall.c. The framework code makes _sbrk() always return -1, indicating that the heap adjustment failed. In fact, when the user program calls printf() for the first time, it will try to apply for a buffer through malloc() to store the formatted content. If the application fails, it will output character by character. If you open strace in Nanos-lite, you will find that when the user program outputs through printf(), it indeed calls write() to output character by character.

But if the heap is always unavailable, many library functions in Newlib will not be usable, so now you need to implement _sbrk(). To implement the functionality of _sbrk(), we also need to provide a system call for setting the size of the heap. In GNU/Linux, this system call is SYS_brk, which takes one parameter addr to indicate the new position of the program break. _sbrk() manages the position of the program break of the user program through recording, and its working method is as follows:

The initial position of the program break is at _end
When called, based on the recorded position of the program break and the parameter increment, calculate the new program break
Use the SYS_brk system call to have the operating system set the new program break
If the SYS_brk system call is successful, it will return 0, at which time the previously recorded position of the program break is updated, and the old program break position is returned as the return value of _sbrk()
If the system call fails, _sbrk() will return -1

The above code is implemented in the user-level library functions, and we also need to implement the SYS_brk functionality in Nanos-lite. Since Nanos-lite is currently a single-task operating system, the free memory can be freely used by user programs, so we only need to make the SYS_brk system call always return 0, indicating that the adjustment of the heap size is always successful. In PA4, we will modify this system call to implement real memory allocation.

Implement heap management

Based on the above content, implement the SYS_brk system call in Nanos-lite, and then implement _sbrk() at the user level. You can consult the behavior of brk() and sbrk() in libc through man 2 sbrk, and also how to use the _end symbol through man 3 end.

It should be noted that during debugging, do not output through printf() in _sbrk(), as printf() will still try to apply for a buffer through malloc(), which will eventually call _sbrk() again, causing a dead recursion. You can output the debugging information to a string buffer first through sprintf(), and then output it through _write().

If your implementation is correct, you can see with the help of strace that printf() is no longer outputting character by character through write(), but is outputting the formatted string in one go.

Buffering and System Call Overhead

You have already learned about the process of system calls. In fact, if you go through the trouble of making a system call just to output a single character, it's not very cost-effective. Hence, the technique of batching: accumulating simple tasks and then processing them all at once. Buffering is the core of batching technology, and libc's fread() and fwrite() accumulate data in a buffer, then process it with a single system call. For example, using a 1024-byte buffer, you can output 1024 characters with one system call, instead of making 1024 system calls to output each character individually. Clearly, the overhead of the latter is much greater.

Interested students can write a corresponding program on GNU/Linux to roughly test the overhead of a single write() system call, and then compare it with this articleopen in new window.

printf and Newline

We mentioned in PA1 that when debugging with printf(), you need to add \n, and now we can finally explain why: fwrite() has a buffering mechanism, and characters printed by printf() may not be output immediately through the write() system call, but encountering \n can force the contents of the buffer to be output. Interested students can read navy-apps/libs/libc/src/stdio/wbuf.c, which implements the buffering functionality.

After implementing these two system calls, in principle, all programs that can run on TRM can now run on Nanos-lite. However, we have not strictly exposed the corresponding system call functionalities to user programs according to AM's API, since compared to AM, the libc interface is more widely used for programs running on operating systems, so we need not reinvent the wheel.

Mandatory Question (to be answered in the lab report) - What is the hello program, where does it come from, and where is it going?

By now, all components in PA have been introduced, and the entire computer system is beginning to be complete. You have also started running the hello program, the first passable user program on this computer system you created (dummy was just for warming up, doesn't count), and the good news is, we are not far from running the game "Legend of Sword and Fairy" (that's the next stage).

However, according to the tradition of PA, just running it is not enough, you also need to understand exactly how it runs. So, answer this mandatory question:

We know navy-apps/tests/hello/hello.c is just a C source file, which will be compiled and linked into an ELF file. So, where does the hello program start? How does it appear in memory? Why does it appear at its current memory location? Where is its first instruction? How exactly does it execute its first instruction? The hello program continuously prints strings, what does each character go through before it finally appears on the terminal?

We've asked a lot of questions in one breath, and we want to say that there are a lot of details you need to understand. We hope you can carefully organize each line of code involved, and then describe your understanding of this process in your own words, rather than mechanically answering these questions point by point.

Similarly, the mandatory question from the previous stage "Understanding the Journey Through Time" has already covered some content, you can include its answer, but you need to clearly describe the differences. Additionally, the process from printf() to write() in the C library is quite cumbersome and is not the main content of PA, so you do not need to expand on this part. And you have already implemented your own printf() in PA2, so you should understand the process of string formatting. If you are interested in the implementation of Newlib, you can also RTFSC.

In short, excluding the part of the C library from printf() to write() conversion, the rest of the code is what you should thoroughly understand. So, strive to understand every line of code!

Support for Multiple ELF Files in ftrace

If we want to understand the process from printf() to write() in the C library, ftrace will be a great tool. However, we know that Nanos-lite and the user programs it loads are two separate ELF files, which means that if we specify one of the ELF files to NEMU's ftrace, ftrace will not be able to correctly translate the addresses of the other into the correct function names. In fact, we can make NEMU's ftrace support multiple ELF files: if an address does not belong to any function in one ELF, it will try the next ELF. In this way, ftrace can trace function calls in both Nanos-lite and user programs simultaneously.

Friendly Reminder

Phase 2 of PA3 ends here.

# User Programs and System Calls

# The Simplest Operating System