Simple file system

To implement a complete batch processing system, we need to provide multiple programs to the system. Previously, we stored programs as files on ramdisk, but if the number of programs increases, we need to know the location of each program on the ramdisk. Our ramdisk has already provided read and write interfaces, making it convenient to access content at any location, which seems not difficult for Nanos-lite; On the other hand, user programs also need to process data, which may also be organized into files, so for user programs, how do they know the location of files on the ramdisk? Moreover, files are dynamically added and deleted, and user programs are not aware of this. This indicates that it is not feasible to directly provide the ramdisk's read and write interface for use by user programs. The operating system also needs to provide a higher level of abstraction for user programs on top of the storage medium's driver, which is the file.

The essence of a file is a sequence of bytes, along with some additional attributes. Here, we first discuss files in the ordinary sense. Thus, those additional attributes maintain the mapping of files to storage locations on the ramdisk. To manage these mappings and provide an interface for file operations to the upper layer, we need to implement a file system in Nanos-lite.

Do not be intimidated by the term "file system"; our needs for a file system are not that complex, thus we can define a simple file system, sfs (Simple File System):

  • The size of each file is fixed
  • Writing to a file is not allowed to exceed the original size of the file
  • The number of files is fixed, and new files cannot be created
  • There are no directories

Since the number and size of the files are fixed, we can naturally fix each file at a specific location on the ramdisk. These simplified features greatly reduce the complexity of implementing a file system. Of course, real file systems are much more complex than sfs.

We agree that files are stored one after another from the very beginning of the ramdisk:

0
+-------------+---------+----------+-----------+--
|    file0    |  file1  |  ......  |   filen   |
+-------------+---------+----------+-----------+--
 \           / \       /            \         /
  +  size0  +   +size1+              + sizen +

To record the names and sizes of various files in the ramdisk, we also need a "file record table".

Nanos-lite's Makefile has already provided a script to maintain this information, first modify nanos-lite/Makefile as follows:

--- nanos-lite/Makefile
+++ nanos-lite/Makefile
@@ -1,2 +1,2 @@
-#HAS_NAVY = 1
+HAS_NAVY = 1
 RAMDISK_FILE = build/ramdisk.img

Then running make ARCH=$ISA-nemu update will automatically compile the programs in Navy, and integrate all the contents under the navy-apps/fsimg/ directory into the ramdisk image navy-apps/build/ramdisk.img, while generating the file record table navy-apps/build/ramdisk.h for this ramdisk image, Nanos-lite's Makefile will link them to the project through a soft link.

Remember to update the image file

If you have modified the content in Navy, please remember to update the image file through the above command.

The "file record table" is actually an array, each element of which is a structure:

typedef struct {
  char *name;         // File name
  size_t size;        // File size
  size_t disk_offset;  // File offset in ramdisk
} Finfo;

In sfs, these three pieces of information are fixed and unchangeable. The file names are a bit different from our usual habits: Since sfs has no directories, we consider the directory separator / as part of the file name, for example, /bin/hello is a complete file name. This approach also implies a hierarchical structure of directories, for a small number of files, this approach is both simple and effective.

With this information, the most basic file read and write operations can already be implemented:

size_t read(const char *filename, void *buf, size_t len);
size_t write(const char *filename, const void *buf, size_t len);

But in real operating systems, this practice of directly using file names as parameters for read and write operations has its flaws. For example, when we use the less tool to browse files:

cat file | less

The cat tool wants to write the file content to the standard input of the less tool, but we cannot use a file name to identify the standard input of the less tool! In fact, there are indeed many "nameless" files in the operating system. To manage them uniformly, we wish to represent a file by a number, which is the file descriptor. A file descriptor corresponds to an open file, with the operating system maintaining the mapping from file descriptors to specific files. Thus, we naturally use the open() system call to open a file and return the corresponding file descriptor.

In Nanos-lite, since the number of files in sfs is fixed, we can simply use the index of the file record table as the file descriptor for the corresponding file and return it to the user program. After this, all file operations are identified through file descriptors:

size_t read(int fd, void *buf, size_t len);
size_t write(int fd, const void *buf, size_t len);
int close(int fd);

Additionally, we do not want every read and write operation to start from the beginning. Therefore, we need to introduce an open_offset attribute for each open file to record the current position of file operations. Each time a certain number of bytes is read or written to a file, the offset advances by that amount.

File Offset and User Programs

In real operating systems, maintaining the offset within the file record table can prevent user programs from implementing certain functionalities. However, explaining this issue requires understanding some knowledge beyond the scope of this course, so we will not elaborate here. You can think about this issue when you study operating systems courses.

Since Nanos-lite is a simplified operating system, the above problem will not occur temporarily, To simplify the implementation, we still maintain the offset in the file record table.

Offsets can be adjusted through the lseek() system call, allowing for reading and writing at any position within the file:

size_t lseek(int fd, size_t offset, int whence);

To facilitate standard input and output for user programs, the operating system has prepared three default file descriptors:

#define FD_STDIN 0
#define FD_STDOUT 1
#define FD_STDERR 2

They correspond to standard input stdin, standard output stdout, and standard error stderr. The frequently used printf ultimately calls write(FD_STDOUT, buf, len) for output; while scanf will read input by calling read(FD_STDIN, buf, len).

The file_table defined in nanos-lite/src/fs.c will include nanos-lite/src/files.h, which also contains placeholders for three special files: stdin, stdout, and stderr, These are just to ensure that the sfs and the agreed standard input and output file descriptors remain consistent, for example, according to the agreement, the file descriptor for stdout is 1, and after we add three placeholder entries, the index 1 in the file record table will not be allocated to other ordinary files.

Based on the above information, we can implement the following file operations in the file system:

int fs_open(const char *pathname, int flags, int mode);
size_t fs_read(int fd, void *buf, size_t len);
size_t fs_write(int fd, const void *buf, size_t len);
size_t fs_lseek(int fd, size_t offset, int whence);
int fs_close(int fd);

These file operations are actually implementations of the corresponding system calls in the kernel. You can consult their functions through man, for example

man 2 open

where 2 indicates consulting the manual page related to system calls. When implementing these file operations, note the following points:

  • Since every file in sfs is fixed and does not generate new files, thus "fs_open() not finding the file indicated by pathname" is an exceptional situation, you need to use assertion to terminate the program.
  • To simplify implementation, we allow all user programs to read and write all existing files, thus later, we can ignore flags and mode when implementing fs_open().
  • Use ramdisk_read() and ramdisk_write() for actual file read and write operations.
  • Since the size of the files is fixed, when implementing fs_read(), fs_write(), and fs_lseek(), be careful not to exceed the file boundaries.
  • Apart from writing to stdout and stderr (output to the serial port using putch()), operations on the three special files stdin, stdout, and stderr can be directly ignored.
  • Since sfs does not maintain the state of file opening, fs_close() can directly return 0, indicating always successful closure.

Finally, you also need to add the corresponding system calls in Nanos-lite and Navy's libos to invoke the respective file operations.

Let the loader use files

Previously, we had the loader directly call ramdisk_read() to load user programs. After the number of files in the ramdisk increases, this method becomes inappropriate, we first need to let the loader enjoy the convenience of the file system.

You need to first implement fs_open(), fs_read(), and fs_close(), so that you can use filenames in the loader to specify the program to load, such as "/bin/hello".

After implementation, changing user programs later will only require modifying the filename passed to the naive_uload() function.

Implement a complete file system

Implement fs_write() and fs_lseek(), then run the test program navy-apps/tests/file-test. To compile it, you need to add it to the TESTS variable in navy-apps/Makefile, so that it will ultimately be included in the ramdisk image. This test program is used for some simple file read, write, and positioning operations. If your implementation is correct, you will see the program output PASS!!!.

Remember to update the application list

If you want to add an application to the image, remember to add it to the application list in the above Makefile file.

Support sfs's strace

Due to the characteristics of sfs, opening the same file always returns the same file descriptor. This means that we can directly translate the file descriptors in strace into filenames, obtaining more readable trace information. Try to implement this feature, it can provide some convenience for your future use of strace.

Everything is file

IOE in AM shows us the need for programs to perform input and output. So, on Nanos-lite, how should user programs access devices? One direct method is for the operating system to provide a separate system call for each device, allowing user programs to directly use the corresponding functionalities. However, this approach has several problems:

  • First, the types of devices are varied and their functionalities are numerous, making it impractical to implement separate system calls for each to provide interfaces to user programs;
  • Moreover, due to the significant differences in device functionalities, if the provided interfaces are not unified, the interaction between programs and devices becomes difficult. Thus, we need a way to abstract the functionalities of devices to provide a unified interface to user programs.

We previously mentioned that the essence of a file is a sequence of bytes. In fact, computer systems are filled with sequences of bytes (how would a computer handle a mere unordered collection of bytes?), and we can easily list many examples:

  • Memory is byte-addressed, naturally forming a byte sequence, making our previously used ramdisk as a byte sequence even more apparent
  • A pipe (the | in shell commands) is a FIFO byte sequence, essentially a queue buffer in memory
  • A disk can also be seen as a byte sequence: We can number each byte on the disk, for example, the nth byte in the xth cylinder, yth head, zth sector, arranging all the bytes on the disk by number size, we get a byte sequence
  • A socket (network socket) is also a byte sequence, it has a buffer responsible for storing received network packets, upper-layer applications view the contents of the socket as a byte sequence and handle them through some special file operations. In PA2, we introduced DiffTest, if you RTFSC, you will find that qemu-diff communicates with QEMU through a socket, and the way to operate a socket is fgetc() and fputc()
  • Some information of the operating system can be exposed to users in the form of byte sequences, such as CPU configuration information
  • Some special functions provided by the operating system, such as random number generators, can also be seen as an infinitely long byte sequence
  • Even some non-storage type hardware can be seen as byte sequences: The sequence of key codes we type on the keyboard forms a byte sequence, the content of each pixel on the display can also be seen as a byte sequence...

Since a file is a byte sequence, naturally, all the varied byte sequences mentioned above can be considered files. Unix operates this way, hence the saying "Everything is a file." This approach's most intuitive benefit is providing a unified interface for different things: We can use the file interface to operate everything on the computer without having to distinguish them in detail, for example, the ramdisk rule in navy-apps/Makefile connects the input and output of various shell tools through pipes to generate a file record table

wc -c $(FSIMG_FILES) | grep -v 'total$$' | sed -e 's+ ./fsimg+ +' |
  awk -v sum=0 '{print "\x7b\x22" $$2 "\x22\x2c " $$1 "\x2c " sum "\x7d\x2c";sum += $$1}' >> $(RAMDISK_H)

View the contents of a disk in hexadecimal

head -c 512 /dev/sda | hd

Check if the CPU has the Spectre vulnerability

cat /proc/cpuinfo | grep 'spectre'

Even the "Little Star" sample audio we provided in PA2 was created through simple file operations by brute force concatenation

cat Do.ogg Do.ogg So.ogg So.ogg La.ogg La.ogg So.ogg > little-star.ogg

and

#include "/dev/urandom"

will include the contents of the urandom device in the source file: since the urandom device is an infinitely long byte sequence, submitting a program source file containing the above content will cause some Online Judge platforms with weak detection features to crash directly.

The abstraction of "everything is a file" allows us to easily accomplish some tasks on Windows that are difficult to complete, reflecting part of the Unix philosophy: each program uses text files for input and output, making it easy for programs to cooperate. GNU/Linux, inheriting from Unix, naturally carries on this excellent feature. To provide a unified abstraction to user programs, Nanos-lite also attempts to abstract IOE as files.

Virtual File System

To realize the concept of "everything is a file," we need to extend our previously implemented file operations: We need to support not only ordinary file read and write operations but also operations on various "special files." The way to extend this is something you are already familiar with, which is abstraction!

We extend the semantics of the previously implemented file operation APIs to support operations on any file (including "special files"):

int fs_open(const char *pathname, int flags, int mode);
size_t fs_read(int fd, void *buf, size_t len);
size_t fs_write(int fd, const void *buf, size_t len);
size_t fs_lseek(int fd, size_t offset, int whence);
int fs_close(int fd);

This set of extended APIs has a cool name, called VFS (Virtual File System)open in new window. Since there is a virtual file system, there should correspondingly be "real file systems," which actually refer to how to operate a certain type of file. For example, in Nanos-lite, ordinary files are operated through the ramdisk API; in real operating systems, there are countless types of real file systems: for instance, those familiar with Windows should know about NTFS for managing ordinary files, while EXT4 is currently popular on GNU/Linux; as for special files, there are even more types, hence there are procfs, tmpfs, devfs, sysfs, initramfs... These different real file systems each implement the specific operation methods for these files.

Therefore, VFS is actually an abstraction of different types of real file systems, using a set of APIs to describe the abstract behavior of these real file systems, masking the differences between real file systems, upper layers (such as system call handling functions) do not need to worry about the specific type of file being operated on, simply calling this set of APIs can complete the corresponding file operations. With the concept of VFS, adding a real file system becomes very easy: just wrap the access method of the real file system into the VFS API, upper layers do not need to modify any code to support a new real file system.

Here it comes again

When reading the above text, if you thought of the concept of AM, you're right, because the idea behind VFS is also abstraction. In Nanos-lite, the key to implementing VFS is the two read and write function pointers in the Finfo structure:

typedef struct {
  char *name;         // File name
  size_t size;        // File size
  size_t disk_offset;  // File offset in ramdisk
  ReadFn read;        // Read function pointer
  WriteFn write;      // Write function pointer
} Finfo;

ReadFn and WriteFn are two types of function pointers, used to point to the actual read and write functions, and return the number of bytes successfully read or written. With these two function pointers, we only need to set different read and write functions for different files in the file record table, then we can call the specific read and write functions through f->read() and f->write().

Simulating Object-Oriented Programming in C

The implementation of VFS demonstrates how to simulate some basic concepts of object-oriented programming in C: for example, using structures to define classes, where ordinary variables in structures can be seen as class members, function pointers can be seen as class methods, setting different functions to function pointers can achieve method overloading...

This shows that those seemingly ethereal concepts in OOP are not much more advanced than C, it's just that the OOP compiler does more for us, and after compiling into machine code, OOP ceases to exist. Object-Oriented Programming With ANSI-Copen in new window This book specifically introduces how to simulate various concepts and functionalities of OOP in ANSI-C. In the kernel code of GNU/Linux, there are also traces of OOP.

However, in Nanos-lite, since there are very few special files, we have agreed that when the above function pointer is NULL, it indicates that the corresponding file is a regular file, which should be read and written through the ramdisk's API, thus, we do not need to explicitly specify the ramdisk's read and write functions for most regular files.

We view files as sequences of bytes, most of which are "static", for example, files on ramdisk and disks, if we do not modify them, they will always be in the same place, such byte sequences have the concept of "location"; however, some special byte sequences are not like this, for example, the byte sequence of typing keys is "flowing", once read, it no longer exists, the bytes in such sequences only have a sequential relationship, but cannot be numbered, therefore, they do not have the concept of "location". Files of the former type support the lseek operation, and the devices that store these files are called "block devices"; while files of the latter type do not support the lseek operation, and the corresponding devices are called "character devices". Real operating systems also abstract the lseek operation, but we have simplified it in Nanos-lite and do not implement this abstraction.

The IOE on top of the operating system

With VFS, it becomes very simple to abstract IOE as files.

First, let's look at the simplest output device: the serial port. In Nanos-lite, both stdout and stderr output to the serial port. Previously, you might decide whether sys_write() writes to the serial port based on whether fd is 1 or 2. Now with VFS, we no longer need to let the system call handler worry about these special files: we just need to implement serial_write() in nanos-lite/src/device.c, and then set the corresponding write function in the file record table to achieve the above function. Since the serial port is a character device, the corresponding byte sequence does not have the concept of "location", thus, the offset parameter in serial_write() can be ignored. Additionally, Nanos-lite does not intend to support reading from stdin, so an appropriate error function can be set in the file record table.

Abstract the serial port as a file

Based on the above content, let VFS support writing to the serial port.

Regarding input devices, let's first look at the clock. The clock is quite special, most operating systems do not abstract it as a file, but directly provide some clock-related system calls for user programs to access. In Nanos-lite, we also provide a SYS_gettimeofday system call, through which user programs can read the current system time.

Implement gettimeofday

Implement the gettimeofday system call, please RTFM for the meaning of the parameters of this system call. After implementation, add a timer-test test in navy-apps/tests/, in the test, obtain the current time through gettimeofday(), and output a sentence every 0.5 seconds.

To better encapsulate the functionality of IOE, we provide a multimedia library called NDL (NJU DirectMedia Layer) in Navy. The code for this library is located in navy-apps/libs/libndl/NDL.c, but most of its functions are not implemented. There are some contents related to NWM_APP in the code, which you can ignore for now, but do not modify the related code, you will experience its functionality at the end of PA4. NDL provides users with a clock-related API:

// Returns the system time in milliseconds
uint32_t NDL_GetTicks();

Implement NDL's clock

You need to use gettimeofday() to implement NDL_GetTicks(), and then modify the timer-test test, letting it obtain the current time by calling NDL_GetTicks(). You can add initialization and termination code in NDL_Init() and NDL_Quit() as needed, we agree that the program must call NDL_Init() before using the functions of the NDL library. If you think there is no need to add initialization code, then there is no need to change them.

Another input device is the keyboard, the key information is essentially an event for the system. A simple way is to represent the event in text form, we define the following two events,

  • Key down event, such as kd RETURN means pressing the Enter key
  • Key up event, such as ku A means releasing the A key

The names of the keys are the same as those defined in AM, all in uppercase. In addition, an event ends with a newline character \n.

There are two advantages to using text to describe events, first, text is obviously a sequence of bytes, which makes it easy to abstract events into files; furthermore, the textual method makes it easy for user programs to parse the content of events. Nanos-lite and Navy agree, the above events are abstracted into a special file /dev/events, it needs to support read operations, user programs can read key events from it, but it does not need to support lseek, because it is a character device.

NDL provides users with an API related to key events:

// Reads an event information and writes it into `buf`, up to `len` bytes
// If a valid event is read, the function returns 1, otherwise it returns 0
int NDL_PollEvent(char *buf, int len);

Abstract key input as a file

You need to:

  • Implement events_read() (defined in nanos-lite/src/device.c), write the event into buf, up to len bytes, and then return the actual length written. The key names are already defined in the string array names, you need to use the IOE's API to get the device input. Additionally, if there are no valid keys currently, return 0.
  • Add support for /dev/events in VFS.
  • Implement NDL_PollEvent() in NDL, reading events from /dev/events and writing them into buf.

We can assume that at most one event will be read at a time, which can simplify your implementation. After implementation, run navy-apps/tests/event-test on Nanos-lite, if implemented correctly, striking a key will output the key event information.

Use fopen() or open()?

This is a very worthwhile question to consider. You need to think about the differences in the specific behaviors of these two sets of APIs, and then analyze which function should be used to operate on the special file /dev/events. We have already provided some hints in a blue information box.

Finally, for VGA, to update the screen, the program only needs to write pixel information into VGA's video memory. Thus, what Nanos-lite needs to do is to abstract the video memory into a file. The video memory itself is also a storage space, storing the pixels to be displayed on the screen in a row-major order. Nanos-lite and Navy agree to abstract the video memory into the file /dev/fb (fb stands for frame buffer), it needs to support write operations and lseek, in order to update the pixels to a specified position on the screen.

NDL provides users with two APIs related to screen drawing:

// Opens a canvas of size (*w) X (*h)
// If both *w and *h are 0, then the entire system screen is used as the canvas, and *w and *h are set to the size of the system screen
void NDL_OpenCanvas(int *w, int *h);

// Draws a rectangle image of size `w*h` at the canvas coordinates `(x, y)`, and synchronizes the drawn area to the screen
// The image pixels are stored in `pixels` in a row-major order, each pixel described by a 32-bit integer in `00RRGGBB` format
void NDL_DrawRect(uint32_t *pixels, int x, int y, int w, int h);

Here, "canvas" is a concept oriented towards the program, and the coordinates for drawing in the program are set relative to the canvas, thus the program does not need to worry about the size of the system screen or where on the system screen the image needs to be drawn. NDL can decide where to "paste" the canvas based on the size of the system screen and the size of the canvas, for example, pasting it in the top left corner or centering it, thus writing the content of the canvas to the correct position in the frame buffer.

The functionality of NDL_DrawRect() is very similar to the drawing interface introduced in PA2. However, to implement it, NDL also needs to know the size of the screen. Nanos-lite and Navy agree that the information about the screen size is obtained through the /proc/dispinfo file, which needs to support read operations. The format of this file's content is specified in navy-apps/README.md, which you need to read. As for the specific screen size, you need to obtain it through the corresponding API of IOE.

Retrieve screen size in NDL

  • Implement dispinfo_read() (defined in nanos-lite/src/device.c), write the file's len bytes into buf (we assume this file does not support lseek, so offset can be ignored).
  • Read the content of this file in NDL, parse the screen size from it, and then implement the functionality of NDL_OpenCanvas(). Currently, NDL_OpenCanvas() only needs to record the size of the canvas, of course, we require that the canvas size does not exceed the screen size.

Run Nanos-lite with navy-apps/tests/bmp-test, since the drawing functionality is not yet implemented, you cannot output image content, but you can first use printf() to output the parsed screen size.

Abstract VGA video memory as a file

  • Initialize the size of /dev/fb in the file record table in init_fs() (defined in nanos-lite/src/fs.c).
  • Implement fb_write() (defined in nanos-lite/src/device.c), to write len bytes from buf to the screen at offset. You need to calculate the screen coordinates from offset first, then call IOE to perform the drawing. Additionally, we agree that after each drawing, the content in the frame buffer is always immediately synchronized to the screen.
  • Implement NDL_DrawRect() in NDL, by writing pixel information to the correct position in /dev/fb to draw the image. You need to clearly understand the positional relationship between the system screen (i.e., frame buffer), the canvas opened by NDL_OpenCanvas(), and the drawing area indicated by NDL_DrawRect().

Run Nanos-lite with navy-apps/tests/bmp-test, if implemented correctly, you will see the Project-N logo displayed on the screen.

Implement a centered canvas

Based on the size of the screen and the canvas, you can have NDL draw the image in the center of the screen, thus achieving a better visual effect.