Simple file system
To implement a complete batch processing system, we need to provide multiple programs to the system. Previously, we stored programs as files on ramdisk, but if the number of programs increases, we need to know the location of each program on the ramdisk. Our ramdisk has already provided read and write interfaces, making it convenient to access content at any location, which seems not difficult for Nanos-lite; On the other hand, user programs also need to process data, which may also be organized into files, so for user programs, how do they know the location of files on the ramdisk? Moreover, files are dynamically added and deleted, and user programs are not aware of this. This indicates that it is not feasible to directly provide the ramdisk's read and write interface for use by user programs. The operating system also needs to provide a higher level of abstraction for user programs on top of the storage medium's driver, which is the file.
The essence of a file is a sequence of bytes, along with some additional attributes. Here, we first discuss files in the ordinary sense. Thus, those additional attributes maintain the mapping of files to storage locations on the ramdisk. To manage these mappings and provide an interface for file operations to the upper layer, we need to implement a file system in Nanos-lite.
Do not be intimidated by the term "file system"; our needs for a file system are not that complex, thus we can define a simple file system, sfs (Simple File System):
- The size of each file is fixed
- Writing to a file is not allowed to exceed the original size of the file
- The number of files is fixed, and new files cannot be created
- There are no directories
Since the number and size of the files are fixed, we can naturally fix each file at a specific location on the ramdisk. These simplified features greatly reduce the complexity of implementing a file system. Of course, real file systems are much more complex than sfs.
We agree that files are stored one after another from the very beginning of the ramdisk:
0
+-------------+---------+----------+-----------+--
| file0 | file1 | ...... | filen |
+-------------+---------+----------+-----------+--
\ / \ / \ /
+ size0 + +size1+ + sizen +
To record the names and sizes of various files in the ramdisk, we also need a "file record table".
Nanos-lite's Makefile has already provided a script to maintain this information, first modify nanos-lite/Makefile
as follows:
--- nanos-lite/Makefile
+++ nanos-lite/Makefile
@@ -1,2 +1,2 @@
-#HAS_NAVY = 1
+HAS_NAVY = 1
RAMDISK_FILE = build/ramdisk.img
Then running make ARCH=$ISA-nemu update
will automatically compile the programs in Navy, and integrate all the contents under the navy-apps/fsimg/
directory into the ramdisk image navy-apps/build/ramdisk.img
, while generating the file record table navy-apps/build/ramdisk.h
for this ramdisk image, Nanos-lite's Makefile
will link them to the project through a soft link.
Remember to update the image file
If you have modified the content in Navy, please remember to update the image file through the above command.
The "file record table" is actually an array, each element of which is a structure:
typedef struct {
char *name; // File name
size_t size; // File size
size_t disk_offset; // File offset in ramdisk
} Finfo;
In sfs, these three pieces of information are fixed and unchangeable. The file names are a bit different from our usual habits: Since sfs has no directories, we consider the directory separator /
as part of the file name, for example, /bin/hello
is a complete file name. This approach also implies a hierarchical structure of directories, for a small number of files, this approach is both simple and effective.
With this information, the most basic file read and write operations can already be implemented:
size_t read(const char *filename, void *buf, size_t len);
size_t write(const char *filename, const void *buf, size_t len);
But in real operating systems, this practice of directly using file names as parameters for read and write operations has its flaws. For example, when we use the less
tool to browse files:
cat file | less
The cat
tool wants to write the file content to the standard input of the less
tool, but we cannot use a file name to identify the standard input of the less
tool! In fact, there are indeed many "nameless" files in the operating system. To manage them uniformly, we wish to represent a file by a number, which is the file descriptor. A file descriptor corresponds to an open file, with the operating system maintaining the mapping from file descriptors to specific files. Thus, we naturally use the open()
system call to open a file and return the corresponding file descriptor.
In Nanos-lite, since the number of files in sfs is fixed, we can simply use the index of the file record table as the file descriptor for the corresponding file and return it to the user program. After this, all file operations are identified through file descriptors:
size_t read(int fd, void *buf, size_t len);
size_t write(int fd, const void *buf, size_t len);
int close(int fd);
Additionally, we do not want every read and write operation to start from the beginning. Therefore, we need to introduce an open_offset
attribute for each open file to record the current position of file operations. Each time a certain number of bytes is read or written to a file, the offset advances by that amount.
File Offset and User Programs
In real operating systems, maintaining the offset within the file record table can prevent user programs from implementing certain functionalities. However, explaining this issue requires understanding some knowledge beyond the scope of this course, so we will not elaborate here. You can think about this issue when you study operating systems courses.
Since Nanos-lite is a simplified operating system, the above problem will not occur temporarily, To simplify the implementation, we still maintain the offset in the file record table.
Offsets can be adjusted through the lseek()
system call, allowing for reading and writing at any position within the file:
size_t lseek(int fd, size_t offset, int whence);
To facilitate standard input and output for user programs, the operating system has prepared three default file descriptors:
#define FD_STDIN 0
#define FD_STDOUT 1
#define FD_STDERR 2
They correspond to standard input stdin
, standard output stdout
, and standard error stderr
. The frequently used printf ultimately calls write(FD_STDOUT, buf, len)
for output; while scanf will read input by calling read(FD_STDIN, buf, len)
.
The file_table
defined in nanos-lite/src/fs.c
will include nanos-lite/src/files.h
, which also contains placeholders for three special files: stdin
, stdout
, and stderr
, These are just to ensure that the sfs and the agreed standard input and output file descriptors remain consistent, for example, according to the agreement, the file descriptor for stdout
is 1
, and after we add three placeholder entries, the index 1
in the file record table will not be allocated to other ordinary files.
Based on the above information, we can implement the following file operations in the file system:
int fs_open(const char *pathname, int flags, int mode);
size_t fs_read(int fd, void *buf, size_t len);
size_t fs_write(int fd, const void *buf, size_t len);
size_t fs_lseek(int fd, size_t offset, int whence);
int fs_close(int fd);
These file operations are actually implementations of the corresponding system calls in the kernel. You can consult their functions through man
, for example
man 2 open
where 2
indicates consulting the manual page related to system calls. When implementing these file operations, note the following points:
- Since every file in sfs is fixed and does not generate new files, thus "
fs_open()
not finding the file indicated bypathname
" is an exceptional situation, you need to use assertion to terminate the program. - To simplify implementation, we allow all user programs to read and write all existing files, thus later, we can ignore
flags
andmode
when implementingfs_open()
. - Use
ramdisk_read()
andramdisk_write()
for actual file read and write operations. - Since the size of the files is fixed, when implementing
fs_read()
,fs_write()
, andfs_lseek()
, be careful not to exceed the file boundaries. - Apart from writing to
stdout
andstderr
(output to the serial port usingputch()
), operations on the three special filesstdin
,stdout
, andstderr
can be directly ignored. - Since sfs does not maintain the state of file opening,
fs_close()
can directly return0
, indicating always successful closure.
Finally, you also need to add the corresponding system calls in Nanos-lite and Navy's libos to invoke the respective file operations.
Let the loader use files
Previously, we had the loader directly call ramdisk_read()
to load user programs. After the number of files in the ramdisk increases, this method becomes inappropriate, we first need to let the loader enjoy the convenience of the file system.
You need to first implement fs_open()
, fs_read()
, and fs_close()
, so that you can use filenames in the loader to specify the program to load, such as "/bin/hello".
After implementation, changing user programs later will only require modifying the filename passed to the naive_uload()
function.
Implement a complete file system
Implement fs_write()
and fs_lseek()
, then run the test program navy-apps/tests/file-test
. To compile it, you need to add it to the TESTS
variable in navy-apps/Makefile
, so that it will ultimately be included in the ramdisk image. This test program is used for some simple file read, write, and positioning operations. If your implementation is correct, you will see the program output PASS!!!
.
Remember to update the application list
If you want to add an application to the image, remember to add it to the application list in the above Makefile file.
Support sfs's strace
Due to the characteristics of sfs, opening the same file always returns the same file descriptor. This means that we can directly translate the file descriptors in strace into filenames, obtaining more readable trace information. Try to implement this feature, it can provide some convenience for your future use of strace.
Everything is file
IOE in AM shows us the need for programs to perform input and output. So, on Nanos-lite, how should user programs access devices? One direct method is for the operating system to provide a separate system call for each device, allowing user programs to directly use the corresponding functionalities. However, this approach has several problems:
- First, the types of devices are varied and their functionalities are numerous, making it impractical to implement separate system calls for each to provide interfaces to user programs;
- Moreover, due to the significant differences in device functionalities, if the provided interfaces are not unified, the interaction between programs and devices becomes difficult. Thus, we need a way to abstract the functionalities of devices to provide a unified interface to user programs.
We previously mentioned that the essence of a file is a sequence of bytes. In fact, computer systems are filled with sequences of bytes (how would a computer handle a mere unordered collection of bytes?), and we can easily list many examples:
- Memory is byte-addressed, naturally forming a byte sequence, making our previously used ramdisk as a byte sequence even more apparent
- A pipe (the
|
in shell commands) is a FIFO byte sequence, essentially a queue buffer in memory - A disk can also be seen as a byte sequence: We can number each byte on the disk, for example, the nth byte in the xth cylinder, yth head, zth sector, arranging all the bytes on the disk by number size, we get a byte sequence
- A socket (network socket) is also a byte sequence, it has a buffer responsible for storing received network packets, upper-layer applications view the contents of the socket as a byte sequence and handle them through some special file operations. In PA2, we introduced DiffTest, if you RTFSC, you will find that
qemu-diff
communicates with QEMU through a socket, and the way to operate a socket isfgetc()
andfputc()
- Some information of the operating system can be exposed to users in the form of byte sequences, such as CPU configuration information
- Some special functions provided by the operating system, such as random number generators, can also be seen as an infinitely long byte sequence
- Even some non-storage type hardware can be seen as byte sequences: The sequence of key codes we type on the keyboard forms a byte sequence, the content of each pixel on the display can also be seen as a byte sequence...
Since a file is a byte sequence, naturally, all the varied byte sequences mentioned above can be considered files. Unix operates this way, hence the saying "Everything is a file." This approach's most intuitive benefit is providing a unified interface for different things: We can use the file interface to operate everything on the computer without having to distinguish them in detail, for example, the ramdisk
rule in navy-apps/Makefile
connects the input and output of various shell tools through pipes to generate a file record table
wc -c $(FSIMG_FILES) | grep -v 'total$$' | sed -e 's+ ./fsimg+ +' |
awk -v sum=0 '{print "\x7b\x22" $$2 "\x22\x2c " $$1 "\x2c " sum "\x7d\x2c";sum += $$1}' >> $(RAMDISK_H)
View the contents of a disk in hexadecimal
head -c 512 /dev/sda | hd
Check if the CPU has the Spectre vulnerability
cat /proc/cpuinfo | grep 'spectre'
Even the "Little Star" sample audio we provided in PA2 was created through simple file operations by brute force concatenation
cat Do.ogg Do.ogg So.ogg So.ogg La.ogg La.ogg So.ogg > little-star.ogg
and
#include "/dev/urandom"
will include the contents of the urandom device in the source file: since the urandom device is an infinitely long byte sequence, submitting a program source file containing the above content will cause some Online Judge platforms with weak detection features to crash directly.
The abstraction of "everything is a file" allows us to easily accomplish some tasks on Windows that are difficult to complete, reflecting part of the Unix philosophy: each program uses text files for input and output, making it easy for programs to cooperate. GNU/Linux, inheriting from Unix, naturally carries on this excellent feature. To provide a unified abstraction to user programs, Nanos-lite also attempts to abstract IOE as files.
Virtual File System
To realize the concept of "everything is a file," we need to extend our previously implemented file operations: We need to support not only ordinary file read and write operations but also operations on various "special files." The way to extend this is something you are already familiar with, which is abstraction!
We extend the semantics of the previously implemented file operation APIs to support operations on any file (including "special files"):
int fs_open(const char *pathname, int flags, int mode);
size_t fs_read(int fd, void *buf, size_t len);
size_t fs_write(int fd, const void *buf, size_t len);
size_t fs_lseek(int fd, size_t offset, int whence);
int fs_close(int fd);
This set of extended APIs has a cool name, called VFS (Virtual File System). Since there is a virtual file system, there should correspondingly be "real file systems," which actually refer to how to operate a certain type of file. For example, in Nanos-lite, ordinary files are operated through the ramdisk API; in real operating systems, there are countless types of real file systems: for instance, those familiar with Windows should know about NTFS for managing ordinary files, while EXT4 is currently popular on GNU/Linux; as for special files, there are even more types, hence there are procfs
, tmpfs
, devfs
, sysfs
, initramfs
... These different real file systems each implement the specific operation methods for these files.
Therefore, VFS is actually an abstraction of different types of real file systems, using a set of APIs to describe the abstract behavior of these real file systems, masking the differences between real file systems, upper layers (such as system call handling functions) do not need to worry about the specific type of file being operated on, simply calling this set of APIs can complete the corresponding file operations. With the concept of VFS, adding a real file system becomes very easy: just wrap the access method of the real file system into the VFS API, upper layers do not need to modify any code to support a new real file system.
Here it comes again
When reading the above text, if you thought of the concept of AM, you're right, because the idea behind VFS is also abstraction. In Nanos-lite, the key to implementing VFS is the two read and write function pointers in the Finfo
structure:
typedef struct {
char *name; // File name
size_t size; // File size
size_t disk_offset; // File offset in ramdisk
ReadFn read; // Read function pointer
WriteFn write; // Write function pointer
} Finfo;
ReadFn
and WriteFn
are two types of function pointers, used to point to the actual read and write functions, and return the number of bytes successfully read or written. With these two function pointers, we only need to set different read and write functions for different files in the file record table, then we can call the specific read and write functions through f->read()
and f->write()
.
Simulating Object-Oriented Programming in C
The implementation of VFS demonstrates how to simulate some basic concepts of object-oriented programming in C: for example, using structures to define classes, where ordinary variables in structures can be seen as class members, function pointers can be seen as class methods, setting different functions to function pointers can achieve method overloading...
This shows that those seemingly ethereal concepts in OOP are not much more advanced than C, it's just that the OOP compiler does more for us, and after compiling into machine code, OOP ceases to exist. Object-Oriented Programming With ANSI-C This book specifically introduces how to simulate various concepts and functionalities of OOP in ANSI-C. In the kernel code of GNU/Linux, there are also traces of OOP.
However, in Nanos-lite, since there are very few special files, we have agreed that when the above function pointer is NULL
, it indicates that the corresponding file is a regular file, which should be read and written through the ramdisk's API, thus, we do not need to explicitly specify the ramdisk's read and write functions for most regular files.
We view files as sequences of bytes, most of which are "static", for example, files on ramdisk and disks, if we do not modify them, they will always be in the same place, such byte sequences have the concept of "location"; however, some special byte sequences are not like this, for example, the byte sequence of typing keys is "flowing", once read, it no longer exists, the bytes in such sequences only have a sequential relationship, but cannot be numbered, therefore, they do not have the concept of "location". Files of the former type support the lseek
operation, and the devices that store these files are called "block devices"; while files of the latter type do not support the lseek
operation, and the corresponding devices are called "character devices". Real operating systems also abstract the lseek
operation, but we have simplified it in Nanos-lite and do not implement this abstraction.
The IOE on top of the operating system
With VFS, it becomes very simple to abstract IOE as files.
First, let's look at the simplest output device: the serial port. In Nanos-lite, both stdout
and stderr
output to the serial port. Previously, you might decide whether sys_write()
writes to the serial port based on whether fd
is 1
or 2
. Now with VFS, we no longer need to let the system call handler worry about these special files: we just need to implement serial_write()
in nanos-lite/src/device.c
, and then set the corresponding write function in the file record table to achieve the above function. Since the serial port is a character device, the corresponding byte sequence does not have the concept of "location", thus, the offset
parameter in serial_write()
can be ignored. Additionally, Nanos-lite does not intend to support reading from stdin
, so an appropriate error function can be set in the file record table.
Abstract the serial port as a file
Based on the above content, let VFS support writing to the serial port.
Regarding input devices, let's first look at the clock. The clock is quite special, most operating systems do not abstract it as a file, but directly provide some clock-related system calls for user programs to access. In Nanos-lite, we also provide a SYS_gettimeofday
system call, through which user programs can read the current system time.
Implement gettimeofday
Implement the gettimeofday
system call, please RTFM for the meaning of the parameters of this system call. After implementation, add a timer-test
test in navy-apps/tests/
, in the test, obtain the current time through gettimeofday()
, and output a sentence every 0.5 seconds.
To better encapsulate the functionality of IOE, we provide a multimedia library called NDL (NJU DirectMedia Layer) in Navy. The code for this library is located in navy-apps/libs/libndl/NDL.c
, but most of its functions are not implemented. There are some contents related to NWM_APP
in the code, which you can ignore for now, but do not modify the related code, you will experience its functionality at the end of PA4. NDL provides users with a clock-related API:
// Returns the system time in milliseconds
uint32_t NDL_GetTicks();
Implement NDL's clock
You need to use gettimeofday()
to implement NDL_GetTicks()
, and then modify the timer-test
test, letting it obtain the current time by calling NDL_GetTicks()
. You can add initialization and termination code in NDL_Init()
and NDL_Quit()
as needed, we agree that the program must call NDL_Init()
before using the functions of the NDL library. If you think there is no need to add initialization code, then there is no need to change them.
Another input device is the keyboard, the key information is essentially an event for the system. A simple way is to represent the event in text form, we define the following two events,
- Key down event, such as
kd RETURN
means pressing the Enter key - Key up event, such as
ku A
means releasing theA
key
The names of the keys are the same as those defined in AM, all in uppercase. In addition, an event ends with a newline character \n
.
There are two advantages to using text to describe events, first, text is obviously a sequence of bytes, which makes it easy to abstract events into files; furthermore, the textual method makes it easy for user programs to parse the content of events. Nanos-lite and Navy agree, the above events are abstracted into a special file /dev/events
, it needs to support read operations, user programs can read key events from it, but it does not need to support lseek
, because it is a character device.
NDL provides users with an API related to key events:
// Reads an event information and writes it into `buf`, up to `len` bytes
// If a valid event is read, the function returns 1, otherwise it returns 0
int NDL_PollEvent(char *buf, int len);
Abstract key input as a file
You need to:
- Implement
events_read()
(defined innanos-lite/src/device.c
), write the event intobuf
, up tolen
bytes, and then return the actual length written. The key names are already defined in the string arraynames
, you need to use the IOE's API to get the device input. Additionally, if there are no valid keys currently, return 0. - Add support for
/dev/events
in VFS. - Implement
NDL_PollEvent()
in NDL, reading events from/dev/events
and writing them intobuf
.
We can assume that at most one event will be read at a time, which can simplify your implementation. After implementation, run navy-apps/tests/event-test
on Nanos-lite, if implemented correctly, striking a key will output the key event information.
Use fopen() or open()?
This is a very worthwhile question to consider. You need to think about the differences in the specific behaviors of these two sets of APIs, and then analyze which function should be used to operate on the special file /dev/events
. We have already provided some hints in a blue information box.
Finally, for VGA, to update the screen, the program only needs to write pixel information into VGA's video memory. Thus, what Nanos-lite needs to do is to abstract the video memory into a file. The video memory itself is also a storage space, storing the pixels to be displayed on the screen in a row-major order. Nanos-lite and Navy agree to abstract the video memory into the file /dev/fb
(fb stands for frame buffer), it needs to support write operations and lseek
, in order to update the pixels to a specified position on the screen.
NDL provides users with two APIs related to screen drawing:
// Opens a canvas of size (*w) X (*h)
// If both *w and *h are 0, then the entire system screen is used as the canvas, and *w and *h are set to the size of the system screen
void NDL_OpenCanvas(int *w, int *h);
// Draws a rectangle image of size `w*h` at the canvas coordinates `(x, y)`, and synchronizes the drawn area to the screen
// The image pixels are stored in `pixels` in a row-major order, each pixel described by a 32-bit integer in `00RRGGBB` format
void NDL_DrawRect(uint32_t *pixels, int x, int y, int w, int h);
Here, "canvas" is a concept oriented towards the program, and the coordinates for drawing in the program are set relative to the canvas, thus the program does not need to worry about the size of the system screen or where on the system screen the image needs to be drawn. NDL can decide where to "paste" the canvas based on the size of the system screen and the size of the canvas, for example, pasting it in the top left corner or centering it, thus writing the content of the canvas to the correct position in the frame buffer.
The functionality of NDL_DrawRect()
is very similar to the drawing interface introduced in PA2. However, to implement it, NDL also needs to know the size of the screen. Nanos-lite and Navy agree that the information about the screen size is obtained through the /proc/dispinfo
file, which needs to support read operations. The format of this file's content is specified in navy-apps/README.md
, which you need to read. As for the specific screen size, you need to obtain it through the corresponding API of IOE.
Retrieve screen size in NDL
- Implement
dispinfo_read()
(defined innanos-lite/src/device.c
), write the file'slen
bytes intobuf
(we assume this file does not supportlseek
, sooffset
can be ignored). - Read the content of this file in NDL, parse the screen size from it, and then implement the functionality of
NDL_OpenCanvas()
. Currently,NDL_OpenCanvas()
only needs to record the size of the canvas, of course, we require that the canvas size does not exceed the screen size.
Run Nanos-lite with navy-apps/tests/bmp-test
, since the drawing functionality is not yet implemented, you cannot output image content, but you can first use printf()
to output the parsed screen size.
Abstract VGA video memory as a file
- Initialize the size of
/dev/fb
in the file record table ininit_fs()
(defined innanos-lite/src/fs.c
). - Implement
fb_write()
(defined innanos-lite/src/device.c
), to writelen
bytes frombuf
to the screen atoffset
. You need to calculate the screen coordinates fromoffset
first, then call IOE to perform the drawing. Additionally, we agree that after each drawing, the content in the frame buffer is always immediately synchronized to the screen. - Implement
NDL_DrawRect()
in NDL, by writing pixel information to the correct position in/dev/fb
to draw the image. You need to clearly understand the positional relationship between the system screen (i.e., frame buffer), the canvas opened byNDL_OpenCanvas()
, and the drawing area indicated byNDL_DrawRect()
.
Run Nanos-lite with navy-apps/tests/bmp-test
, if implemented correctly, you will see the Project-N logo displayed on the screen.
Implement a centered canvas
Based on the size of the screen and the canvas, you can have NDL draw the image in the center of the screen, thus achieving a better visual effect.