Part I. C Case Studies
This part is divided into three chapters.
-
In the first chapter we solve a MINIX kernel exercise.
-
In the next chapter we are going to show how to create entries to Linux /proc virtual file system from an our own kernel module.
-
Finally, in the last chapter, we present a parallel network server that is based on I/O multiplexing and use shared memory as IPC.
Chapter 2. MINIX kernel hacking: analysing microkernel's IPC
In this chapter we have made a lifelong friend with a beautifully simple microkernel called MINIX. We solve an exercise of Tanenbaum and Woodhull's operating systems book,
-
first with the modification of a part of PCB (Process Control Block) defined in proc.h,
-
then with the creation of a new system call.
First of all we setup a MINIX system and we show how to compile the MINIX kernel.
„Re 2: your job is being a professor and researcher: That's one hell of a good excuse for some of the brain-damages of minix.”
—Linus Benedict Torvalds Linus-Tanenbaum vita
1. Introduction: the MINIX which acted as the detonator of open source
Eric Steven Raymond (esr) changed the term free software to open source in The Cathedral and the Bazaar in 1998. More than ten years back, the MINIX had been released in 1987, so obviously it was not open source.
GNU GPL
It may be noted that the first version of GNU GPL was released in 1989 and the first Linux had already been licensed under GNU GPL version 2, released in 1991.
At the beginning of the informatics profession, the C source codes of UNIX systems could have been used in university courses, but it lasted only a short period of time due to UNIXs had become a very valuable product quickly and then it was no longer possible to teach the operating systems in its C source form. The MINIX was born in this environment. The purpose of it was specifically to allow the usage of the source code of a UNIX-like operating system in education again.
From this aspect, the MINIX can be regarded as a pioneer of open source operating systems and, ultimately, we can say that the open source was brought into existence by the closing of the source code [OS3].
1.1. The MINIX microkernel and the MINIX IPC
The IPC (Inter-Process Communication) determines a communication method in which programs can exchange data with each other. There are several IPC mechanisms, from pipes through anonymous, local and network sockets to using shared memory (from these methods semaphore arrays and the shared memory will be used in the third case studies). The MINIX IPC is based on the message passing. The massage passing takes place between two appropriately synchronised (by the rendezvous principle) processes, so we would not buffer the messages [OS], [OS3], as shown in the following figures.
Figure 2.1. SEND and RECEIVE message passing primitives.
Figure 2.2. The sender is blocked while the receiver is not ready to receive.
Figure 2.3. The receiver is blocked while the message is not being received.
2. The installation of MINIX3 system
There is a great experience for speed lovers to use the MINIX installed on a primary partition, but it is simpler if the first encounter with MINIX takes place on a virtualized machine. So we are going to install MINIX3 in VirtualBox. For example, let's choose the latest development snapshot image minix3_2_1_ide_20131118.iso, that should be written to a CD disk, but it is a better way to mount this ISO image directly.
2.1. Installation in VirtualBox
Figure 2.4. Giving the name of the MINIX system that will be installed.
Creating a virtual machine in VirtualBox is a very simple, but necessary subtask. After some clicks with using the default settings the new virtual MINIX system will be launched soon.
Figure 2.5. Launching the virtualized MINIX.
At first startup, we must give from where we want to boot. All we have to do is to choose the downloaded image file.
Figure 2.6. Starting the installation.
From this point we have to give some settings that must be applied mutatis mutandis.
Figure 2.7. Loading the system from disk.
We must follow the instructions which appeared on the screen: the installation will start after we have logged in as root and entered setup.
Figure 2.8. We accept defaults, mutatis mutandis.
During the next installation steps we may accept the defaults by pressing Enter, mutatis mutandis. Then (after unmounting the installer disk image in the menu item Devices/CD/DVD Device) we are able to boot the new MINIX system from the hard disk.
Figure 2.9. MINIX starts from hard disk now.
In the next section we are going to take possession of the newly installed MINIX.
2.2. The first MINIX kernel hacking
The first kernel compilation is an obligatory task for all students. Let's do it together now!
Using the command cd /usr/src we enter the directory that contains the MINIX kernel-tree. Then we open the source file kernel/main.c with vi editor. The usage of vi is not masochism, because the installed MINIX has no other editors now.
Figure 2.10. Opening the source file kernel/main.c with vi.
In this source file, search for the function announce().
/*===========================================================================*
* announce *
*===========================================================================*/
static void announce(void) { /* Display the MINIX startup banner. */
printf("\nMINIX %s.%s. " #ifdef _VCS_REVISION "(" _VCS_REVISION
")\n" #endif "Copyright 2012, Vrije Universiteit, Amsterdam, The
Netherlands\n", OS_RELEASE, OS_VERSION); printf("MINIX is open
source software, see http://www.minix3.org\n"); printf("nORBi a
kernelben\n"); printf("%c%s",0x1B,"[47;30m"); }
Figure 2.11. The modification of the function announce() in the kernel/main.c source file.
All we did was to write a message during booting. The Hungarian language message nORBi a kernelben roughly means „the author is in the kernel” that is similar to a „Hello, World!„ message from the kernel. This message will be printed in boot time. In addition, the video mode of the terminal is changed to gray background and black foreground. The description of the used ANSI escape codes, for example, can be found in Wikipedia.
After saving the source (using Shift+zz key combination in vi) the make install command compiles the kernel sources, then the system must be rebooted by the command shutdown -r.
Figure 2.12. The booting of the newly compiled kernel.
In the case of the newly compiled kernel we can see the message from the kernel and the changed video mode as well.
Figure 2.13. The message from the kernel.
Installing the packages
Handling packages and the packages themselves are always changing... At first in Minix3, we could use packman package manager program and for example the emacs editor could be installed from package. At this moment MINIX's package manager is pkgin and emacs cannot be installed from package.
Before listing the available packages we must enter the command pkgin update, after this we are able to choose from packages using command pkgin av|more.
Figure 2.14. The first step of managing packages.
Customize Your Minix!
For example, I like to set the bash's PS1 environment variable to show the host name, the user name and the working directory. I usually add the following lines to the bottom of the .profile file.
export PS1="\[\033[47;30m\][minix@\u]\w > " clear bash
exit
If we have already set the video mode in kernel, we do the same in the shell too. Here we may notice that using .bashrc to set PS1 is a more robust solution than using .profile
export PS1="\[\033[47;30m\][minix@\u]\w
> " clear
because in this case the make world will not delete this file.
3. Analysis of MINIX3 IPC
The examples of this section are the solutions of exercise 44 on page 219 of book [MINIX3_OS_BOOK]. The exact wording of the exercise, cited from the book [MINIX3_OS_BOOK], is the following: „Add code to the MINIX 3 kernel to keep track of the number of messages sent from process (or task) i to process (or task) j. Print this matrix when the F4 key is hit.”
In the author's Operating System courses between 2008 and 2010, the solutions of this task were used as typical examples to illustrate the Minix kernel programming. At that time, the Minix kernel was 3.1, and our students were helped to solve this exercise by the following notes: DEIK_MIPPOS_2008tavasz_BN_KiemeltOttoni_OR168_38.pdf. But remember, Minix is a living system, its code is changing rapidly, and therefore we may find several points, in this linked document, where the source code has already changed. This is a common phenomenon in informatics. For example, being restricted to Minix only, in the Minix book we can read about managing penalty points in sched() and we can see it in source code in the appendix of the book, but it is not included in the current version of the Minix kernel.
3.1. A solution with modifying the PCB
In this section we are working with MINIX 3.2.1. At the time of the writing of this book, it is the current version and it has just been installed in the section „The installation of MINIX3 system”
The PCB is an abstraction of the processes of the operating system, which is typically realised as a C structure. In Minix this structure is the struct proc that is defined in kernel/proc.h source file. The process table in Minix is simply a structure array of NR_TASKS + NR_PROCS PCB elements.
Acquiring routine
If you have no experience in Minix kernel programming we suggest that you should suspend the elaboration of this example and first start by solving the following exercise.
300 processes
When you log into the machine a special process called the shell will be created. If you enter a command in the shell, a (child) process will be created to execute the command. Web servers may fork processes to handle client requests and the list goes on. Many situations can be listed in which processes play significant roles. Actually, it is not suprising because processes are abstractions of computer programs. But it is obvious that the number of processes is limited by the size of the process table array anyway. Therefore, it is a legitimate exercise to increase the size of the process table: set the number of possible processes to 300 (from 256).
We may set the value of NR_PROCS in the source file include/minix/sys_config.h
-
Set the value of NR_PROCS to 300 from 256.
Figure 2.15.
-
After setting the NR_PROCS, it will be interesting to see the size of the process table. It can be done easily by following the previous exercise in Section The first MINIX kernel hacking.
Figure 2.16. Printing out the size of the PCB and the size of the process table.
Example 2.1. Number of processes
We have also done this exercise ourselves.
After the above two steps we can see in the next figure that the size of the PCB is equal to 1664 bytes and the size of the process table is 507520 (=1664*(5+300)). If the reader completes this exercise by himself/herself, he or she will see that the result will be different due to the fact that we have already used a modified version of PCB. Because our PCB has already contains a vector in which the number of elements is dependent on the value of NR_PROCS too.
Figure 2.17. Read the size of the PCB and the size of the process table.
Example 2.2. The size of the PCB
Print the size of the PCB and the size of the process table from kernel.
This exercise was solved as an implicit part of the previous exercise.
Returning to the original exercise, the MINIX PCB is being extended with an array of integers (called in Hungrian uzenetszam that means the number of messages). The i-th element of this array says how many messages were sent to the i-th process.
Figure 2.18. Extending the MINIC PCB.
In principle, we know where to store the number of sent messages, now we are going to program the counting procedure itself.
++(caller_ptr->uzenetszam[dst_ptr->p_nr + NR_TASKS]);
The appropriate element of the uzenetszam array in the caller's PCB is being incremented. Here, it is necessary to add + NR_TASKS to the array index because the numbering of processes starts from -NR_TASKS.
To avoid complication (in the first approach), this increment statement has been placed into the source file kernel/proc.c, into the begining of the function mini_send.
Figure 2.19. Counting massages.
We have already mentioned that the numbering of processes starts from -NR_TASKS. It can be seen not only in the kernel sources but also if you press F1 key:
Figure 2.20. The kernel process table.
In GNU/Linux systems, the /proc virtual file system serves for monitoring the running kernel. Minix kernel has no such tool like this. We can obtain information about the running kernel over the Information Server (is), which is located in the server part of the Minix microkernel architecture. If we are going to debug the running kernel (in our case to print out the added PCB vectors), we need to use Information Server. We can link debug functions to function keys in the source file servers/is/dmp.c, right now to the F4.
Figure 2.21. Connecting a debug function to a function key.
The debug function uzenetszam_dmp first prints the process names of the non-empty slots of the process table as a header line. We may have noticed that we are in the server layer of the kernel tree to be more precise we have worked in the is server, where to the process table is copied from the bottom layer of the kernel by the system call sys_getproctab. The uzenetszam_dmp function works this copied instance of the process table, because the bottom layer of the kernel cannot be seen from the layer of servers.
Figure 2.22. The beginning of the function uzenetszam_dmp.
After the header line, the next lines will contain the vectors of non-empty PCBs.
Figure 2.23. The midst of the function uzenetszam_dmp.
At the end of printing the matrix, the mennyitol (in English it means „from number”) variable will be incremented by 21, it implements a kind of paging for displaying the „IPC matrix”. To be more precise, if the F4 key has been pressed, a submatrix of size 22x9 will be printed from the actual value of mennyitol.
Figure 2.24. The end of the function uzenetszam_dmp.
Since the function uzenetszam_dmp is also used outside of the source file servers/is/dmp_kernel.c, we must declare it, therefore we give its prototype in the source file servers/is/proto.h.
Figure 2.25. The prototype of the function uzenetszam_dmp.
After this, all we have to do is to compile the kernel, then to boot the new kernel. And after the login, if we press F4 the following debug screen appears:
Figure 2.26. Displaying the IPC matrix.
Example 2.3. Non-empty slots
Print out the non-empty slots of the kernel process table in the form of [process name]-[p_quantum_size_ms]-[p_cpu_time_left] when F1 key is pressed (that is using the is Information Server). By solving this exercise, you must know the struct proc structure members p_quantum_size_ms and p_cpu_time_left defined in the kernel part of Minix PCB in the source file kernel/proc.h.
Figure 2.27. List of the non-empty slots from the process table.
3.2. Another solution by introducing a new system call
In the previous section, we modified the PCB. Then (using the sys_getproctab system call) we copied the kernel process table to the layer of server processes. Now, outside the PCB, but inside the kernel we create a data structure: a matrix of integers to store the „IPC matrix”. To access this matrix, we create a new system call.
Define a 2 dimensional array of size (NR_TASKS + NR_PROCS)*(NR_TASKS + NR_PROCS) outside the PCB structure
Figure 2.28. An array to store the „IPC matrix”.
Counting of IPC messages will be done in the same manner as described in the previous section. But now, the uzenetszam array will be accessed directly and not through the PCB.
Figure 2.29. Incrementing of the appropriate element of the uzenetszam array.
The uzenetszam array of the kernel is also defined with the same name in the server layer.
Figure 2.30. This is the uzenetszam that we defined in the server layer.
Use the sys_getmatrix system call that will be newly developed for copying the IPC matrix from kernel level to the server level.
Figure 2.31. Usage of the sys_getmatrix system call that will be newly developed.
Printing the matrix has changed very little only in one line:
printf("%7d|", uzenetszam[rp->p_nr +
NR_TASKS][i]);
Figure 2.32. Printing the matrix.
In the source file include/minix/syslib.h, write the sys_getmatrix macro. This macro will call the system call do_getinfo that will do the real work. The sys_getmatrix macro helps simplify using the do_getinfo.
Figure 2.33. Defining the sys_getmatrix macro.
The GET_MATRIX constant (used in the sys_getmatrix macro) is defined in source include/minix/com.h, its value is chosen to be equal to 255.
Figure 2.34. The GET_MATRIX macro.
There is not much for us to do to add a case branch to the switch of do_getinfo in order to handle the calls of the sys_getmatrix.
Figure 2.35. Supplying the do_getinfo system call.
Now we are ready to recompile and boot the new kernel. Then, after pressing F4, we can see the fruits of our labour.
Figure 2.36. Displaying the IPC matrix in the second solution.
It is interesting to compare the two solutions. For example, what happens if processes are created then destroyed after they have been finished and so on.
The availability of this case study.
The two presented solutions are available in a VirtualBox virtual machine that can be found at http://www.inf.unideb.hu/~nbatfai/MINIX3.2.ova
Chapter 3. GNU/Linux kernel hacking: making entries in the /proc virtual file system
In this chapter we have made a lifelong friend with a beautifully sophisticated monolithic kernel called Linux.
-
First, we are going to configure and compile a custom Linux kernel.
-
Then, we will write a kernel module that will create and maintain a simple „debugging file” under /proc virtual file system.
„LINUX is obsolete”
—Andy Tanenbaum (ast) Linus-Tanenbaum vita
1. The monolithic kernel of Linux
„.x.x: Linus went crazy, broke absolutely _everything_, and rewrote the kernel to be a microkernel using a special message-passing version of Visual Basic.”
—Linus Torvalds RFD: Kernel release numbering (Linux Kernel Mailing List)
In the previous case study, we had some experience with the operation of the microkernel, where architectural components are organized in different hierarchical levels. This section drives us towards the other end of the spectrum, to the world of monolithic kernels. What do monolithic and micro kernels mean in practice? Think about the previous MINIX case study, where the data structures of the lower levels of the kernel could not be accessed from the higher levels. For example, from the Minix Information Server, we could not access the vector added to PCB, we needed to copy it up to the server level. In a monolithic kernel, the data structures and functions of the kernel can be accessed from anywhere in the kernel code.
In the known history of mankind, there have been several examples where huge masses of people were cooperating to achieve some mutual goals or were involved in achieving some goals. Some examples may be the building of the pyramids, the religious communities and the large mass armies of the 19th century.
If we investigate only pure intellectual purposes, we will find that we are not an embarrassment of riches. A present day example may be the building of the LHC (Large Hadron Collider), that was built by 10.000 technicians, engineers and scientist from all over the world. This may be comparable to writing the Linux kernel that is being maintained by more than 8000 developers [KERNELDEV]. And if we focus not only on the kernel itself but the whole GNU/Linux system (including editors, browsers, games, etc.), then we will get a much greater number than 10000 volunteers.
1.1. Linux kernel compiling
We believe that compiling the kernel should be a fundamental experience for all programming students. If the reader has never done it, we will do it now, in this section.
Compiling the kernel is a relatively safe activity, but it is better to do it in a virtualized Linux system. So if the reader has never done it before, we will compile it now, using the virtualized image Batfai_Prog1.ova.
The Linux kernel is a large C program, in order to compile it we need to download its source. The actual kernel sources can be downloaded from kernel.org. Here, at the time of writing, the 3.5 is the actual version. In the beginning, a simple version numbering scheme was used: the unstable versions were numbered by odd numbers and the stable ones by even numbers. But it was too „mathematical”, now a more sophisticated model was used.
Kernel version numbering
But joking apart, the rethinking of the numbering concept was suggested by Linus Torvalds in his email subjected „RFD: Kernel release numbering”.
The 2.6 kernels had already been numbered by the next schema. A stable major.minor.revision is followed by a sequence of major.minor.revision.patchlevel stable versions that contain bugfixes and security fixes. In parallel with these versions, the major.minor.revision+1"-rc-patchlevel" is the development (unstable) version, where new things are introduced. The next stable version major.minor.revision+1 will be evolved from the development versions.
On the Linux's 20th birthday, the version numbering was bumped from 2.6 to 3.0. Now, the revision shows the fixes, the development versions have the rc prefix.
We have chosen the current version, that is 3.5. The whole source tree is roughly 70-80 MB (it is not sufficient to download the patch that is only some kilobytes of large.) The unpacked size will be roughly a half gigabyte, but it should be noted that the size of the compiled kernel tree may be larger than 5 GB.
Using the command wget http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.5.tar.bz2, here we are downloading the kernel sources.
Figure 3.1. Downloading the kernel sources.
The root of the unpacked kernel tree contains a README file, we follow the procedure descripted in this. First, unpack the sources with command tar xvjf linux-3.5.tar.bz2 or cited from the README, with the bzip2 -dc linux-3.5.tar.bz2 | tar xvf - command.
The compiling is controlled by the .config file of the kernel tree that file also can be found in the running system's /boot folder. We will use the command make menuconfig in order to customize the kernel. The make menuconfig works in the terminal screen and its output is the file .config. If you have already compiled the kernel, use the command make mrproper. This program will clean up the config and the object files from the source tree. As a best practice, the reader can copy the existing file from the /boot directory into the root of the source tree, then uses the command make oldconfig, because in this case it is enough to make decisions about the new features only.
Figure 3.2. Making the .config with command make menuconfig.
Reading all kernel options in make menuconfig is a whole day's work, here we can see only one: under the General setup menu, we are going to set the Kernel .config support option that will save the .config file into the running kernel itself.
Figure 3.3. Setting the option Kernel .config support.
Linux kernel compiling may be coming now. Use the make command to compile the sources (here we do not follow the README, the argument O=output/dir is simply omitted, because we do not want to use a separate build directory).
Figure 3.4. Starting the compiling.
The compilation has already finished, now we sudo into root and give the command make modeules_install install. With this command, we install the modules and the kernel.
Figure 3.5. Starting the installation.
By using the command uname -a, print the version of the running kernel.
Figure 3.6. The version of the running kernel.
We should now reboot the computer. After this, it can be seen that the grub menu has also been updated properly.
Figure 3.7. The updated GRUB menu.
Now, the compiled custom kernel controls our computer.
Figure 3.8. The new kernel.
1.2. Kernel modules
In the previous section, you can see that options marked by a star (*) will be compiled into the kernel. If we mark an option with M, it will be compiled into the kernel as a module.
At creating the configuration, we also have opportunity to compile a pure monolithic kernel without module support.
Figure 3.9. Switching off the option Enable loadable module support.
Without module support, the size of the kernel image is much larger than the original one. Let's also see it after creating and booting a new pure monolithic kernel:
[norbert@BatfaiProg1 ~]$ ls -l /boot/vmlinuz-`uname
-r` -rw-r--r--. 1 root root 25532624 Jul 31 15:50
/boot/vmlinuz-3.5.0
Figure 3.10. The size of the kernel image.
2. Making entries in the /proc virtual file system
As a first step we present we are going to the whole solution of the exercise, then we will „celebrate the source code”.
2.1. Creating the module in VirtualBox.
The solution is presented using the virtual machine Batfai_Prog1.ova.
Exercise: making entries in the /proc
Write a kernel module that prints debug information about processes into a file under /proc.
This example already appeared in the original Paternoster [PP] (pp. 177). Further background information can be found in the books [LDD] and [KMGUIDE] and especially in the following articles
-
Randy Dunlap: Linux kernel seq_file HOWTO
-
Driver porting: The seq_file interface
-
Now we have made the module. Its source code called sbejegyzes.c can be found in the PROP_peldak/sajat_bejegyzes_modul directory of the virtualized system.
#include
#include #include
#include #include
#include #include
#include
MODULE_DESCRIPTION ("Ez a sajat bejegyzes (PROP konyv) kernel
modul"); MODULE_AUTHOR ("Bátfai Norbert (nbatfai@gmail.com)");
MODULE_LICENSE ("GPL"); static int taszk_lista (struct
seq_file *m, void *v) { int i = 0; struct task_struct *task;
struct list_head *p; // a kernel/sched/core.c-ben latott modon
iratjuk ki // a betut (szemben a fs/proc/array.c-ban
lathatoval) // stat_nam[task->state ? __ffs(task->state)
+ 1 : 0] static const char stat_nam[] =
TASK_STATE_TO_CHAR_STR; seq_puts (m, "sajat taszk lista
(negyedik kernel modul, PROP konyv)\n"); seq_printf (m, "%-9s
%-16s %-6s %-12s %-6s %-3s\n", "#", "CMD", "PID", "FLAGS",
"ST1", "ST2"); list_for_each (p, current->tasks.next) {
task = list_entry (p, struct task_struct, tasks); seq_printf
(m, "%-9i %-16s %-6i %-12u %-6li %-3c\n", ++i, task->comm,
task->pid, task->flags, task->state,
stat_nam[task->state ? __ffs (task->state) + 1 : 0]); }
return 0; } static int sajat_open (struct inode *inode, struct
file *file) { return single_open (file, taszk_lista, NULL); }
static struct file_operations sajat_fajl_muveletek = { .owner
= THIS_MODULE, .open = sajat_open, .read = seq_read, .llseek =
seq_lseek, .release = single_release }; static struct
proc_dir_entry *sajat_proc; static int sbejegyzes_init_module
(void) { struct proc_dir_entry *sajat_proc_fajl; if
(!(sajat_proc = proc_mkdir ("sajat", NULL))) {
remove_proc_entry ("sajat", NULL); printk (KERN_NOTICE
"/proc/sajat/ letrehozas sikertelen\n"); return -1; } printk
(KERN_NOTICE "/proc/sajat/ letrehozva\n"); if
((sajat_proc_fajl = create_proc_entry ("taszk_stat", S_IFREG |
S_IRUGO, sajat_proc))) { sajat_proc_fajl->proc_fops =
&sajat_fajl_muveletek; printk (KERN_NOTICE
"/proc/sajat/taszk_stat letrehozva\n"); return 0; } else {
remove_proc_entry ("taszk_stat", sajat_proc);
remove_proc_entry ("sajat", NULL); printk (KERN_NOTICE
"/proc/sajat/taszk_stat letrehozas sikertelen\n"); return -1;
} } static void sbejegyzes_exit_module (void) {
remove_proc_entry ("taszk_stat", sajat_proc); printk
(KERN_NOTICE "/proc/sajat/taszk_stat torolve\n");
remove_proc_entry ("sajat", NULL); printk (KERN_NOTICE
"/proc/sajat torolve\n"); } module_init
(sbejegyzes_init_module); module_exit
(sbejegyzes_exit_module);
-
The module has been created into the kernel object file sbejegyzes.ko using the command make.
[norbert@matrica sajat_bejegyzes__modul]$ make make -C
/lib/modules/`uname -r`/build M=`pwd` modules make[1]:
Entering directory `/usr/src/kernels/2.6.42.12-1.fc15.x86_64'
CC [M]
/home/norbert/kernelfa/sajat_bejegyzes__modul/sbejegyzes.o
Building modules, stage 2. MODPOST 1 modules CC
/home/norbert/kernelfa/sajat_bejegyzes__modul/sbejegyzes.mod.o
LD [M]
/home/norbert/kernelfa/sajat_bejegyzes__modul/sbejegyzes.ko
make[1]: Leaving directory
`/usr/src/kernels/2.6.42.12-1.fc15.x86_64'
We assume that the Makefile is in the same directory as sbejegyzes.c. The reader can be familiar with Makefiles for kernel modules from the Documentation/kbuild/modules.txt of the kernel source tree.
obj-m += sbejegyzes.o all: make -C /lib/modules/`uname
-r`/build M=`pwd` modules clean: make -C /lib/modules/`uname
-r`/build M=`pwd` clean rm *~
-
The created module can be loaded as root using the command insmod sbejegyzes.ko
[root@matrica sajat_bejegyzes__modul]# insmod
sbejegyzes.ko
-
We can see your entries under the /proc/sajat. Let's try it. Enter the command more /proc/sajat/taszk_stat
[root@matrica sajat_bejegyzes__modul]# more
/proc/sajat/taszk_stat sajat taszk lista (negyedik kernel
modul, PROP konyv) # CMD PID FLAGS ST1 ST2 1 systemd 1 4202752
1 S 2 kthreadd 2 2149613632 1 S 3 ksoftirqd/0 3 2216722496 1 S
4 migration/0 6 2216722496 1 S 5 watchdog/0 7 2216722752 1 S 6
migration/1 8 2216722496 1 S 7 ksoftirqd/1 10 2216722496 1 S 8
watchdog/1 12 2216722752 1 S 9 migration/2 13 2216722496 1 S
10 ksoftirqd/2 15 2216722496 1 S 11 watchdog/2 16 2216722752 1
S 12 migration/3 17 2216722496 1 S 13 ksoftirqd/3 19
2216722496 1 S 14 watchdog/3 20 2216722752 1 S 15 cpuset 21
2216722496 1 S 16 khelper 22 2216722496 1 S 17 kdevtmpfs 23
2149613888 1 S 18 netns 24 2216722496 1 S 19 sync_supers 25
2149613632 1 S 20 bdi-default 26 2157969472 1 S 21 kintegrityd
27 2216722496 1 S 22 kblockd 28 2216722496 1 S 23 ata_sff 29
2216722496 1 S 24 khubd 30 2149580864 1 S ...
Example 3.1. The contents of the PCB
The Linux PCB is defined in the struct task_struct structure in the include/linux/sched.h source file. Investigate the elements of the PCB and add further columns to /proc/sajat/taszk_stat.
2.1.1. „Celebrating” the source code
The essential part of the work of the module is done by taszk_lista function, in which the variable i counts the processes (PCBs) in the system and a struct task_struct * pointer task is used to iterate through the processes. To be more precise a struct list_head * pointer p is used because the next and back links of the circular doubly linked list of PCBs are organised around of the structure struct list_head as shown in the next figure.
Figure 3.11. The circular doubly linked list of PCBs in Linux.
Handling linked list in the kernel
The macros in linux/list.h can be used to manage the linked list of PCBs. The link member of the PCB list is a list_head structure member that is defined in linux/types.h.
struct list_head { struct list_head
*next, *prev; };
the usage of these link members were shown in the previous figure. These elements are part of the PCB as it can be seen in the next code snippet cited from the code of PCB from linux/sched.h.
struct task_struct { volatile long
state; /* -1 unrunnable, 0 runnable, >0 stopped */ ... struct
list_head tasks;
We have used the
list_for_each(pos,
head)
macro defined in linux/list.h:
/** * list_for_each -
iterate over a list * @pos: the &struct list_head to use as a
loop cursor. * @head: the head for your list. */ #define
list_for_each(pos, head) \ for (pos = (head)->next; pos !=
(head); pos = pos->next)
The
list_for_each(pos, head)
iterates a struct list_head * pointer pos that is passed as the first parameter through the list from the element head that is passed as the second parameter.
The current macro
It should be noted that in the code snippet
list_for_each (p,
current->tasks.next) {
we have used the current macro that points to the process that is being executed. What is the current macro exactly? Its definition can be found in asm-generic/current.h
#define get_current()
(current_thread_info()->task) #define current get_current()
The current_thread_info function is defined in (arch/x86/include/)asm/thread_info.h.
#ifndef __ASSEMBLY__ /* how to get the
current stack pointer from C */ register unsigned long
current_stack_pointer asm("esp") __used; /* how to get the
thread information struct from C */ static inline struct
thread_info *current_thread_info(void) { return (struct
thread_info *) (current_stack_pointer & ~(THREAD_SIZE - 1));
} #else /* !__ASSEMBLY__ */ /* how to get the thread information
struct from ASM */ #define GET_THREAD_INFO(reg) \ movl
$-THREAD_SIZE, reg; \ andl %esp, reg
in the kernel, the thread_union union is on the bottom of the memory area of a process. If we filled the entire stack, then the stack pointer would point to the bottom of the stack and the memory area of thread_union would be overwritten. This union is 8 KB, so we can easily calculate the address of struct thread_info of the running process. The stack pointer's low 25 bits must be set to zero. It can be seen in the following code snippet from the (arch/x86/include/)asm/thread_info.h: current_stack_pointer & ~(THREAD_SIZE - 1), where the value of THREAD_SIZE is equal 8 KB
#define THREAD_SIZE 8192
//(2*PAGE_SIZE)
as can be seen in the file arch/xtensa/include/)asm/thread_info.h. The address of the thread_union and the address of the thread_info are the same:
union thread_union { struct
thread_info thread_info; unsigned long
stack[THREAD_SIZE/sizeof(long)]; };
and the task member of the struct thread_info points to PCB of the process.
struct thread_info { struct task_struct
*task; /* main task structure */
See also the exercise at the end of this section.
Returning to the code snippet of the module to be developed, the list_for_each macro iterates through the list of PCBs, and accesses the PCBs using the
list_entry(ptr, type, member)
macro that defined in the linux/list.h header.
/** * list_entry - get the
struct for this entry * @ptr: the &struct list_head pointer. *
@type: the type of the struct this is embedded in. * @member: the
name of the list_struct within the struct. */
The printed debug information also includes a status character indicating the state of the given process. These possible characters are collected in the following macro that is defined in linux/sched.h.
#define TASK_STATE_TO_CHAR_STR "RSDTtZXxKW"
static int taszk_lista (struct
seq_file *m, void *v) { int i = 0; struct task_struct *task;
struct list_head *p; // a kernel/sched/core.c-ben latott modon
iratjuk ki // a betut (szemben a fs/proc/array.c-ban lathatoval)
// stat_nam[task->state ? __ffs(task->state) + 1 : 0] static
const char stat_nam[] = TASK_STATE_TO_CHAR_STR; seq_puts (m,
"sajat taszk lista (negyedik kernel modul, PROP konyv)\n");
seq_printf (m, "%-9s %-16s %-6s %-12s %-6s %-3s\n", "#", "CMD",
"PID", "FLAGS", "ST1", "ST2"); list_for_each (p,
current->tasks.next) { task = list_entry (p, struct
task_struct, tasks); seq_printf (m, "%-9i %-16s %-6i %-12u %-6li
%-3c\n", ++i, task->comm, task->pid, task->flags,
task->state, stat_nam[task->state ? __ffs (task->state) +
1 : 0]); } return 0; }
Making entries under the /proc is done by using the next two functions defined in linux/proc_fs.h
extern struct proc_dir_entry
*create_proc_entry(const char *name, umode_t mode, struct
proc_dir_entry *parent); extern struct proc_dir_entry
*proc_mkdir(const char *,struct proc_dir_entry
*);
of which the first is the more interesting one
if ((sajat_proc_fajl =
create_proc_entry ("taszk_stat", S_IFREG | S_IRUGO,
sajat_proc)))
For example, according the linux/stat.h, the S_IRUGO
#define
S_IRUGO (S_IRUSR|S_IRGRP|S_IROTH)
gives read permission to user (USR), group (GRP) and other (OTH).
# ls -l /proc/sajat/taszk_stat -r--r--r--. 1 root root 0 Aug 10
17:42 /proc/sajat/taszk_stat
Example 3.2. Playing the game with the current macro
Modify the third example module (that can be found in the virtualized system) to print the address pointed by the current macro.
#include
#include #include
#include #include
#include MODULE_DESCRIPTION ("Ez a
harmadik kernel modulom - modositasa"); MODULE_AUTHOR ("Bátfai
Norbert (nbatfai@gmail.com)"); MODULE_LICENSE ("GPL"); static
int current_and_sp (void) { struct task_struct *task; struct
list_head *p; struct thread_info *ti; register unsigned long esp
asm ("esp"); ti = (struct thread_info *) (esp & ~(sizeof
(union thread_union) - 1)); list_for_each (p,
current->tasks.next) { task = list_entry (p, struct
task_struct, tasks); printk (KERN_NOTICE "nORBi a kernelben:
%p\n", ti->task); printk (KERN_NOTICE "nORBi a kernelben,
current: %p\n", current); printk (KERN_NOTICE "%s %i %u %li\n",
task->comm, task->pid, task->flags, task->state); }
return 0; } static int harmadik_init_module (void) { return
current_and_sp (); } static void harmadik_exit_module (void) {
current_and_sp (); } module_init (harmadik_init_module);
module_exit (harmadik_exit_module);
After compiling and loading/removing the module we can see the following in the kernel logs.
[ 8379.261436] nORBi a kernelben: f0f49920 [
8379.261438] nORBi a kernelben, current: f0f49920 [ 8379.261440]
systemd 1 4202752 1 [ 8379.261441] nORBi a kernelben: f0f49920 [
8379.261442] nORBi a kernelben, current: f0f49920 [ 8379.261443]
kthreadd 2 2129984 1 [ 8379.261444] nORBi a kernelben: f0f49920
[ 8379.261444] nORBi a kernelben, current: f0f49920 [
8379.261445] ksoftirqd/0 3 69238848 1 [ 8379.261446] nORBi a
kernelben: f0f49920 . . . [ 8379.261848] nORBi a kernelben:
f0f49920 [ 8379.261849] nORBi a kernelben, current: f0f49920 [
8379.261850] kworker/0:1 2486 69238880 1 [ 8379.261851] nORBi a
kernelben: f0f49920 [ 8379.261852] nORBi a kernelben, current:
f0f49920 [ 8379.261853] flush-253:1 3846 10485824 1 [
8379.261853] nORBi a kernelben: f0f49920 [ 8379.261854] nORBi a
kernelben, current: f0f49920 [ 8379.261855] kworker/1:1 6477
69238880 1 [ 8379.261856] nORBi a kernelben: f0f49920 [
8379.261857] nORBi a kernelben, current: f0f49920 [ 8379.261858]
insmod 6913 4202752 0 [ 8393.314942] nORBi a kernelben: f0d28000
[ 8393.314945] nORBi a kernelben, current: f0d28000 [
8393.314947] systemd 1 4202752 1 [ 8393.314948] nORBi a
kernelben: f0d28000 [ 8393.314949] nORBi a kernelben, current:
f0d28000 [ 8393.314951] kthreadd 2 2129984 1 [ 8393.314952]
nORBi a kernelben: f0d28000 [ 8393.314953] nORBi a kernelben,
current: f0d28000 [ 8393.314955] ksoftirqd/0 3 69238848 1 [
8393.314956] nORBi a kernelben: f0d28000 . . . [ 8393.315565]
nORBi a kernelben: f0d28000 [ 8393.315566] nORBi a kernelben,
current: f0d28000 [ 8393.315567] bash 1733 4202496 1 [
8393.315569] nORBi a kernelben: f0d28000 [ 8393.315570] nORBi a
kernelben, current: f0d28000 [ 8393.315571] kworker/0:1 2486
69238880 1 [ 8393.315572] nORBi a kernelben: f0d28000 [
8393.315573] nORBi a kernelben, current: f0d28000 [ 8393.315574]
flush-253:1 3846 10485824 1 [ 8393.315575] nORBi a kernelben:
f0d28000 [ 8393.315576] nORBi a kernelben, current: f0d28000 [
8393.315578] kworker/1:1 6477 69238880 1 [ 8393.315579] nORBi a
kernelben: f0d28000 [ 8393.315580] nORBi a kernelben, current:
f0d28000 [ 8393.315582] rmmod 6915 4202752 0
Example 3.3. Drawing a memory map
Draw a sketch of the memory to show how to compute the value of the current macro. See also the Figure 3.2 of the book [ULK], but use the number of the previous exercise.
Chapter 4. Berkeley socket API, Sys V IPC and I/O multiplexing
In this chapter we are going to create a sample program from the examples of [PP] to help the reader get acquainted with
-
Berkeley socket API
-
I/O multiplexing
-
forking processes
-
mutual exclusion
-
and semaphore arrays.
„We should select any person from the 1.5 billion inhabitants of the Earth - anyone, anywhere at all. He bet us that, using no more than five individuals, one of whom is a personal acquaintance, he could contact the selected individual using nothing except the network of personal acquaintances.”
—Frigyes Karinthy CHAIN-LINKS (1929, Everything is Different)
„Tessék egy akármilyen meghatározható egyént kijelölni a Föld másfél milliárd lakója közül, bármelyik pontján a Földnek - ő fogadást ajánl, hogy legföljebb öt más egyénen keresztül, kik közül az egyik neki személyes ismerőse, kapcsolatot tud létesíteni az illetővel, csupa közvetlen - ismeretség - alapon,”
—Karinthy Frigyes Láncszemek
1. Berkeley socket API, Sys V IPC, I/O multiplexing
2. A simple client/server sample
The first versions of this example were created for the course High level programming 1. These were based on merging the source files of [PP]. This sample program itself is a simple client/server program, its client side is based on the kliens.c of [PP] (see pp. 133). Now, we have complemented the client to wait for characters plus (+) or minus (-) to be typed in the command line.
The code of the server is placed into a source file called szerver.c that is more complex than the client. It is based on a parallel and multiplexed example of [PP] (see pp. 123) that uses POSIX signal handling. The subject matter of the server and all other servers in [PP] are substantially identical. They implement a simple echo-like server. In the present example, this functionality is complemented by a counter that counts the actions of clients.
The usage of the sources szerver.c and kliens.c is shown in the following YouTube video: http://youtu.be/J--bRSmNjZg,
.
2.1. The client side
The whole source code of the server is the following.
#include #include
#include #include
//#include #include
#include #define SZERVER_PORT 2012 #define
BUFFER_MERET 256 int kliens (char b) { int kapu, olvasva; struct
sockaddr_in szerver; char buffer[BUFFER_MERET]; memset ((void *)
&szerver, 0, sizeof (szerver)); szerver.sin_family = AF_INET;
inet_aton ("127.0.0.1", &(szerver.sin_addr)); szerver.sin_port =
htons (SZERVER_PORT); if ((kapu = socket (PF_INET, SOCK_STREAM,
IPPROTO_TCP)) == -1) {4 perror ("socket"); return -1; } if (connect
(kapu, (struct sockaddr *) &szerver, sizeof (szerver)) == -1) {
perror ("connect"); return -1; } write(kapu, &b, 1); while
((olvasva = read (kapu, buffer, BUFFER_MERET)) > 0) write (1,
buffer, olvasva); close(kapu); return 0; } int main (int argc, char
*argv[]) { int gyermekem_pid; int statusz; int i; char b = '+'; if
(argc == 2) b = argv[1][0]; for (i=0; i<4; ++i) if
((gyermekem_pid = fork()) == 0) { kliens(b); } else if
(gyermekem_pid > 0) { kliens(b); // wait(&statusz); } else {
exit(-1); } return 0; }
In the main function first we check whether the second command-line argument is given or not. If yes, then its first character will be stored in the variable b that will be sent to the server. The default value of b is a plus (+) character.
We are going to try to overload the server. For this reason, TCP connections will be opened simultaneously not sequentially. To be more precise, the following client program opens 30 TCP connections to the server. If you do not understand why exactly the following code snippet works, then suspend the elaboration of this example and first start by solving the next exercise.
for (i=0; i<4; ++i) if
((gyermekem_pid = fork()) == 0) { kliens(b); } else if
(gyermekem_pid > 0) { kliens(b); wait(&statusz); } else {
exit(-1); }
Example 4.1. 30 processes, with using a paper and pen
Here you have to show, how the next code snippet operates. Just use a paper and pen in this exercise. Your solution is probably good if the result 30=2+4+8+16 can be read easily from your own sketch.
for (i=0; i<4; ++i) if
((gyermekem_pid = fork()) == 0) { kliens(b); } else if
(gyermekem_pid > 0) { kliens(b); // wait(&statusz); } else
{ exit(-1); }
Connecting and communicating with the server are hidden in the kliens function. Here, if the dear reader is not already familiar with the structure struct sockaddr_in, it is described in the manual page man 7 ip. This structure represents a communication endpoint, as it can be seen on the next manual snippet from man 7 ip.
struct sockaddr_in { sa_family_t sin_family; /* address
family: AF_INET */ in_port_t sin_port; /* port in network byte order
*/ struct in_addr sin_addr; /* internet address */ }; /* Internet
address. */ struct in_addr { uint32_t s_addr; /* address in network
byte order */ }; sin_family is always set to AF_INET. This is
required; in Linux 2.2 most networking functions return EINVAL when
this setting is missing. sin_port contains the port in network byte
order. The port numbers below 1024 are called privileged ports (or
some‐
We need to create this structure and set its members (such as protocol family, IP address and port number) accordingly:
struct sockaddr_in szerver; memset
((void *) &szerver, 0, sizeof (szerver)); szerver.sin_family =
AF_INET; inet_aton ("127.0.0.1", &(szerver.sin_addr));
szerver.sin_port = htons (SZERVER_PORT);
In the following, if we see unknown function calls or data structures we we will find a more detailed description in the appropriate manual page. For example, in the actual case above, see the page man 3 htons.
Organization of the manual pages
The readers have already noticed that manual pages are organized into numbered sections, as shown in the following breakdown:
-
Section 1, User commands (Linux User's Manual/User Commands/User utilities), for example man w, man who or man passwd
-
Section 2, System calls (Linux Programmer's Manual), man 2 read
-
Section 3, Library functions (Linux Programmer's Manual), man 3 printf
-
Section 4, Special files, man 4 stdin
-
Section 5, Formats and protocols man 5 passwd
-
Section 6, Games, man 6 fortune
-
Section 7, Miscellaneous, man 7 hier
-
Section 8, System administration, man 8 shutdown
(For further details, see man man.)
The man {section number (1-8)} intro command gives a general description of the above sections.
Then, using the socket system call, we create the abstraction of a communication endpoint that will be connected to the previously specified address by the connect system call. It should be noted that the number returned by the socket is a file descriptor. (See also the Párhuzamos programozás GNU/Linux környezetben: SysV IPC, P-szálak, OpenMPhttp://www.inf.unideb.hu/~nbatfai/konyvek/PARP/parp.book.xml.pdf, [PARP] book about the appearance of file descriptors in the kernel.)
The program sends a byte to this file descriptor using the write system call, then reads a byte from the same file descriptor using the read and so on in the while loop, as long as it can read while it is echoing the read data to the standard output.
write(kapu, &b, 1); while ((olvasva
= read (kapu, buffer, BUFFER_MERET)) > 0) write (1, buffer,
olvasva);
The portability of the example
If we compile the sources on different GNU/Linux environments we will get warnings from the compiler. In most of such cases it is typically enough to include the appropriate header files to solve these problems.
NOTES POSIX.1-2001 does not require the inclusion of
, and this header file is not required on
Linux. However, some historical (BSD) implementations required
this header file, and portable applications are probably wise to
include it.
2.2. The server side
The whole source code of the server can be found in the source file called szerver.c.
// szerver.c // // "forkos, socketes,
multiplexelt, szemaforos, osztott memóriás" állatorvosi // szerver
ló a laborra Programozó Páternoszter // // Copyright (C) 2011, 2012
Bátfai Norbert, nbatfai@inf.unideb.hu // // This program is free
software: you can redistribute it and/or modify // it under the
terms of the GNU General Public License as published by // the Free
Software Foundation, either version 3 of the License, or // (at your
option) any later version. // // This program is distributed in the
hope that it will be useful, // but WITHOUT ANY WARRANTY; without
even the implied warranty of // MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the // GNU General Public License for more
details. // // You should have received a copy of the GNU General
Public License // along with this program. If not, see
Share with your friends: |