
Microkernel

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 80.203.86.71 (talk) at 13:32, 25 April 2007. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Graphical overview of a microkernel

A microkernel is a minimal computer operating system kernel that provides only basic operating system services (system calls), while other services commonly provided by kernels are supplied by user-space programs called servers. Microkernels typically provide address space management, thread management, and inter-process communication, but not, for example, networking or display.

Later extensions of this concept led to new architectures such as nanokernels, exokernels and hardware abstraction layers, although the minimality principle used by Jochen Liedtke in the design of the L4 microkernel implies that these terms really mean the same thing. Microkernels and derivatives are in contrast to a monolithic kernel.

Advantages of the microkernel approach to system design: (a) adding a new service does not require modifying the kernel, (b) it is more secure as more operations are done in user mode than in kernel mode, (c) a simpler kernel design and functionality typically results in a more reliable operating system.

Early operating system kernels were rather small, partly because computer memory was particularly limited. As the capability of computers grew, the number of devices the kernel had to control also grew. Early versions of UNIX had kernels of quite modest size, even though those kernels contained device drivers and file system managers. When address spaces increased from 16 to 32 bits, kernel design was no longer cramped by the hardware architecture, and kernels began to grow. (See History of Unix).

Berkeley UNIX (BSD) began the era of big kernels. In addition to operating a basic system consisting of the CPU, disks and printers, BSD started adding further file systems, a complete TCP/IP networking system, and a number of "virtual" devices that allowed the existing programs to work invisibly over the network.

This growth continued for several decades, resulting in UNIX, Linux, and Microsoft Windows kernels with millions of lines of source code. For example, Linux 2.6 contains about 2.5 million source lines of code in the kernel (of about 30 million in total), while Windows XP is estimated at twice that.

Inter-process communication

Inter-process communication (IPC) is any mechanism which allows separate processes running on the same operating system to intercommunicate, usually by sending messages. This allows the operating system to be built of a number of small programs called servers, which are used by other programs on the system. Most or all hardware support is handled in this fashion, with programs for networking, file systems, graphics, etc.
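The structure described above, a kernel that only delivers messages while servers do the actual work, can be sketched in a few lines. This is a toy model, not the interface of any real microkernel: the names ToyKernel, register, send, the endpoint name "fs" and the message layout are all assumptions made for illustration.

```python
class ToyKernel:
    """Toy kernel: its only job here is to deliver messages to endpoints."""

    def __init__(self):
        self._endpoints = {}  # endpoint name -> handler (hypothetical registry)

    def register(self, name, handler):
        self._endpoints[name] = handler

    def send(self, name, message):
        if name not in self._endpoints:
            raise LookupError(f"no server registered as {name!r}")
        # Deliver the message and hand the server's reply back to the caller.
        return self._endpoints[name](message)


# A user-space "file system" server holding a single in-memory file.
FILES = {"/etc/motd": "hello"}


def file_server(message):
    if message["op"] == "read":
        return FILES.get(message["path"])
    raise ValueError(f"unsupported operation {message['op']!r}")


kernel = ToyKernel()
kernel.register("fs", file_server)
reply = kernel.send("fs", {"op": "read", "path": "/etc/motd"})
print(reply)
```

The kernel knows nothing about files; replacing or adding a server only changes what is registered, not the kernel itself.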

There are two basic approaches to IPC, synchronous and asynchronous. Synchronous communication behaves like a subroutine call: a request is made, the caller waits, and when the request has been serviced, the caller regains control. From the caller's perspective, this model behaves much like a kernel call to a monolithic kernel. If the kernel cannot process the request immediately, the program is "blocked" (see thread) and the kernel looks for another program that can be run while the first waits.

In a microkernel system, the synchronous kernel call model must be extended to allow one program to call another. Various microkernels have taken different approaches to the problem.

Jochen Liedtke, in his L4 microkernel, pioneered techniques that led to an order-of-magnitude reduction of IPC costs.[1] These include an IPC system call that supports a send as well as a receive operation; making all IPC synchronous, in order to avoid the overhead of buffering in the kernel and of multiple copying; and passing as much data as possible in registers. Furthermore, Liedtke introduced the concept of the direct process switch, where during an IPC, execution is switched directly from the sender to the receiver. This avoids the overhead of invoking the scheduler and optimises the common case where IPC is used in an RPC-type fashion by a client invoking a server. Another optimization, called lazy scheduling, avoids traversing scheduling queues during IPC by leaving threads that block during IPC in the ready queue. Once the scheduler is invoked, it removes such threads from the ready queue. In many cases, however, a thread is unblocked before the next scheduler invocation, and in such cases this approach saves significant work. Similar approaches have since been adopted by QNX and Minix 3.
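The effect of lazy scheduling can be illustrated with a small simulation that counts ready-queue operations. This is a sketch of the idea only, not L4's data structures: the Scheduler class, the queued flag and the queue_ops counter are assumptions introduced here.

```python
from collections import deque


class Thread:
    def __init__(self, name):
        self.name = name
        self.blocked = False
        self.queued = False


class Scheduler:
    def __init__(self, threads, lazy):
        self.lazy = lazy
        self.ready = deque()
        self.queue_ops = 0  # insertions and removals on the ready queue
        for t in threads:
            self._enqueue(t)

    def _enqueue(self, t):
        self.ready.append(t)
        t.queued = True
        self.queue_ops += 1

    def _dequeue(self, t):
        self.ready.remove(t)
        t.queued = False
        self.queue_ops += 1

    def block(self, t):
        t.blocked = True
        if not self.lazy:
            self._dequeue(t)  # eager: unlink the thread immediately

    def unblock(self, t):
        t.blocked = False
        if not t.queued:      # lazy case: usually still queued, nothing to do
            self._enqueue(t)

    def pick_next(self):
        # Lazy cleanup happens here: stale blocked entries are dropped.
        while self.ready:
            t = self.ready[0]
            if t.blocked:
                self._dequeue(t)
                continue
            self.ready.rotate(-1)  # round-robin: move t to the tail
            return t
        return None


def ipc_round_trips(lazy, n=1000):
    client, server = Thread("client"), Thread("server")
    s = Scheduler([client, server], lazy)
    setup_ops = s.queue_ops
    for _ in range(n):
        s.block(client)    # client waits for the reply
        s.unblock(client)  # reply arrives before the scheduler runs
    return s.queue_ops - setup_ops


print(ipc_round_trips(lazy=True), ipc_round_trips(lazy=False))
```

In this model the lazy variant performs no queue operations during the round trips, while the eager variant performs two per round trip.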

With asynchronous messaging, the message sender places data on a queue. The message sender is not blocked but continues to run when sending a message, unless the queue is full. This requires buffering in the kernel, which means that messages are copied twice (sender to kernel and kernel to receiver). The Berkeley sockets model from the UNIX world, which follows the earlier UNIX byte-stream pipe mechanism, fits this model. POSIX adds asynchronous message queues, which queue and send discrete messages. [1]
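A minimal sketch of such a kernel-buffered queue follows, using Python's standard queue module. The AsyncPort class and its copies counter are hypothetical, and for simplicity a full queue makes send return False, where a real kernel would typically block the sender instead.

```python
import queue


class AsyncPort:
    """Kernel-side mailbox with a bounded buffer (capacity is illustrative)."""

    def __init__(self, capacity):
        self._buf = queue.Queue(maxsize=capacity)
        self.copies = 0  # number of data copies performed so far

    def send(self, data: bytearray) -> bool:
        """Non-blocking send; returns False when the queue is full."""
        try:
            self._buf.put_nowait(bytes(data))  # copy 1: sender -> kernel buffer
        except queue.Full:
            return False
        self.copies += 1
        return True

    def receive(self) -> bytearray:
        kernel_copy = self._buf.get_nowait()
        self.copies += 1
        return bytearray(kernel_copy)          # copy 2: kernel buffer -> receiver


port = AsyncPort(capacity=2)
msg = bytearray(b"ping")
port.send(msg)
msg[:] = b"XXXX"                          # sender keeps running and reuses its buffer
port.send(bytearray(b"pong"))
full = port.send(bytearray(b"late"))      # queue is full, message not accepted
first = port.receive()
print(first, full, port.copies)
```

Because the kernel holds its own copy, the sender can overwrite its buffer right after sending without affecting the queued message.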

It is common to construct synchronous messaging from asynchronous messaging, by sending a message, then waiting for a reply. (See Inter-process communication for examples of such systems.) But this is inefficient and results in scheduling delays, as described above. Microkernels, with their extensive use of interprocess communication, need higher performance. Thus, most microkernel systems (including L4, Mach, QNX, and Minix 3) offer some form of synchronous messaging.
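The construction mentioned above, a synchronous call built from an asynchronous send followed by a wait for the reply, might look like the following. It is a sketch using Python threads and queues; the function names, the per-call reply queue and the upper-casing "service" are assumptions chosen for illustration.

```python
import queue
import threading

requests = queue.Queue()  # asynchronous request queue shared with the server


def server():
    while True:
        msg = requests.get()
        if msg is None:       # shutdown signal for this demo
            break
        payload, reply_q = msg
        reply_q.put(payload.upper())  # the "service": upper-case the payload


def call(payload):
    """Synchronous call = asynchronous send + blocking wait for the reply."""
    reply_q = queue.Queue(maxsize=1)
    requests.put((payload, reply_q))  # step 1: send, returns immediately
    return reply_q.get(timeout=5)     # step 2: block until the server replies


worker = threading.Thread(target=server, daemon=True)
worker.start()
result = call("hello")
requests.put(None)
worker.join()
print(result)
```

Each call passes through two queues and relies on the scheduler to run the server in between, which is where the extra cost of this construction comes from.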

Servers

Microkernel servers are programs like any others, except that the kernel grants some of them privileges to interact with parts of memory that are otherwise off limits to most programs. This allows some servers to interact directly with hardware.

A basic set of servers for a general-purpose microkernel includes file system servers, device driver servers, networking servers, display servers, and user interface device servers. This set of servers (drawn from QNX) provides roughly the set of services offered by a monolithic UNIX kernel. The necessary servers are started at system startup and provide services, such as file, network, and device access, to ordinary application programs. The functions in the kernel of such a system are thus quite limited. With such servers running in the environment of a user application, server development is similar to ordinary application development, rather than the build-and-boot process needed for kernel development.

Additionally, many "crashes" can be corrected by simply stopping and restarting the server. (In a traditional system, a crash in any kernel-resident code would crash the entire machine, forcing a reboot.) However, part of the system state is lost with the failing server, and it is generally difficult to continue execution of applications, or even of other servers, with a fresh copy. For example, if a server responsible for TCP/IP connections is restarted, applications could be told the connection was "lost" and reconnect to the new instance of the server. For QNX, this capability is offered as the QNX High Availability Toolkit.

In order to make all servers restartable, some microkernels have concentrated on adding database-like techniques such as transactions, replication and checkpointing between servers, so that essential state is preserved across single server restarts. A good example of this is ChorusOS, which was targeted at high-availability applications in the telecommunications world. Chorus included features to allow any "properly written" server to be restarted at any time, with clients using those servers being paused while the server brought itself back into its original state.[citation needed]
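Checkpoint-based restart can be sketched as follows. This is an illustration of the general technique only and does not reflect how ChorusOS or QNX implement it; CounterServer, Supervisor and the per-client counter state are hypothetical.

```python
import copy


class CounterServer:
    """Hypothetical server whose essential state is a per-client counter."""

    def __init__(self, state=None):
        self.state = {} if state is None else state
        self.alive = True

    def handle(self, client):
        if not self.alive:
            raise RuntimeError("server crashed")
        self.state[client] = self.state.get(client, 0) + 1
        return self.state[client]

    def kill(self):
        # Simulated fault: the server dies and its in-memory state is lost.
        self.alive = False
        self.state = {}


class Supervisor:
    def __init__(self):
        self.server = CounterServer()
        self.checkpoint = {}
        self.restarts = 0

    def request(self, client):
        try:
            reply = self.server.handle(client)
        except RuntimeError:
            # Restart from the last checkpoint, then retry the request once.
            self.server = CounterServer(copy.deepcopy(self.checkpoint))
            self.restarts += 1
            reply = self.server.handle(client)
        # Checkpoint after every successful request (a simplistic policy).
        self.checkpoint = copy.deepcopy(self.server.state)
        return reply


sup = Supervisor()
replies = [sup.request("a"), sup.request("a")]
sup.server.kill()
replies.append(sup.request("a"))
print(replies, sup.restarts)
```

The client sees an uninterrupted sequence of replies even though the server was replaced in between; without the checkpoint, the counter would have restarted from 1.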

Essential components

The minimum set of services required in a microkernel seems to be address space management, thread management, inter-process communication, and timer management. This minimal design was pioneered by Brinch Hansen's Nucleus and the hypervisor of IBM's VM. It has since been formalised in Liedtke's minimality principle:

A concept is tolerated inside the microkernel only if moving it outside the kernel, i.e., permitting competing implementations, would prevent the implementation of the system's required functionality.[2]

Everything else can be done in a user program, although device drivers implemented as user programs may require special privileges to access I/O hardware.

Startup (booting) of a microkernel can be difficult. The kernel alone may not contain enough services to start up the machine. Thus, either additional code for startup, such as key device drivers, must be placed in the kernel, or means must be provided to load an appropriate set of service programs during the boot process. For this reason, most microkernels do place some "external" code in the kernel itself, notably key device drivers. LynxOS and the original Minix are examples. Many also include a file system in the kernel, which makes booting easier and improves performance.

A key component of a microkernel is a good IPC system. Since many services can be performed by user programs, good means of communication between programs are essential, far more so than in monolithic kernels. The design of the IPC system makes or breaks a microkernel. To be effective, the IPC system must not only have low overhead, but also interact well with CPU scheduling.

Some microkernels are designed for high security applications. EROS and KeyKOS are examples. Part of secure system design is to minimize the amount of trusted code; hence, the need for a microkernel. Work in this direction has not resulted in widely deployed systems, with the notable exception of systems for IBM mainframes such as KeyKOS and IBM's VM.[citation needed]

Performance

Traditional performance problems with microkernels revolve around the costs of IPC. The costs are due to the extra work that older microkernels do to copy data between servers and application programs,[3] and the extra context switch operations.

Attempts have been made to reduce or eliminate the copying cost by using the memory management unit (MMU) to transfer the ownership of memory pages between processes. This approach, which is used by Mach, adds complexity but reduces the overhead for large data transfers. L4 adds a lightweight mechanism using registers if the amount of data being passed is small, which can dramatically improve performance, both in terms of copying, and avoiding misses in the CPU's cache. L4's IPC performance is unbeaten across a range of architectures.[4][5][6] By contrast, QNX does all IPC by direct copying, incurring some extra copying costs but reducing code size and complexity.
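The difference between copying a page and transferring its ownership can be shown with a toy page-table model. This is not Mach's actual interface: AddressSpace, the dictionary used as a page table and the two send functions are assumptions for illustration, and a bytearray stands in for a physical frame.

```python
PAGE_SIZE = 4096


class AddressSpace:
    def __init__(self, name):
        self.name = name
        self.pages = {}  # virtual page number -> physical frame (bytearray)


def send_by_copy(src, src_vpn, dst, dst_vpn):
    """Copy the page contents; returns the number of bytes copied."""
    dst.pages[dst_vpn] = bytearray(src.pages[src_vpn])
    return PAGE_SIZE


def send_by_remap(src, src_vpn, dst, dst_vpn):
    """Move the mapping instead of the data; the sender loses access."""
    dst.pages[dst_vpn] = src.pages.pop(src_vpn)
    return 0


client, server = AddressSpace("client"), AddressSpace("server")
client.pages[0] = bytearray(PAGE_SIZE)
client.pages[1] = bytearray(PAGE_SIZE)
frame = client.pages[0]

moved_bytes = send_by_remap(client, 0, server, 7)
copied_bytes = send_by_copy(client, 1, server, 8)
print(moved_bytes, copied_bytes, server.pages[7] is frame)
```

Remapping costs a page-table update rather than a byte-for-byte copy, which is why it pays off mainly for large transfers.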

Systems that support virtual memory and page memory out to disk create additional problems for IPC. Unless both the source and destination areas are currently in memory, copying must be delayed, or staged through kernel-managed memory. Copying through kernel memory adds an extra copy cost and requires extra memory. Delaying the copy until the pages have been brought back into memory complicates the IPC. QNX avoids this problem by not paging, which is the usual solution for a hard real-time system like QNX.[citation needed]

Reducing context-switch cost requires careful design of the interaction between IPC and CPU scheduling. Historically, UNIX IPC was based on the UNIX pipe mechanism and the Berkeley sockets mechanism used for networking.[citation needed] But neither of these has the performance needed for a usable microkernel.[citation needed] Both are unidirectional I/O-type operations, rather than the subroutine-like call-and-return operations needed for efficient user to server interaction. Mach has very general primitives which tend to be used in a unidirectional manner, resulting in scheduling delays. The Vanguard microkernel supported the "chaining" of messages between servers, which reduced the number of context switches in cases where a message required several servers to handle the request.

The question of where to put device drivers owes more to history than design intent. In mainframes, I/O channels have memory management hardware to control device access to memory, and drivers need not be entirely trusted.[2] The Michigan Terminal System (MTS), in 1967, was the first operating system designed with user-space drivers.

Most minicomputers and microcomputers have not interposed a memory management unit between devices and memory. (Exceptions include the Apollo/Domain workstations of the early 1980s.) Since device drivers thus had the ability to overwrite any area of memory, they were clearly trusted programs, and logically part of the kernel. This led to the traditional driver-in-the-kernel style of UNIX, Linux, and Windows.[7]

As peripheral manufacturers introduced new models, driver proliferation became a headache, with thousands of drivers, each able to crash the kernel, available from hundreds of sources. This unsatisfactory situation is today's mainstream technology.[3]

With the advent of multiple-device network-like buses such as USB and FireWire, more operating systems[citation needed] are separating the driver for the bus interface device and the drivers for the peripheral devices. The latter are good candidates for moving outside the kernel. So a basic feature of microkernels is becoming part of monolithic kernels.

Security

In 2006 the debate about the potential security benefits of the microkernel design increased.[8]

Many attacks on computer systems take advantage of bugs in various pieces of software. For instance, one of the common attacks is the buffer overflow, in which malicious code is "injected" by asking a program to process some data, and then feeding in more data than it stated it would send. If the receiving program does not specifically check the amount of data it received, it is possible that the extra data will be blindly copied into the receiver's memory. This code can then be run under the permissions of the receiver. This sort of bug has been exploited repeatedly, including a number of recent attacks through web browsers.
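The overflow described above can be simulated safely in Python with a flat bytearray standing in for the receiver's memory. Everything here is an illustrative model: the memory layout, the 8-byte buffer and the function names are assumptions, and real exploits target native code rather than a Python bytearray.

```python
BUF_SIZE = 8


def make_memory():
    # Bytes 0..7: receive buffer. Bytes 8..15: adjacent data, standing in
    # for something sensitive such as a return address.
    return bytearray(b"\x00" * BUF_SIZE + b"RETADDR!")


def receive_unchecked(memory, declared_len, data):
    # Bug: validates the length the sender *declared*, then copies
    # however many bytes actually arrived.
    if declared_len > BUF_SIZE:
        raise ValueError("declared length too large")
    for i, byte in enumerate(data):
        memory[i] = byte


def receive_checked(memory, declared_len, data):
    # Fix: check the amount of data actually received.
    if len(data) != declared_len or len(data) > BUF_SIZE:
        raise ValueError("length mismatch or oversized message")
    memory[0:len(data)] = data


attack = b"A" * BUF_SIZE + b"EVILCODE"  # claims 8 bytes, delivers 16

victim = make_memory()
receive_unchecked(victim, 8, attack)
overwritten = bytes(victim[BUF_SIZE:])

safe = make_memory()
try:
    receive_checked(safe, 8, attack)
    rejected = False
except ValueError:
    rejected = True

print(overwritten, rejected, bytes(safe[BUF_SIZE:]))
```

The unchecked receiver lets the extra bytes spill into the adjacent region, while the checked one rejects the message and leaves that region intact.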

To see how a microkernel can help address this, first consider the problem of having a buffer overflow bug in a device driver. Device drivers are notoriously buggy,[9] but nevertheless run inside the kernel of a traditional operating system, and therefore have "superuser" access to the entire system.[10] Malicious code exploiting this bug can thus take over the entire system, with no boundaries to its access to resources.[11] For instance, under open-source monolithic kernels such as Linux or the BSDs a successful attack on the networking stack over the internet could proceed to install a backdoor that runs a service with arbitrarily high privileges, so that the intruder may abuse the infected machine in any way[12] and no security check would be applied because the rootkit is acting from inside the kernel. Even if appropriate steps are taken to prevent this particular attack,[13] the malicious code could simply copy data directly into other parts of the kernel memory, as it is shared among all the modules in the kernel.

A microkernel system is somewhat more resistant to these sorts of attacks[14] for three reasons. For one, an identical bug in a server would allow the attacker to take over only that program, not the entire system; in other words, microkernel designs obey the principle of least authority. This isolation of "powerful" code into separate servers helps isolate potential intrusions, notably as it allows a CPU's memory management unit to check for any attempt to copy data between the servers.

Microkernels also tend to run device-driver processes and server processes with user-mode CPU privileges. While supervisor-mode code can perform any operation the hardware can, including writing to write-protected memory, changing the CPU's fundamental data tables and switching to arbitrary address spaces, user-mode code can only perform those operations deemed safe for application code. So device-driver and server processes running in user mode under a microkernel system must ask the kernel to perform privileged operations for them, allowing the microkernel to check for safety and security.
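The mediation step can be sketched as a kernel call that checks a grant table before touching simulated hardware. The class MediatingKernel, the method names, the server names and the port numbers are all hypothetical; real kernels express such grants through mechanisms such as I/O permission bitmaps or capabilities.

```python
class PermissionDenied(Exception):
    pass


class MediatingKernel:
    def __init__(self):
        self._io_ports = {}  # port number -> last value written (simulated hardware)
        self._grants = {}    # server name -> set of ports it may access

    def grant_port(self, server, port):
        self._grants.setdefault(server, set()).add(port)

    def sys_port_write(self, server, port, value):
        """'System call': the only path from user-mode code to the hardware."""
        if port not in self._grants.get(server, set()):
            raise PermissionDenied(f"{server} may not access port {port:#x}")
        self._io_ports[port] = value

    def port_value(self, port):
        return self._io_ports.get(port)


k = MediatingKernel()
k.grant_port("disk_driver", 0x1F0)
k.sys_port_write("disk_driver", 0x1F0, 0xA0)  # allowed: port was granted

try:
    k.sys_port_write("net_driver", 0x1F0, 0xFF)  # not granted to this server
    denied = False
except PermissionDenied:
    denied = True

print(denied, k.port_value(0x1F0))
```

A compromised network driver in this model cannot reprogram the disk controller, because every privileged access passes through the kernel's check.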

But the most important reason for the additional security is that the servers are isolated in smaller code libraries, with well defined interfaces. That means that one can audit the code, as its smaller size makes this easier to do (in theory) than if the same code were simply one module in a much larger system. This may also result in fewer non-security-related bugs, improving overall stability.

Key to the argument is the fact that a microkernel, as a rule, isolates high-privilege code in protected memory because it runs in separate servers. This isolation could likely be applied to a traditional kernel as well. However, it is precisely this mechanism that forces data to be passed around between programs, leading to the microkernel's performance difficulties discussed above. In the past, outright performance was the main concern of most programs. Today this is no longer quite as powerful an argument as it once was, as security problems become endemic in a well-connected world.[15]

Finally, it should be noted that securing the kernel, although a necessary condition,[16] is not sufficient to guarantee overall system security. For instance, if a bug remained in the system's web browser that allowed attack, some shellcode uploaded through that attack could still legally ask the file system to erase all the browser owner's files via the normal IPC messages. Securing against these sorts of "reasonable requests" is considerably more difficult and requires applying the principle of least authority in the design of the entire operating system, not just the (micro)kernel. The EROS microkernel operating system, and its descendants CapROS and Coyotos, are research projects that strive to do just that.
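One way to picture least authority at the level of a file server is to hand each program a token that covers only the files it needs. This sketch is not the EROS, CapROS or Coyotos design; FileServer, mint, request, the path-prefix rule and the example paths are assumptions made for illustration.

```python
import secrets


class FileServer:
    def __init__(self, files):
        self._files = dict(files)
        self._caps = {}  # token -> (path prefix, allowed operations)

    def mint(self, prefix, ops):
        """Create a token granting the given operations below a path prefix."""
        token = secrets.token_hex(8)
        self._caps[token] = (prefix, frozenset(ops))
        return token

    def exists(self, path):
        return path in self._files

    def request(self, token, op, path):
        prefix, ops = self._caps.get(token, ("", frozenset()))
        if op not in ops or not path.startswith(prefix):
            raise PermissionError(f"{op} on {path} not covered by this token")
        if op == "read":
            return self._files[path]
        if op == "delete":
            del self._files[path]
            return None
        raise ValueError(f"unsupported operation {op!r}")


fs = FileServer({
    "/home/u/downloads/a.txt": "downloaded",
    "/home/u/thesis.tex": "years of work",
})

# The browser receives authority over its download directory only.
browser_cap = fs.mint("/home/u/downloads/", {"read", "delete"})
fs.request(browser_cap, "delete", "/home/u/downloads/a.txt")

try:
    fs.request(browser_cap, "delete", "/home/u/thesis.tex")
    blocked = False
except PermissionError:
    blocked = True

print(blocked, fs.exists("/home/u/thesis.tex"))
```

Shellcode running inside the browser can still send well-formed requests, but they succeed only within the authority the browser was given.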

Operating systems

First generation microkernels:

Second generation microkernels:

Notes

  1. ^ Liedtke, Jochen. "Improving IPC by kernel design". 14th ACM Symposium on Operating System Principles. Asheville, NC, USA. pp. 175–88.
  2. ^ Liedtke, Jochen (1995). "On µ-Kernel Construction". 15th ACM symposium on Operating Systems Principles. pp. 237–250.
  3. ^ Jonathan Shapiro, Vulnerabilities in Synchronous IPC Designs (2003), in the last paragraph of section 3.1; citing J. Chen and B. Bershad, the impact of operating system structure on memory system performance, ACM Symposium on Operating Systems Principles (SOSP) 1993.
  4. ^ Liedtke, Jochen (1997). "Achieved IPC performance (still the foundation for extensibility)". 6th Workshop on Hot Topics in Operating Systems. Cape Cod, MA, USA: IEEE. pp. 28–31.
  5. ^ Gray, Charles (April 2005). "Itanium—a system implementor's tale". USENIX Annual Technical Conference. Anaheim, CA, USA. pp. 264–278.
  6. ^ van Schaik, Carl (January 2007). "High-performance microkernels and virtualisation on ARM and segmented architectures". 1st International Workshop on Microkernels for Embedded Systems. Sydney, Australia: NICTA. pp. 11–21. Retrieved 1 April 2007.
  7. ^ John Lions (August 1, 1977). Lions' Commentary on UNIX 6th Edition, with Source Code. Peer-To-Peer Communications. 1573980137.
  8. ^ Andrew S. Tanenbaum, Tanenbaum-Torvalds debate, part II; citing Tanenbaum, Herder and Bos, Can we make Operating Systems Reliable and Secure?, Computer, May 2006
  9. ^ M. Swift, M. Annamalai, B. Bershad, and H. Levy, recovering device drivers, USENIX OSDI 2004, in the introduction; citing A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler, An empirical study of operating system errors, in Proceedings of the 18th ACM Symposium on Operating Systems Principles, Oct. 2001.
  10. ^ H. Wenliang Du and S. Chapin, Detecting Exploit Code Execution in Loadable Kernel Modules. Proceedings of the 20th annual Computer Security Applications Conference, 2004
  11. ^ Linux Kernel Bluetooth CAPI Packet Remote Buffer Overflow Vulnerability, a SecurityFocus advisory describing a real-world, remotely exploitable kernel buffer overflow caused by a buggy device driver.
  12. ^ "stealth", Kernel Rootkit Experiences, Phrack issue 61, describes such a technique in full.
  13. ^ Detecting exploit code execution in loadable kernel modules, op. cit.
  14. ^ A. Edwards and G. Heiser, Components + Security = OS Extensibility, Australasian Computer Systems Architecture Conference, Goldcoast, Queensland Australia, 2000 (IEEE Computer Society Press).
  15. ^ Tanenbaum, Herder and Bos, Can we make Operating Systems Reliable and Secure?, op. cit.
  16. ^ see trusted computing base
