1. Field of the Invention
This invention relates to computer systems, and more particularly, to the control of context on input/output devices in a new input/output architecture for computer systems.
2. History of the Prior Art
In the 1960s, International Business Machines (IBM) and Control Data Corporation (CDC) produced mainframe computers with architectures in which a central processing unit (CPU) controlled program manipulation and separate input/output processors (called channel processors or peripheral processor units) controlled input/output operations. The input/output processors had instruction sets which allowed them to carry out the somewhat limited functions designated by commands placed in memory by the central processing unit. For example, the input/output processors knew how to access data on disk and place data on an output display. This form of architecture made, and in some cases still makes, a great deal of sense. At that time, central processing units were very expensive; and using the central processing unit to accomplish input/output operations was very wasteful. Neither the CDC nor the IBM input/output processors were as powerful as the central processing unit and thus could be produced relatively inexpensively. These architectures allowed individual computers to be built to emphasize operations by the central processing unit or operations by the input/output devices. By building a faster central processing unit, the main processing functions could be made to go faster; while by building faster input/output processors, the input/output operations could be accelerated.
As an example of this type of operation, in the IBM system, the central processing unit would signal which input/output operation it desired by writing channel commands to main memory and signaling a channel processor that there was something for it to do. The channel processor would read those commands and proceed to execute them without aid from the central processing unit. If an input/output processor was instructed to do something, it would do it. As long as the operation was safe, there was no problem. Unfortunately, if the operation was something prohibited like reformatting the hard disk which contained the basic operating system, the input/output processor would also do that.
These architectures were designed to allow programs to time share (multi-task) the central processing unit. With an operating system which allows multi-tasking, it is necessary to protect the resources allotted to one application program from operations conducted by other application programs so that, for example, one program cannot write to memory over the space utilized by another program. An important part of this protection is accomplished by keeping application programs from writing directly to portions of the system where they might cause harm such as main memory or the input/output devices. Since the input/output processors would do whatever they were instructed in the IBM and CDC systems, it was necessary to limit access to these input/output processors to trusted code, generally operating system code and device drivers, in order to preclude application programs from undertaking operations which would interfere with other application programs or issuing commands which would wreak havoc with the system. Apart from any other problems, writing directly to the input/output devices creates a security problem in a multi-tasking system because the ability to write to and read from input/output devices such as the frame buffer means an application program may read what other programs have written to the device. For these reasons, both the IBM and CDC architectures kept any but privileged operating system code from writing to operating system memory and to the input/output devices.
In 1971, the Digital Equipment Corporation (DEC) PDP11 computer appeared. In the original embodiment of this architecture, all of the components of the computer are joined to a system backplane bus. The central processing unit and any other component of the computer (except main memory) addresses each other component as though it were an address in memory. The addresses for the various hardware components including input/output devices simply occupy a special part of the memory address space. Only the address itself indicates that a component is a device such as an input/output device which is other than memory. When the central processing unit wants to accomplish an input/output operation, it simply writes or reads addresses assigned to the particular input/output device in memory address space. This architecture allows all of the operations available to the central processing unit to be utilized in accomplishing input/output operations and is, therefore, quite powerful. Moreover, this allows the input/output operations to be accomplished without the need for special commands or for special resources such as input/output processors. It also allows the use of very simple input/output controllers which typically amount to no more than a few registers.
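By way of illustration only, the memory-mapped addressing described above may be sketched as follows. The `Bus` class, the `IO_BASE` boundary, and the particular addresses are hypothetical and are not taken from any actual PDP11 implementation; the sketch merely shows that the same read and write operations reach either memory or a device register, with the address alone selecting the target.

```python
class Bus:
    """Toy backplane bus: one address space shared by memory and device registers."""

    IO_BASE = 0xE000  # hypothetical start of the region decoded by devices

    def __init__(self):
        self.ram = {}               # address -> value held in main memory
        self.device_registers = {}  # address -> value held in a device register

    def write(self, address, value):
        # Only the address decides whether the target is memory or a device.
        if address >= self.IO_BASE:
            self.device_registers[address] = value
            return "device"
        self.ram[address] = value
        return "memory"

    def read(self, address):
        if address >= self.IO_BASE:
            return self.device_registers.get(address, 0)
        return self.ram.get(address, 0)


bus = Bus()
bus.write(0x0100, 42)   # ordinary store to main memory
bus.write(0xE004, 0x1)  # the same operation, landing in a device register
```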
As with the earlier IBM and CDC architectures and for the same reasons, writing to the input/output devices directly by other than trusted code is prohibited by the PDP11 operating systems. The PDP11 architecture provides a perfect arrangement for handling this. This architecture, like some of its predecessors, incorporates a memory management unit designed to be used by an operating system to allow the addressing of virtual memory. Virtual memory addressing provides access to much greater amounts of memory than are available in main memory by assigning virtual addresses to data wherever it may be stored and translating those virtual addresses to physical addresses when the data is actually accessed. Since operating systems use memory management units to intercept virtual addresses used by the central processing unit in order to accomplish the virtual-to-physical address translation, operating systems may simply provide no virtual-to-physical translations of any input/output addresses in the memory management unit for application programs. Without a mapping in the memory management unit to the physical addresses of input/output devices, the application program is required to use a trusted intermediary such as a device driver in order to operate on an input/output device in the PDP11 architecture.
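The protection described above may be illustrated by the following minimal sketch, in which the page size, the page numbers, and the names `APP_TABLE`, `DRIVER_TABLE`, and `translate` are all assumptions chosen for illustration. The table supplied for the application program simply contains no entry for the page holding the device registers, so any attempt by the application to address that page fails to translate.

```python
PAGE_SIZE = 0x1000
IO_PHYS_PAGE = 0xE  # hypothetical physical page containing device registers


class TranslationFault(Exception):
    """Raised when no virtual-to-physical mapping exists for an address."""


def translate(page_table, vaddr):
    """Translate a virtual address using a table of virtual page -> physical page."""
    vpage, offset = divmod(vaddr, PAGE_SIZE)
    if vpage not in page_table:
        raise TranslationFault(hex(vaddr))
    return page_table[vpage] * PAGE_SIZE + offset


# Mappings installed for an application program: ordinary memory only.
APP_TABLE = {0: 3, 1: 7}
# Mappings installed for trusted code: the same memory plus the device page.
DRIVER_TABLE = {0: 3, 1: 7, 2: IO_PHYS_PAGE}


def can_reach_device(page_table):
    """Report whether any virtual address in the table reaches the device page."""
    return IO_PHYS_PAGE in page_table.values()
```

With these tables, virtual address 0x2010 translates to a device register address for the trusted code but raises `TranslationFault` for the application program.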
Thus, in a typical computer system based on the PDP11 architecture, only trusted code running on the central processing unit addresses input/output devices. Although this architecture allows all of the facilities of the central processing unit to be used for input/output, it requires that the operating system running on the central processing unit attend to all of the input/output functions. Requiring a trap into the system software in order to accomplish any input/output operation slows the operation of the computer. Moreover, in contrast to earlier systems, in this architecture, there is no process by which the input/output performance of the system can be increased except by increasing the speed of the central processing unit or the input/output bus. This is a particular problem for programs which make heavy use of input/output devices. Video and game programs which manipulate graphics extensively and make extensive use of sound suffer greatly from the lack of input/output speed.
This problem is especially severe because when only trusted code can access input/output devices, then all accesses must be through this trusted code. That means that each operation involving input/output devices must go through a software process provided by the operating system and the input/output device drivers. The manner in which this is implemented is that when an application program is running on the central processing unit, the addresses it is allowed to access are mapped into the memory management unit by the operating system. None of these addresses may include input/output addresses. When an application program desires to accomplish an input/output operation, it executes a subroutine call into the operating system library code. This subroutine performs an explicit trap into the operating system kernel. As a part of the trap, the operating system changes the memory management unit to create mappings to the device registers. The operating system kernel translates the virtual name used for the input/output device by the application program into the name of a device driver. The operating system kernel does a permission check to ensure that the application is permitted to perform this operation. If the application is permitted to perform the operation, the operating system kernel calls the device driver for the particular input/output resource. The input/output device driver actually writes the command for the operation to the registers of the input/output hardware which are now mapped by the memory management unit. The input/output device responds to the command by conducting the commanded operation and then generates signals which indicate whether the operation has succeeded or failed. The input/output device generates an interrupt to the device driver to announce completion of the operation. The device driver reads the signals in the registers of the input/output device and reports to the operating system the success or failure of the operation. 
Then the operating system returns from the trap with the success or failure indication, restores the mappings for the application and thus removes the mappings for the device registers, and ultimately returns from the subroutine call reporting the success or failure of the operation to the unprivileged code of the application.
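The sequence of steps recited above may be sketched as follows. This is a simplified model under stated assumptions, not the code of any actual operating system: the names `Device`, `Kernel`, `library_write`, and `display0` are hypothetical, and the interrupt is modeled as an ordinary function return. The `trace` list records the order of the steps so that the length of the path for a single operation can be seen.

```python
class Device:
    """Toy input/output device with a command register and a status register."""

    def __init__(self):
        self.registers = {"command": None, "status": None}

    def run_command(self):
        # The device carries out the command and records success or failure.
        self.registers["status"] = "ok" if self.registers["command"] else "failed"


class Kernel:
    def __init__(self, device, permitted):
        self.device = device
        self.permitted = permitted  # set of (application, device name) pairs
        self.drivers = {"display0": self.display_driver}
        self.device_mapped = False
        self.trace = []

    def display_driver(self, command):
        if not self.device_mapped:
            raise RuntimeError("device registers are not mapped")
        self.device.registers["command"] = command
        self.trace.append("driver writes command")
        self.device.run_command()
        self.trace.append("device interrupts driver")
        status = self.device.registers["status"]
        self.trace.append("driver reads status")
        return status

    def trap(self, app, device_name, command):
        self.trace.append("trap into kernel")
        self.device_mapped = True  # mappings to the device registers are created
        self.trace.append("map device registers")
        try:
            driver = self.drivers[device_name]  # virtual name -> device driver
            self.trace.append("resolve driver")
            if (app, device_name) not in self.permitted:
                self.trace.append("permission denied")
                return "denied"
            self.trace.append("permission granted")
            return driver(command)
        finally:
            # The application's mappings are restored on the way out of the trap.
            self.device_mapped = False
            self.trace.append("unmap device registers")


def library_write(kernel, app, device_name, command):
    """Unprivileged library subroutine that traps into the kernel."""
    return kernel.trap(app, device_name, command)


kernel = Kernel(Device(), permitted={("game", "display0")})
result = library_write(kernel, "game", "display0", "draw")
```

Even in this reduced form, one write to the device passes through eight recorded steps before control returns to the application.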
This sequence of steps must take place on each operation conducted using input/output resources. The process is inordinately long, and a recitation of the steps involved illustrates why applications using graphics or other input/output devices extensively cannot be run at any real speed on such systems.
This problem has been made worse by the tendency of hardware manufacturers to bias their systems in favor of write operations to the detriment of read operations. This bias has gradually increased as processors have become faster (the only way to accelerate a system having the PDP11 architecture) while bus speed has tended to lag, requiring that write operations on the bus be buffered. The interface in this type of architecture (including Intel X86 type systems) between input/output devices and the input/output bus includes a plurality of registers to which the central processing unit may write and which the central processing unit may read. Since write operations are buffered, all write commands in the write buffer queues must be processed through the buffers before any read can proceed. And during a read operation, the central processing unit cannot conduct other operations since it must typically remain on the input/output bus in order to read synchronously the data being transferred. In some systems, some read operations take as much as twenty times as long as write operations.
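The effect of buffered writes on a subsequent read may be illustrated by the following sketch. The class `BufferedBus` and the cycle costs `WRITE_COST` and `READ_COST` are assumptions chosen only to make the behavior visible; they do not represent measurements of any actual system. Writes are queued without stalling the central processing unit, whereas a read must first wait for every queued write to drain and then wait for the read itself.

```python
from collections import deque


class BufferedBus:
    WRITE_COST = 1  # hypothetical bus cycles to retire one buffered write
    READ_COST = 4   # hypothetical bus cycles for one synchronous read

    def __init__(self):
        self.pending_writes = deque()
        self.registers = {}
        self.cpu_stall_cycles = 0

    def write(self, register, value):
        # Buffered write: the processor queues the write and continues at once.
        self.pending_writes.append((register, value))

    def read(self, register):
        # All queued writes must be processed before the read can proceed.
        while self.pending_writes:
            reg, value = self.pending_writes.popleft()
            self.registers[reg] = value
            self.cpu_stall_cycles += self.WRITE_COST
        # The processor then waits on the bus for the read itself.
        self.cpu_stall_cycles += self.READ_COST
        return self.registers.get(register, 0)


io_bus = BufferedBus()
io_bus.write("x", 1)
io_bus.write("y", 2)
io_bus.write("x", 3)
stall_before_read = io_bus.cpu_stall_cycles
value = io_bus.read("x")
```

In this model the three writes cost the processor nothing when issued, and the whole delay falls on the single read that follows them.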
Since the operating system running on the central processing unit must handle all of the reads and writes to input/output devices in this architecture, the central processing unit is further slowed by this hardware bias when dealing with input/output intensive applications. For example, manipulating graphic images typically requires extensive read/modify/write operations. Many application programs which make extensive use of input/output devices, including a great number of games, are unable to function with architectures which require that the operating system read and write to the output devices on behalf of the applications. In order to obtain the speed necessary to display their operations satisfactorily, such programs must read and write to the input/output devices directly. This has always been allowed by the Microsoft DOS operating system but by none of the advanced operating systems such as Unix. Ultimately, with extensive urging by the windows system developers, the designers of workstation operating systems have grudgingly allowed applications to read and write to the graphics circuitry directly by mapping some of the physical addresses which the input/output devices decode to their memory address space. This allows windows system developers to read and write to the graphics hardware directly even though the security and integrity of the system are compromised by so doing. There have also been multitasking systems which have allowed application programs to write directly to the graphics hardware. However, these systems have required that the operation be accomplished using the operating system software to trap input/output accesses and accomplish context switching to assure that application programs do not interfere with one another; consequently, the result is significantly slower than desirable.
For all of these reasons, many games simply avoid multitasking operating systems such as windows systems. In general, games must be operated in single tasking systems such as Microsoft DOS which allows an unlimited form of writing directly to the input/output devices while sacrificing the integrity of the system.
It is very desirable to provide a new architecture which allows input/output operations to proceed at a faster speed so that application programs which make significant use of the input/output components may function in the advanced multi-tasking operating systems without sacrificing system integrity.
Another major problem has surfaced with computers running multi-tasking operating systems. In a computer system, different application programs must have access to the different input/output devices of the system. In order for an application program to run with an input/output device, various registers of the input/output device should hold values which control the operation of the device. For example, certain registers of a graphics control circuit are typically filled with values which indicate in some manner the values which are to be used by the graphics control device in its color lookup tables. Similarly, other registers hold other values needed for the operation of a graphics controller and other input/output devices with each application program. Typically, these values and other values which must be changed for operation of a particular input/output device with a particular application program are referred to as the "context" of that application for that input/output device.
Different application programs provide different context values for the registers of input/output devices and thus require different context on the input/output device in order to function most effectively. In certain cases, the same application program will place different context on an input/output device in order to accomplish different specific operations with the device. In computer systems and especially in multi-tasking systems where different application programs run on the same processor in interleaved operations, it is necessary in order for each application program to make most effective use of the input/output devices to switch the context of various input/output devices whenever a new application has access to the input/output device. Often it is also desirable to switch the context on an input/output device when, though the same application program is accessing the input/output device, the use of the device has changed.
Prior art multi-tasking systems have devised various ways of handling this problem. One advanced technique described in U.S. Pat. No. 5,127,098 requires that a system memory management unit provide a valid mapping for a single application program at a time to a single address space of the input/output device. Whenever a different application is to use the input/output device, the system traps the attempt, interrupts the operation, saves the present context, provides a new mapping to the memory management unit for the new application program, provides new context to the input/output device for the application program, and restarts the system so that the new application may access the device.
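The prior art technique just described may be sketched in simplified form as follows. The sketch is an assumption-laden illustration and is not drawn from the text of the referenced patent: the names `GraphicsDevice`, `ContextSwitchingOS`, and `access` are hypothetical, and the trap is modeled as a check made on every access. Only one application holds a valid mapping at a time; an access by any other application causes the present context to be saved, the mapping to be changed, and the context of the new application to be placed on the device.

```python
class GraphicsDevice:
    """Toy device whose register contents form the current context."""

    def __init__(self):
        self.registers = {}


class ContextSwitchingOS:
    def __init__(self, device):
        self.device = device
        self.mapped_app = None    # the single application with a valid mapping
        self.saved_contexts = {}  # application -> saved register values
        self.switch_count = 0

    def access(self, app, register, value):
        # An access by an application without a mapping is trapped.
        if app != self.mapped_app:
            self._switch_to(app)
        self.device.registers[register] = value

    def _switch_to(self, app):
        # Save the present context of the application being displaced.
        if self.mapped_app is not None:
            self.saved_contexts[self.mapped_app] = dict(self.device.registers)
        # Provide the new mapping and place the new application's context.
        self.mapped_app = app
        self.device.registers = dict(self.saved_contexts.get(app, {}))
        self.switch_count += 1


system = ContextSwitchingOS(GraphicsDevice())
system.access("A", "color", 1)
system.access("B", "color", 2)
system.access("A", "mode", 5)
```

In this example three accesses by two interleaved applications cause three passes through the operating system, which is the cost the following paragraph identifies.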
A problem with this method is that the operating system must be called to change the context for each different application. This is quite slow and drastically delays the operation of a multi-tasking system which is constantly switching between application programs.
It is desirable to provide apparatus and a method for rapidly switching context in a computer system.