The present invention is described with reference to an illustrative application thereof, namely asynchronous input/output (I/O) operations. However, as will later be apparent, the invention is not so limited.
I/O is an essential element of every computer system and, accordingly, is a well-developed, mature field. I/O includes not just inputting data from a keyboard and outputting data to a printer, but extends to myriad different devices and applications, from communicating with internal computer resources (e.g. disks and memory) to communicating with other computers across the country (e.g. network interfaces).
The most familiar model of I/O, popularized by the UNIX operating system (OS), is to view each I/O operation as a simple exchange of a stream of bytes with a virtual file. The virtual file can represent any source or destination for data.
In the UNIX environment, the virtual file is represented by file descriptors. In newer operating systems, such as the Windows NT operating system from Microsoft Corporation, virtual files (i.e. all potential sources or destinations for I/O) are represented by file objects, which are accessed by file handles. A file handle is an index into a process-specific table used to refer to an object, and incorporates a set of access rights granted to the process that owns the handle.
Returning to I/O operations generally, there are two basic types of I/O: synchronous and asynchronous.
Synchronous I/O is characterized by suspension of the calling process (e.g. an application program) until the requested I/O completes. (For example, when a word processing document is printed, operation of the word processor stalls until the printing is finished.) Completion of the I/O is signalled by an "interrupt" transmitted by the operating system to the program that requested the I/O. The program responds to the interrupt by resuming the suspended process.
Synchronous I/O is adequate in many circumstances. However, modern computer processors are very fast--much faster than most I/O devices. In the time it takes an I/O device to service a single I/O request, the stalled processor might have been able to execute thousands or millions of computer instructions. To rectify this inefficiency, operating systems from Microsoft (e.g. Windows NT), Digital Equipment Corp (e.g. VMS), AT&T (e.g. UNIX), and others have made widespread use of asynchronous I/O.
Asynchronous I/O is characterized by the continuation of computer processing while a requested I/O operation completes. After an I/O operation completes, the operating system notifies the calling application program, again usually via an interrupt. Once so notified, the calling application knows it can make use of any data obtained during the I/O operation. During the pendency of the I/O request, the application is free to attend to other tasks.
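The continuation of processing during a pending I/O request can be illustrated with a minimal sketch. The sketch assumes a POSIX-style pipe and uses a worker thread as a stand-in for the operating system's asynchronous I/O facility; the function names read_all and overlap_io_with_work are hypothetical and chosen only for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_all(fd):
    """Blocking read loop: collect data until the writer closes its end."""
    chunks = []
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:          # empty read means end of file
            break
        chunks.append(chunk)
    return b"".join(chunks)


def overlap_io_with_work():
    """Request a read, keep computing, and collect the result afterwards."""
    r, w = os.pipe()
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_all, r)   # issue the I/O request; do not wait
        busy = sum(range(1000))              # the application attends to other tasks
        os.write(w, b"payload")              # the "device" eventually delivers data
        os.close(w)
        data = pending.result()              # completion: the data may now be used
    os.close(r)
    return busy, data
```

In this sketch the call to pending.result() plays the role of the completion notification; the computation assigned to busy proceeds while the read is still outstanding.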
One powerful capability enabled by asynchronous I/O is the overlapping of several I/O operations by an application. For example, an application program may want to read data from the keyboard and send it to a file, while also reading data from a disk drive and writing it to the screen.
In response to each of these four file operations, the operating system will typically issue an interrupt to signal completion to the calling application. Associated with each interrupt is a fairly substantial processing overhead. Collectively, this overhead can consume an undue percentage of the computer's processing power, preventing its efficient execution of the intended task.
One aspect of the overhead associated with interrupts is the rote task of distinguishing the interrupts from one another. To make these distinctions, interrupts are routed to a vector table, which is consulted by the operating system to determine which interrupts correspond to which I/O requests. Once this correlation is made, a corresponding notification is issued to the application, usually by a jump to an address of an interrupt service routine as specified in the interrupt table.
When an interrupt occurs, the application must respond to it quickly. Constraints associated with most operating systems limit the number of interrupts that can be "nested," i.e. await service by an application, at any one time. The limit is relatively small--typically less than ten. If interrupts are nested to a level greater than this, the excess interrupts are lost, usually with grave results.
The requirement of immediate application program response to interrupts incurs additional overhead. For example, whatever application program processing was taking place at the moment the interrupt is received must be suspended so the processor can execute the corresponding interrupt service routine. Suspension of the processing requires that various intermediate results resident in processor registers, etc., be swapped out to other memory locations to free these processor resources for the interrupt service. Conversely, when the interrupt servicing is completed, these intermediate results must be restored to their original locations so the interrupted application program processing can be resumed.
Another problem is that some I/O operations, such as large disk reads, are often implemented as several smaller operations. While there are certain advantages to this procedure, it results in a commensurate increase in overhead associated with interrupt servicing, and an attendant reduction in processing efficiency.
The foregoing problems are exacerbated in multi-processing operating systems, in which several different applications are alternately executing and "sleeping" to effect apparent simultaneous execution. In this environment there is the further difficulty of correlating hardware interrupts to the applications to which they correspond--applications which may be "asleep" when the I/O completion interrupt corresponding thereto is returned.
The Berkeley Software Distribution version of the UNIX operating system (i.e. BSD UNIX 4.x) has a system call named select( ) that helps ameliorate some of these failings of interrupt-based asynchronous I/O. In UNIX systems using select, when an I/O operation completes, the resulting interrupt is handled by select, which sets a bit in a 32-bit mask corresponding to the completed I/O. This dedicated handling of I/O interrupts by select frees the application program to run without substantial I/O interruption.
To determine if a requested I/O is completed, the application program calls select( ) and is provided in return with several bit masks which convey data about (a) which I/O devices are available to serve requests, (b) which I/O devices have completed requested operations, and (c) any I/O device exceptions. The application program then decodes this information to determine whether the I/O it requested is completed, and then proceeds accordingly.
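The calling pattern described above can be sketched with the select module of the Python standard library, which wraps the select( ) system call and returns three lists in place of the three bit masks. The sketch assumes a POSIX system with pipes; the helper names poll_once and demo_poll are hypothetical.

```python
import os
import select


def poll_once(read_fds, write_fds, timeout=0.0):
    """Ask select() which descriptors are readable, writable, or exceptional."""
    return select.select(read_fds, write_fds, read_fds + write_fds, timeout)


def demo_poll():
    """Two pipes: data is written only to the second one before polling."""
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    try:
        os.write(w2, b"x")                        # only pipe 2 has data to read
        readable, writable, exceptional = poll_once([r1, r2], [w1])
        return {
            "pipe2_readable": readable == [r2],   # data is waiting on pipe 2 only
            "pipe1_writable": writable == [w1],   # an empty pipe can accept a write
            "no_exceptions": exceptional == [],
        }
    finally:
        for fd in (r1, w1, r2, w2):
            os.close(fd)
```

As in the text, the application must decode the returned sets itself to learn whether the particular descriptor it cares about is among them.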
While an improvement over earlier approaches, select has numerous problems of its own. One is that the temporal information provided to select by the order in which it receives the interrupts is lost; there is no way for the application programs to learn which I/O operations have been completed the longest. This is of particular concern in multi-tasking systems, in which ensuring "fairness" between concurrent processes requires knowledge of which I/O results have been waiting the longest for further processing.
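The loss of temporal information can be seen in a small experiment, sketched here under the assumption of a POSIX system with pipes. Two pipes, labeled A and B for illustration, are made ready in either order, and select( ) is then called once; the function name select_after_writes is hypothetical.

```python
import os
import select


def select_after_writes(write_order):
    """Make pipes ready in the given order, then report what select() returns."""
    pipes = {name: os.pipe() for name in ("A", "B")}   # name -> (read_fd, write_fd)
    try:
        for name in write_order:
            os.write(pipes[name][1], b"x")             # this pipe becomes ready now
        read_fds = [pipes["A"][0], pipes["B"][0]]
        readable, _, _ = select.select(read_fds, [], [], 0)
        names = {pipes[n][0]: n for n in pipes}        # read_fd -> label
        return [names[fd] for fd in readable]
    finally:
        for r, w in pipes.values():
            os.close(r)
            os.close(w)
```

Calling select_after_writes(["A", "B"]) and select_after_writes(["B", "A"]) yields the same result: both descriptors are reported ready, and nothing in the returned set indicates which became ready first.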
Select also suffers from "collisions." Collisions arise when multiple processes attempt to select on the same I/O devices (i.e. file descriptors) at the same time. When collisions arise, each process making the select( ) call receives a "failure" notification and must call select( ) again, thereby generating more system overhead and interrupts.
A still further drawback of select( ) is its inability to deal with multi-threaded processes. Such processes are widely used in many newer operating systems, such as Microsoft Windows NT.
A multi-threaded process has two or more threads of execution within a single process. Each thread shares the same address space, descriptors and other resources within the process, but has its own program counter for execution. To achieve concurrency using threads, an application program creates two or more threads within a process to execute different parts of the program within the same process. A multi-threaded process may be used for I/O operations on several descriptors, and can be used for I/O operations that happen on the same descriptor. A similar approach in the prior art splits a single-threaded process into multiple processes for execution. However, since processes require significantly more operating system overhead than threads, the multi-threaded process approach is preferred.
Select( ) cannot handle multiple threads or the concurrency that is required by simultaneous asynchronous I/O operations on the same descriptor, or different descriptors.
Another method used in the prior art to provide information about the occurrence of an I/O event is to send a message to the corresponding file handle. Message queues are used to collect and pass message information in both threaded process environments and non-threaded process environments. For example, Windows 3.x uses message queues to route messages to Windows applications. Windows 3.x maintains a single system message queue and any number of thread message queues, one for each thread. Whenever a user moves the mouse, clicks mouse buttons, or types at the keyboard (I/O events), the device driver for the mouse or keyboard converts the input into messages and places them in the system wide message queue. A Windows message handler removes the messages, one at a time, from the system message queue, examines them to determine the destination thread, and then posts them to the message queue of the thread. A thread's message queue receives all mouse and keyboard messages from the system wide queue and directs the Windows kernel to send them to the appropriate Windows application associated with the thread for processing.
One disadvantage to sending all messages (including I/O messages) to a single system wide queue is that Windows must sort through and process a large number of different types of messages. This is expensive in terms of processing time. A queue per thread also is expensive in terms of system resources and system overhead. If a large number of threads are used, then a large number of thread queues have to be created, managed, and then deleted. Windows must also maintain a table of addresses for all the thread queues. Another disadvantage to this approach is that every message is "handled" at least three times (e.g. once by Windows to take it out of the single system queue, once by the thread to remove it from the thread queue, and once by the Windows application that must process the message). Handling every message several times causes significant operating system overhead and delays the processing of individual applications that may be waiting for messages.
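The two-level routing described above, and the repeated handling it entails, can be modeled with a short sketch. This is a simplified model, not the Windows implementation; the message fields "id" and "thread" and the function name route_messages are assumptions made for illustration.

```python
from collections import deque


def route_messages(messages):
    """Route messages through a system queue and per-thread queues, counting touches."""
    system_queue = deque(messages)   # the single system wide queue
    thread_queues = {}               # one queue per destination thread
    handled = {}                     # message id -> number of times handled
    delivered = []

    # First handling: the message handler drains the system queue and
    # posts each message to the queue of its destination thread.
    while system_queue:
        msg = system_queue.popleft()
        handled[msg["id"]] = handled.get(msg["id"], 0) + 1
        thread_queues.setdefault(msg["thread"], deque()).append(msg)

    for thread_id, q in thread_queues.items():
        while q:
            msg = q.popleft()
            handled[msg["id"]] += 1  # second handling: removed from the thread queue
            handled[msg["id"]] += 1  # third handling: processed by the application
            delivered.append((thread_id, msg["id"]))
    return delivered, handled
```

For three messages addressed to two threads, the model creates two thread queues and records three handlings for every message, which is the overhead the preceding paragraphs identify.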
In accordance with a preferred embodiment of the present invention, the foregoing and other disadvantages of the prior art are overcome with an I/O completion port object. The I/O completion port object is an I/O object with a queue that provides a single synchronization point and controllable concurrency for multiple simultaneous asynchronous I/O operations. These multiple simultaneous asynchronous I/O operations could be the result of I/O requests from a single computer system, or a network of computer systems. For example, an I/O completion port can be used to synchronize hundreds of network I/O operations.
An I/O completion port is created and associated with a file descriptor. Any number of file descriptors can then be associated with a single I/O completion port. If a process creates a number of threads to complete an operation on a single or on multiple descriptors, then all the threads are also associated with an I/O completion port. As a result, the I/O completion port provides concurrency control for these multiple threads.
Once a file descriptor is associated with an I/O completion port, the completion of any subsequent I/O request on that descriptor causes an I/O completion port completion packet to be queued to the I/O completion port. The I/O completion port completion packet contains information about the I/O request (e.g. success, failure, amount of information transferred, etc.). Instead of having the I/O system contact the application which made the I/O request, the application checks the I/O completion port's queue to determine if the I/O request has been completed. The I/O completion port completion packet is then used to determine the state of the completed I/O and initiate any subsequent action.
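The queue-based notification described above can be modeled in user space with a short sketch. It is not the operating system's completion port interface: the class CompletionPort, its methods associate, start_read and get_completion, and the packet fields are all hypothetical names, and a worker thread stands in for the I/O system.

```python
import os
import queue
import threading
from dataclasses import dataclass


@dataclass
class CompletionPacket:
    key: str                 # identifies the descriptor the request was made on
    ok: bool                 # success or failure of the request
    bytes_transferred: int   # amount of information transferred
    data: bytes


class CompletionPort:
    """User-space model: completed requests are queued, never pushed to the caller."""

    def __init__(self):
        self._packets = queue.Queue()
        self._keys = {}

    def associate(self, fd, key):
        """Associate a descriptor with this port under a caller-chosen key."""
        self._keys[fd] = key

    def start_read(self, fd, nbytes):
        """Begin a read; its completion is reported only through the queue."""
        key = self._keys[fd]   # raises KeyError if fd was never associated

        def worker():
            try:
                data = os.read(fd, nbytes)
                self._packets.put(CompletionPacket(key, True, len(data), data))
            except OSError:
                self._packets.put(CompletionPacket(key, False, 0, b""))

        threading.Thread(target=worker, daemon=True).start()

    def get_completion(self, timeout=None):
        """The application checks the port's queue for the next completion packet."""
        return self._packets.get(timeout=timeout)


def demo_port():
    port = CompletionPort()
    r, w = os.pipe()
    port.associate(r, "pipe-0")
    port.start_read(r, 16)       # request is pending; the caller is not blocked
    os.write(w, b"hello")        # the I/O completes at some later point
    packet = port.get_completion(timeout=5)
    os.close(r)
    os.close(w)
    return packet
```

The application in demo_port is never called back; it learns of the completion, and of its status and byte count, only by removing the packet from the queue.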
The I/O completion port allows tracking of I/O operations not only per descriptor, but also per I/O operation. If multiple threads are created to complete a single I/O operation, then the concurrency of the threads is handled by the I/O completion port. For example, if a large read operation is split into several smaller read operations (several threads) of a certain block size, the I/O completion port can track the completion of the reads. If read number two finishes before read number one, the I/O completion port can be used by the application to determine which read was which.
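Per-operation tracking of a split read can be sketched as follows. The sketch assumes a POSIX system providing os.pread and models the completion queue with an ordinary thread-safe queue; the names chunked_read and demo_chunked_read, the block size, and the worker count are illustrative assumptions. Each completion carries the index of the block it belongs to, so the blocks can be reassembled correctly whatever order they finish in.

```python
import os
import queue
import tempfile
from concurrent.futures import ThreadPoolExecutor


def chunked_read(path, block_size):
    """Split one large read into block reads and reassemble by operation index."""
    size = os.path.getsize(path)
    offsets = list(range(0, size, block_size))
    completions = queue.Queue()          # stands in for the port's queue
    fd = os.open(path, os.O_RDONLY)
    try:
        def read_block(index, offset):
            data = os.pread(fd, block_size, offset)
            completions.put((index, data))   # the packet says which read this was

        with ThreadPoolExecutor(max_workers=4) as pool:
            for index, offset in enumerate(offsets):
                pool.submit(read_block, index, offset)

            blocks = [None] * len(offsets)
            order = []                   # completion order; may differ from issue order
            for _ in offsets:
                index, data = completions.get(timeout=5)
                order.append(index)
                blocks[index] = data
    finally:
        os.close(fd)
    return order, b"".join(blocks)


def demo_chunked_read():
    payload = bytes(range(256)) * 10     # 2560 bytes of sample data
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        path = f.name
    try:
        order, data = chunked_read(path, 1000)
    finally:
        os.unlink(path)
    return payload, order, data
```

The list named order records the sequence in which the reads actually completed, while the index carried in each completion lets the application place every block correctly.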