A primary factor in the utility of a computer system is its speed in executing application programs. A high-performance computer system is expected to be responsive to user inputs and to accurately provide processed results within real-time constraints. That speed and responsiveness depend in large part on the efficiency of the processor subsystem, memory subsystem, I/O (input/output) subsystem, and the like. Large investments have been made in the development of very high-speed processors and high-speed memory subsystems. Consequently, the computer industry has seen remarkable annual improvements in computer system performance. A comparatively new area of focus for improving computer system performance is the input/output mechanisms involved in accessing and storing data.
Data is typically stored on attached hard disk drives. Disk drives having a size of 200 GB or more are increasingly common in desktop and laptop computer systems. Fast and efficient access to data stored on such drives is important to responsiveness and functionality of typical user applications.
ATA (AT Attachment) is a widely supported specification that defines methods of accessing data on disks. The ATA specification evolved from the earlier IDE (integrated drive electronics) specification. ATA defines a type of hardware interface that is widely used to connect data storage peripheral devices such as hard disk drives, CD-ROMs, tape drives, and the like, to a computer system. The ATA standard has further evolved to accommodate additional device types and data transfer features. For example, ATAPI (ATA Packet Interface) defines a version of the ATA standard for CD-ROMs and tape drives, ATA-2 (Fast ATA) defines the faster transfer rates used in Enhanced IDE (EIDE), and ATA-3 adds interface improvements, including the ability to report potential problems.
ATA devices have shown dramatic increases in data transfer speed and storage capacity over time. However, computer systems using such faster devices have not fully shown the expected performance improvements. A number of interface problems with computer system I/O components are partially responsible for the performance limitations. Examples include the data transfer characteristics of the PCI bus (e.g., due to the need to retain host adapter PCI compatibility), the interrupt-based data transfer mechanisms, and the like.
The ADMA (Automatic DMA) specification is a new specification designed to improve the performance of ATA type devices by adding features that improve their data transfer speed and efficiency. For example, ADMA adds support for multithreaded applications, command chaining techniques, command queuing, and the like, which are intended to have the overall effect of decoupling the host command sequence from the channel execution. The objective of the ADMA standard is to dramatically increase the performance of computer systems that operate with ATA type devices.
One goal of the ADMA specification was to correct the inability of the prior art ATA specification to queue multiple I/O commands. In the ATA specification, an application can have only one I/O command (e.g., a disk I/O request) to an I/O driver (e.g., the software driver for a disk controller) outstanding at a given time. A subsequent disk I/O command can be submitted only once the previous disk I/O command completes. Hundreds of microseconds can elapse between the submission of a disk I/O request and its completion. If the application calls the I/O driver with a subsequent disk I/O request before the previous disk I/O request has completed, the driver will reject the subsequent request, informing the application that it must wait until the previous request completes. The ADMA specification attempts to solve this problem by enabling a software application to submit multiple disk I/O requests to a driver and have multiple disk I/O requests outstanding.
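The one-request-at-a-time behavior described above can be illustrated with a minimal sketch. The class name `SingleOutstandingDriver` and its `submit` and `complete` methods are hypothetical names chosen for illustration; they do not correspond to any actual ATA driver interface, and the sketch models only the accept/reject behavior, not real device timing.

```python
class SingleOutstandingDriver:
    """Toy model (hypothetical) of a driver allowing one outstanding request."""

    def __init__(self):
        # The request currently in flight, or None if the driver is idle.
        self.outstanding = None

    def submit(self, request):
        # Reject a new request while the previous one has not completed.
        if self.outstanding is not None:
            return False
        self.outstanding = request
        return True

    def complete(self):
        # Called when the device finishes the in-flight request.
        finished = self.outstanding
        self.outstanding = None
        return finished


driver = SingleOutstandingDriver()
first_accepted = driver.submit("read block 10")
# The first request is still pending, so this submission is rejected.
second_accepted = driver.submit("read block 11")
driver.complete()
# After completion, the same request is accepted.
retry_accepted = driver.submit("read block 11")
```

In this model the application must retry the rejected request after the first one completes, which is the serialization that command queuing is intended to remove.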
Problems exist, however, with respect to how the prior art ADMA specification implements such multiple disk transactions. One such problem is the inability of multiple threads of an application, or of multiple applications, to append to an existing command chain (e.g., a chain of multiple disk I/O requests). As described above, ADMA adds support for command chaining, command queuing, and the like. These techniques are designed to allow multiple I/O commands to be outstanding simultaneously. In other words, several commands can be outstanding at once, as opposed to issuing one command and waiting for it to complete before issuing the next command.
Unfortunately, once a chain of I/O commands has been established by a given application, the prior art ADMA specification makes it difficult to come back at a later time and add new command chains for execution. The prior art ADMA specification specifies a mechanism whereby command chains are added for execution by appending new commands to the previously specified command chain. For example, a chain of disk I/O commands generally comprises a chain of CPBs (command parameter blocks). The CPBs are data structures containing command sets that describe the disk transaction commands to be executed by the disk I/O engine. The CPBs are linked through a system of pointers, with each CPB having a pointer to the next CPB in the chain. Thus, a CPB chain is extended by altering the pointer in the last CPB of the chain so that it references the new CPBs. The coherency of the pointers must be maintained in order to ensure the reliable functioning of the disk I/O system.
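The pointer-linked CPB chain described above can be sketched as a singly linked list. This is an illustrative model only: the `CPB` class, its `command` and `next` fields, and the helper functions `append_chain` and `commands` are assumptions made for this example and do not reflect the actual CPB layout defined by the ADMA specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CPB:
    """Hypothetical command parameter block: one command plus a next pointer."""
    command: str
    next: Optional["CPB"] = None


def append_chain(head, new_head):
    # Walk to the last CPB of the existing chain.
    tail = head
    while tail.next is not None:
        tail = tail.next
    # Alter the last CPB's pointer so that it references the new CPBs.
    tail.next = new_head
    return head


def commands(head):
    # Collect the commands in chain order by following the pointers.
    result = []
    node = head
    while node is not None:
        result.append(node.command)
        node = node.next
    return result


chain = CPB("read A", CPB("write B"))
append_chain(chain, CPB("read C", CPB("read D")))
chain_commands = commands(chain)
```

Because the append modifies a pointer inside a structure that other software may be traversing or modifying at the same time, two unsynchronized appends could both overwrite the same tail pointer and lose commands, which is why pointer coherency matters.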
The use of the prior art ADMA command chain appending schemes imposes a significant overhead burden on the computer system. The prior art ADMA specification relies upon a system of memory locks to maintain the coherency of the pointers of a CPB chain. The memory locks are implemented in order to ensure that only one software process, or thread, can manipulate a CPB chain at a time. This can be very inefficient in a modern computer system having a multithreaded, multiprocess software execution environment. Each thread executing on the system must negotiate the memory locks in order to append to the CPB chain. For example, a typical scenario requires one thread to unlock a command chain in order to gain access, append its new commands, and lock the command chain, after which a second thread must unlock the command chain, append its new commands, and re-lock the command chain. Thus, the prior art ADMA memory lock scheme adds a significant amount of overhead. The excessive overhead is especially problematic in the case of a modern multithreaded, multitasking computer system where, for example, many different threads may want to add disk I/O requests to a command queue for a disk drive.
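The serialization imposed by a lock-protected chain can be sketched as follows. This example uses a conventional mutex (`threading.Lock` from the Python standard library) as a stand-in, which is an assumption; the locking mechanism actually used by the prior art ADMA specification is not reproduced here. The names `LockedChain`, `worker`, and `lock_acquisitions` are hypothetical.

```python
import threading


class LockedChain:
    """Hypothetical command chain guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.commands = []
        self.lock_acquisitions = 0  # counts how often the lock was taken

    def append(self, new_commands):
        # Only one thread at a time may manipulate the chain.
        with self._lock:
            self.lock_acquisitions += 1
            self.commands.extend(new_commands)


def worker(chain, thread_id, count):
    # Each append negotiates the lock separately.
    for i in range(count):
        chain.append([f"t{thread_id}-cmd{i}"])


chain = LockedChain()
threads = [
    threading.Thread(target=worker, args=(chain, t, 100)) for t in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
total_commands = len(chain.commands)
```

Every one of the 400 appends in this sketch passes through the same lock, so threads that arrive while another thread holds it must wait. That per-append negotiation is the kind of overhead the paragraph above describes.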
The overhead problem of the prior art ADMA disk transaction methodology can significantly detract from overall computer system performance. As processor and system memory performance continue to improve annually, and as latency penalties are reduced in other components of a computer system (e.g., data transfer buses, graphics operations, etc.), it becomes increasingly important that the disk I/O system shows similar degrees of improvement in order to avoid imposing performance bottlenecks on the overall computer system.