1. Field of the Invention
The present invention is directed to the field of accessing an Input/Output (I/O) device, such as a disk volume. It is more particularly directed to improving the performance of computer-implemented I/O operations that are directed to disk drives and that are associated with ported computer applications, such as a database.
2. Description of the Background Art
Typically complex computer applications, such as a database, are ported to a variety of computer systems. The porting process often includes special changes to the application to enable efficient and complete operation of the application on different computer systems. I/O operations are a significant factor in the overall performance of a complex computer application. High-performance computer applications, such as a database, may issue asynchronous, direct disk I/O commands which are not supported on the target system. A "target computer system" as used herein refers to a computer system environment consisting of one or more specific programming languages, the application programming interfaces (APIs) available in the programming languages, and the associated file system or file systems. Therefore, changes to I/O operations may be made during the porting of an application to ensure efficient operation of the application on the computer system. Such a computer system may include the products sold under the trademarks IBM S/390® that includes the IBM OS/390® (OS/390) operating system and associated disk volumes.
Disk volumes are units of data storage that typically include data and the information used to access and manipulate the data. Disk volumes may be used to store a file system and information necessary to manipulate the file system. For example, when implementing database applications that may include disk I/O access commands for operation on the IBM OS/390 that supports UNIX System Services (OS/390 UNIX) the facilities of a hierarchical file system (HFS) may be employed. However, file systems, such as the OS/390 UNIX HFS, may only support queued disk I/O access and minimal I/O caching.
I/O caching is typically managed by either a file system or a disk controller. I/O caching is the process of storing I/O data in computer memory that may be accessed more quickly than the disk device. Therefore, I/O caching may be characterized as temporary storage of data associated with disk I/O requests in computer memory. Complex applications may implement I/O caching services for the operation of the application, bypassing the I/O caching facilities of the general-purpose file system.
More particularly, general-purpose file systems, such as the OS/390 UNIX HFS, may not have an I/O caching scheme that is tailored to the characteristics of databases. For example, a file system, such as the OS/390 UNIX HFS, may only support queued disk I/O access commands and not direct disk I/O access commands. Queued disk I/O access relies on its own I/O caching for proper operation. However, a general-purpose data caching strategy that operates with queued disk I/O access operations may not be optimal for a given application. Therefore, an application, such as a database, may perform its own cache management, bypass the file system, and directly access information on a disk. For example, a database may achieve better I/O access performance by using direct I/O access features, available as high-level computer language APIs on many UNIX platforms, in place of queued disk I/O access operations.
Queued disk I/O access commands may operate most efficiently with sequential I/O access operations and not random I/O access operations. Highly complex software applications, such as a database, often issue random I/O access requests and the performance of the complex software applications may suffer if I/O requests are serviced by queued disk I/O, which may be designed to optimize sequential rather than random access operations. Therefore, high-performance computer applications, such as a database, may issue direct disk I/O commands that can efficiently process random I/O requests when accessing disk volumes. If the application being ported is written using asynchronous, direct I/O APIs not supported on the target computer system, which is the case with the OS/390 UNIX C Library and the OS/390 HFS, the performance of the application may suffer because those direct I/O commands must be rewritten as queued I/O commands. This may be the case if the computer system is optimized for queued I/O. Those skilled in the art will appreciate the use of sequential I/O and random I/O operations with respect to disk I/O access.
A general-purpose file system may only be able to service I/O requests synchronously. Synchronous I/O operations typically wait for confirmation that the I/O disk access command has completed before executing another disk I/O access command. The delay in proceeding with subsequent disk I/O access commands impacts application I/O access performance. Asynchronous I/O access commands typically enable other computer operations to proceed that would otherwise wait until the I/O operation successfully completes. This allows I/O operations and other computer operations to overlap and proceed in an asynchronous fashion. Consequently, asynchronous I/O operations perform more efficiently than synchronous disk I/O operations for certain high performance applications, such as a database. Therefore, database software applications suffer performance penalties if they are constrained to use high-level language APIs that do not support asynchronous I/O operations, such as the OS/390 UNIX C Run-time APIs.
In summary, complex applications, such as databases, often include specialized features that ensure that I/O requests are properly sequenced. Typically, these features operate via direct disk I/O operations that facilitate servicing random I/O requests. Therefore, the application code may bypass the I/O caching features of the general-purpose file system in favor of the specialized I/O caching features of the application. When porting the application, limitations of the target computer system may impact the performance of the application. For instance, if a particular UNIX file system supports queued I/O access commands directed to disk volumes and not direct I/O access commands, unacceptably poor I/O access performance for the application may result. Also, if a file system supports synchronous I/O access to disk volumes and not asynchronous I/O access, poor performance for the application may result. Further, a general-purpose file system I/O caching scheme that is optimized for sequential I/O requests may result in poor performance for an application, such as a database, that issues many random I/O requests.
From the foregoing it will be apparent that there is a need to improve disk I/O when porting a complex application that uses asynchronous, direct I/O commands to a target computer system that does not support those commands.
The invention may be implemented as systems, methods, and computer products that improve the performance of computer-implemented I/O operations for complex applications, such as a database, that are ported to computer systems that are not tailored to support the high-performance services that may benefit applications. Complex applications, such as a database, often manage I/O access operations by a caching mechanism that is tailored to the needs of the application. For instance, the application I/O caching mechanism may operate in conjunction with direct disk I/O operations that facilitate servicing random I/O requests. When porting an application to a target computer system that does not support certain I/O access APIs, I/O performance of the application may be limited. For instance, a computer system's high-level language APIs may not support certain I/O access features. The present invention may be implemented by introducing specialized I/O access features that are tailored to enhance I/O access performance for complex applications, such as a database.
For example, the present invention may be implemented so that support for queued disk I/O access commands is augmented with support for direct disk I/O access commands. The augmented support is contingent upon the availability in a computer system of synchronous, direct I/O access to disk volumes. This augmented support ensures that random I/O requests are handled optimally in addition to sequential I/O requests by an application. The present invention may be implemented on the IBM OS/390 that supports UNIX System Services with the HFS. More particularly, the present invention may augment facilities on the IBM OS/390, such as the high-level language APIs, so that an application that is ported to the IBM OS/390 UNIX System Services will operate more efficiently. OS/390 UNIX provides support for APIs and an interactive shell interface. The OS/390 APIs enable a user or program, such as a database, to request OS/390 services and OS/390 UNIX System Services. The shell interface is an execution environment that services interactive requests from users and batch requests that are included in computer programs, such as a database.
Typically, complex applications that issue direct I/O requests may be associated with an I/O caching mechanism that is managed by the application. When porting the application for use with a general-purpose file system that does not support direct I/O access, performance may be degraded. An implementation of the present invention introduces the use of direct I/O operations with applications ported for operation with general-purpose file systems that do not support direct I/O operations. The direct I/O operations used by the present invention and directed to disk volumes enable faster application I/O operations than queued I/O operations for certain complex software applications. An implementation of the present invention uses direct I/O operations to support asynchronous I/O access to disk volumes instead of synchronous I/O access to disk volumes, and to optimally process random I/O requests. Therefore, performance of disk I/O access operations that service highly complex software applications and that are associated with porting the application to a target computer system that does not support direct I/O operations, such as the OS/390 UNIX HFS, is improved by the present invention over past solutions. It will be appreciated that the queued I/O access operations and the direct I/O access operations typically manipulate user data.
In the preferred embodiment of the present invention, the I/O operations that may benefit from the introduced I/O access operations are identified. More particularly, I/O access commands in the application that are within a programmatic loop and that are asynchronous direct I/O commands are identified. That is, the present invention identifies loops in the ordered computer code of the application that generate uninterrupted sequences of asynchronous I/O requests for which the associated waits are not executed until after execution of the loop completes. Such uninterrupted sequences of asynchronous I/O requests are commonly found in loops that are associated with applications that handle buffer flushing. While the preferred embodiment of the present invention operates on I/O access commands that are within a programmatic loop, uninterrupted sequences of asynchronous I/O requests may alternatively be located in other programmatic constructs.
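The loop pattern described above, in which requests are issued inside the loop and the associated waits run only after the loop completes, can be illustrated with a minimal sketch. It is not the patented implementation: a thread pool from the Python standard library stands in for an asynchronous I/O API, an ordinary temporary file stands in for a disk volume, and the names `write_block`, `flush_buffers`, and `demo` are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, wait

BLOCK_SIZE = 4096  # block size chosen for illustration


def write_block(path, offset, data):
    """Write one block at the given offset (stand-in for one asynchronous request)."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
    return len(data)


def flush_buffers(path, dirty_blocks):
    """Issue every write inside the loop; wait only after the loop completes."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        pending = []
        for block_number, data in dirty_blocks:
            # The request is issued here, but no wait is executed inside the loop.
            pending.append(
                pool.submit(write_block, path, block_number * BLOCK_SIZE, data)
            )
        # The associated waits run here, after the loop.
        wait(pending)
        return sum(f.result() for f in pending)


def demo():
    """Flush three non-contiguous dirty blocks and read one of them back."""
    dirty = [(0, b"a" * BLOCK_SIZE), (5, b"b" * BLOCK_SIZE), (2, b"c" * BLOCK_SIZE)]
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.truncate(8 * BLOCK_SIZE)  # pre-size the stand-in "volume"
        path = tmp.name
    try:
        written = flush_buffers(path, dirty)
        with open(path, "rb") as f:
            f.seek(5 * BLOCK_SIZE)
            sample = f.read(1)
        return written, sample
    finally:
        os.remove(path)
```

Because no wait separates the requests, the whole sequence is known before any result is needed, which is what makes such a loop a candidate for combining its requests.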
The preferred embodiment then combines, by chaining, the multiple asynchronous direct I/O requests into a much smaller number of disk I/O requests than would otherwise be executed. Those skilled in the art will appreciate that asynchronous I/O requests are typically not followed immediately by a wait request and may be aggressively scheduled for disk I/O operations by techniques such as chaining.
Therefore, the preferred embodiment of the present invention operates most efficiently in a computer system that supports chaining of multiple I/O requests into a single I/O request, such as the OS/390. For example, chained I/O disk requests may be aggregated so that multiple non-contiguous blocks of four thousand ninety-six bytes of information are processed by a single, chained I/O disk request. Execution time for characteristic test loads managed by the present invention is reduced by as much as 60 percent as compared to queued I/O operations on the OS/390 UNIX HFS that does not support combining multiple direct asynchronous I/O requests.
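As a rough illustration of how chaining reduces the number of disk requests, the sketch below groups non-contiguous 4096-byte block addresses into chains. The chain capacity of 8 and the names `build_chains` and `disk_requests_needed` are assumptions made for this example; the text does not state how many blocks a chained request may hold.

```python
from math import ceil

BLOCK_SIZE = 4096      # bytes per block, as in the example in the text
CHAIN_CAPACITY = 8     # hypothetical maximum number of blocks per chained request


def build_chains(block_addresses, capacity=CHAIN_CAPACITY):
    """Group block addresses, in issue order, into chains of at most `capacity` blocks."""
    return [
        block_addresses[i:i + capacity]
        for i in range(0, len(block_addresses), capacity)
    ]


def disk_requests_needed(block_count, capacity=CHAIN_CAPACITY):
    """Number of chained disk requests needed for `block_count` blocks."""
    return ceil(block_count / capacity)


# Ten non-contiguous blocks: ten unchained requests become two chained requests.
addresses = [b * BLOCK_SIZE for b in (3, 17, 4, 90, 41, 8, 12, 77, 5, 60)]
chains = build_chains(addresses)
```

With these assumed values, the ten individual requests are carried by two chained requests, the first holding eight blocks and the second holding two.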
Also, certain queued I/O operations that occur prior to a loop are identified. That is, on UNIX systems a file may be initially opened for queued disk I/O access then closed and reopened for direct disk I/O access. The preferred embodiment of the present invention identifies such queued disk I/O access operations and converts them to direct I/O access operations where appropriate.
The preferred embodiment of the present invention also identifies a terminus point that is located subsequent to the programmatic loop. When the terminus point is reached, any remaining identified asynchronous direct I/O requests are combined by chaining and the last, possibly partially full, block of chained I/O requests is submitted.
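The terminus-point behavior can be sketched as a small accumulator that submits a chain whenever it fills and submits the last, possibly partially full, chain when the terminus is reached. The class `RequestChainer`, its `capacity` parameter, and the `submit` callback are hypothetical names introduced for this illustration only.

```python
class RequestChainer:
    """Accumulate requests and submit them in chains of at most `capacity`."""

    def __init__(self, capacity, submit):
        self.capacity = capacity
        self.submit = submit      # callable that receives one complete chain
        self.pending = []

    def add(self, request):
        """Queue one request; submit the chain as soon as it is full."""
        self.pending.append(request)
        if len(self.pending) == self.capacity:
            self.flush()

    def flush(self):
        """Submit whatever is pending, even if the chain is only partially full."""
        if self.pending:
            self.submit(list(self.pending))
            self.pending.clear()


submitted = []
chainer = RequestChainer(capacity=4, submit=submitted.append)

for block in range(10):   # the programmatic loop that issues the requests
    chainer.add(block)

chainer.flush()           # terminus point: submit the last, partially full chain
# submitted is now [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Without the flush at the terminus point, the final two requests in this example would remain queued and never reach the disk.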
In the preferred embodiment of the present invention, the I/O access requests made by the application, which are associated with general-purpose files, are replaced with direct I/O commands that are associated with high-performance files that support direct I/O access. Typically, when the application program code is executed, an I/O access request is transmitted to the general-purpose file system. In an embodiment of the present invention, application-directed I/O access of OS/390 UNIX HFS files via queued I/O commands may be redirected for direct I/O access to VSAM files. The general-purpose files may be OS/390 UNIX HFS files and the performance files may be VSAM files. The Virtual Storage Access Method (VSAM) is an access method for direct or sequential processing of fixed-length and varying-length records on disks.
More particularly, an embodiment of the present invention may operate by use of a high-performance improvement code module that accepts lists of buffer addresses and disk addresses, data length values, and aggregation_indicator flags, and issues direct I/O requests instead of queued I/O requests. Without this invention, such direct I/O requests would otherwise be converted to queued I/O requests. For example, on the OS/390 a database application may issue direct I/O access requests during flushing operations in its I/O cache. Transmission of the data associated with the VSAM file may be enabled by use of the buffer address that is the location of the data in computer memory, the disk address that is the location of the data on a disk, the data length value, and the aggregation_indicator flag. Examples of operations that transmit data to and from a file include reading from a file or writing to a file.
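One way to picture the inputs to such a module is a list of request records, each carrying a buffer, a disk address, a data length value, and an aggregation_indicator flag. The sketch below is only an illustration under stated assumptions: the text does not define the meaning of the flag, so the code assumes that a set flag means the request is held and chained with the requests that follow, and a cleared flag closes the chain. A temporary file and `os.pwrite` stand in for a VSAM file and direct I/O, and `IoRequest`, `issue_requests`, and `demo` are hypothetical names.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class IoRequest:
    buffer: bytes        # stands in for the buffer address (data in memory)
    disk_address: int    # byte offset standing in for the disk address
    length: int          # data length value
    aggregate: bool      # aggregation_indicator; semantics are assumed here


def _submit(fd, chain):
    """Write every request of one chain at its own disk address."""
    for req in chain:
        os.pwrite(fd, req.buffer[:req.length], req.disk_address)


def issue_requests(fd, requests):
    """Issue the requests and return the number of chained submissions.

    Assumption: aggregate=True holds the request for chaining with the
    requests that follow; aggregate=False closes and submits the chain.
    """
    submissions = 0
    chain = []
    for req in requests:
        chain.append(req)
        if not req.aggregate:
            _submit(fd, chain)
            submissions += 1
            chain = []
    if chain:  # submit a trailing chain that was never closed
        _submit(fd, chain)
        submissions += 1
    return submissions


def demo():
    """Issue four writes as two submissions and read one block back."""
    fd, path = tempfile.mkstemp()
    try:
        requests = [
            IoRequest(b"AAAA", 0, 4, True),
            IoRequest(b"BBBB", 8192, 4, True),
            IoRequest(b"CCCC", 4096, 4, False),
            IoRequest(b"DDDD", 12288, 4, False),
        ]
        count = issue_requests(fd, requests)
        return count, os.pread(fd, 4, 8192)
    finally:
        os.close(fd)
        os.remove(path)
```

In this sketch the first three requests travel in one chain and the fourth travels alone, so four writes are carried by two submissions.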
Additionally, the preferred embodiment maintains a "performance_name" file that contains the name of the associated high-performance file which can be accessed with direct I/O commands. For example, the performance_name file may be an OS/390 HFS file that contains the name of an associated VSAM file. Therefore, an association is created between the OS/390 HFS file that would have been used if direct I/O were supported by OS/390 HFS and the VSAM file that is used in its stead. For example, an embodiment of the present invention replaces what would otherwise be the queued I/O requests generated during execution of the application code with direct I/O access commands that manipulate the VSAM files, by associating the I/O command directed to an OS/390 UNIX HFS file with a direct I/O command to the VSAM file.
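The association can be pictured as a small lookup: the general-purpose file holds nothing but the name of the high-performance file that is used in its place. The following is a minimal sketch under assumptions, not the embodiment itself. An ordinary text file stands in for the OS/390 HFS file, the one-name-per-file layout is assumed, and the data set name `DB.DATA.VSAM01` and the function names are invented for illustration.

```python
import os
import tempfile


def write_performance_name(general_path, performance_file_name):
    """Record the name of the high-performance file inside the general-purpose file."""
    with open(general_path, "w", encoding="utf-8") as f:
        f.write(performance_file_name + "\n")


def resolve_performance_file(general_path):
    """Return the name of the high-performance file associated with `general_path`."""
    with open(general_path, "r", encoding="utf-8") as f:
        return f.readline().strip()


def demo():
    """Create an association, then check existence and resolve the name."""
    with tempfile.TemporaryDirectory() as directory:
        general_path = os.path.join(directory, "table01.dat")
        write_performance_name(general_path, "DB.DATA.VSAM01")  # hypothetical name
        # Existence checks keep working against the general-purpose file.
        exists = os.path.exists(general_path)
        return exists, resolve_performance_file(general_path)
```

Code that only asks whether the general-purpose file exists needs no changes, while code that performs I/O first resolves the associated name and directs its requests there.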
While on most UNIX platforms general-purpose files support direct I/O access, the target computer system may lack such support. By creating an association between such general-purpose files and the performance files that support direct I/O access commands, database administrators may continue accessing some of the information about the general-purpose files while accessing a disk by direct disk I/O access. This reduces the amount of application program code that must be changed to accommodate the computer code introduced in the preferred embodiment of the present invention. It also maintains ease of use for the application users, since the translation between the general-purpose files and the performance files is typically transparent to the user. For example, by relying on the association between the general-purpose files and the performance files, the computer programs that rely on information in OS/390 UNIX HFS files to determine characteristics of a file, such as whether the file exists, do not have to be altered to be made aware of the existence of the VSAM files.
An embodiment of the present invention improves the performance of computer-implemented I/O operations for complex applications, such as a database. More particularly, applications which use asynchronous, direct I/O commands that are ported to target computer systems which do not support such commands may be enhanced by the present invention to improve I/O performance. That is, the present invention may be implemented by augmenting general-purpose I/O access features with specialized I/O access operations that are tailored to enhance I/O access performance for complex applications, such as a database.
Other aspects and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.