This invention relates generally to the field of digital computer systems, and more specifically to file systems for use in such digital computer systems. The invention specifically provides an arrangement that intelligently uses a host cache in a host computer, that is, that switches caching by a host computer's file system on or off based on static application profiles and dynamic input/output patterns, and that utilizes the caching provided by a mass storage subsystem connected to the host computer. Disabling file system caching in a host computer can enhance input/output throughput and other host performance characteristics, since doing so can eliminate extra processes and expenses that are associated with generating and storing additional copies of data that may occur if caching is performed both by the mass storage subsystem and the host computer's file system.
Digital computers store information, including data and programs for processing the data, in the form of files. Typically the files are stored in a mass storage subsystem, in which the information is stored in, for example, one or more disk storage devices or other device(s) in which information can be stored on a long-term basis. When a computer is to execute a program, which may be either an application program or a program that forms part of the operating system, at least some portion of the file or files that contain the program is read from the mass storage subsystem in which it is stored, and provided to the computer for execution. Similarly, when a program needs data for processing, at least some portion of the file or files containing the data is read from the mass storage subsystem in which it is stored and provided to the computer for processing by the program. While the program is processing the data, it may generate processed data that can be transferred by the computer to the mass storage subsystem for storage. The processed data may be stored in a pre-existing file, or a new file may be created to store the data. Similarly, while a program is being executed, it may generate status or other information that may be transferred by the computer to the mass storage subsystem for storage in either a pre-existing file or a new file.
Disk storage devices store information in storage locations, with each storage location being capable of storing a selected amount of information. Typically, a computer provides a file system, which comprises a portion of its operating system, that identifies the storage locations in the disk storage devices in which the files are stored, which relieves programs of the necessity of knowing the particular storage locations on the disk storage devices in which their files are stored. When information is to be read for a program, an input/output read request is issued to the file system identifying the file and the portion of the file whose data is to be read. In addition, the input/output read request can provide a pointer to a buffer, which may be a temporary buffer, in which the data is to be stored by the file system for use by the program. In response to the input/output read request, the file system will initially determine whether the requested data is in a cache that it maintains. If the requested data is in the file system's cache, the file system will copy the data from the file system cache to the buffer, thereby providing it to the requesting program.
On the other hand, if the file system determines that the requested data is not in the file system cache, it will identify the disk storage device(s) and storage locations thereof on which the requested data is stored, and issue a read request to the disk storage devices, which identifies the storage locations from which information is to be read. The information to be read will generally include the information requested by the program and, in a "read ahead" technique, may also include other information that was not requested by the program, but which is proximate to the requested information in the file. Typically, during a read operation, the contents of entire storage location(s) will be read, even if the information that is to be provided in response to the input/output read request is a subset of the information that is stored in the storage location(s). After the disk storage devices have provided the information requested by the file system to the file system, the file system will cache the information in its file system cache. In addition, the file system will copy the information that was requested in the original input/output read request to the buffer pointed to by the input/output read request. The file system can thereafter notify the program that the input/output retrieval operation has been completed, after which the program can make use of the retrieved information. It will be appreciated that, if more information was read and stored in the cache than had been requested by the program, and the program later issues an input/output read request for the additional information, the additional information may still be in the file system cache, in which case the file system will be able to satisfy the input/output read request from the file system cache.
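The read path described above can be illustrated with a minimal sketch. It is not taken from the invention itself; the names `BlockDevice`, `CachingFileSystem`, and `read_ahead` are hypothetical, a storage location is modeled as a numbered block of bytes, and the read-ahead policy (fetch a fixed number of following blocks on a miss) is an assumption chosen for simplicity.

```python
class BlockDevice:
    """Hypothetical stand-in for a disk storage device: a list of fixed blocks."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.reads = 0  # number of block reads issued to the device

    def read_block(self, n):
        self.reads += 1
        return self.blocks[n]


class CachingFileSystem:
    """Hypothetical file system that caches whole storage locations (blocks)."""

    def __init__(self, device, read_ahead=1):
        self.device = device
        self.read_ahead = read_ahead  # assumed policy: extra blocks fetched on a miss
        self.cache = {}               # block number -> block contents

    def read(self, block_no, offset, length, buffer):
        if block_no not in self.cache:
            # Cache miss: read the entire requested block plus the read-ahead
            # blocks, and store all of them in the file system cache.
            last = min(block_no + self.read_ahead, len(self.device.blocks) - 1)
            for n in range(block_no, last + 1):
                if n not in self.cache:
                    self.cache[n] = self.device.read_block(n)
        # Copy only the requested subset into the caller's buffer.
        data = self.cache[block_no][offset:offset + length]
        buffer[:len(data)] = data
        return len(data)


dev = BlockDevice([b"AAAAAAAA", b"BBBBBBBB", b"CCCCCCCC"])
fs = CachingFileSystem(dev, read_ahead=1)
buf = bytearray(4)
fs.read(0, 2, 4, buf)  # miss: blocks 0 and 1 are read from the device
fs.read(1, 0, 4, buf)  # hit: block 1 is already cached because of read-ahead
```

In this sketch the second request issues no device read, which is the benefit the read-ahead technique is intended to provide when a program later asks for information adjacent to what it read earlier.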
Similarly, when data from a program is to be written, the program issues an input/output write request to the file system, the write request identifying the file, the portion of the file in which the data is to be written, and the data that is to be written in the identified portion. The data that is to be stored may be held in a buffer, and the program can identify the data to be stored by providing a pointer to the buffer containing the data. In response to the input/output write request, the file system identifies the disk storage devices and storage locations thereon on which the data is to be stored. Essentially, the file system will perform a storage operation in three phases. In the first phase, if the contents of the storage location(s) in which the data is to be stored are not already in the file system cache, the file system will enable them to be retrieved and stored in the file system cache in the same manner as during a read operation described above. In the second phase, after the contents of the storage location(s) have been stored in the file system cache, the file system will update the contents as stored in the file system cache with the data to be stored. In the third phase, at some point later, the file system can enable the updated cached contents to be copied to the disk storage devices for storage. While the updated cached contents are in the file system cache, the file system can satisfy input/output read requests issued by programs for the data from the file system cache.
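The three-phase storage operation can be sketched as follows. This is an illustrative sketch only: the class `WriteBackCache`, its method names, and the use of a dictionary as the disk are hypothetical, and the point at which the third phase runs (an explicit `flush` call) is an assumption, since the description above says only that it happens at some later point.

```python
class WriteBackCache:
    """Hypothetical file system cache that defers writes to the disk."""

    def __init__(self, disk):
        self.disk = disk    # hypothetical disk: storage location -> bytes
        self.cache = {}     # storage location -> bytearray of cached contents
        self.dirty = set()  # locations updated in cache but not yet on disk

    def write(self, loc, offset, data):
        # Phase 1: bring the current contents of the location into the cache.
        if loc not in self.cache:
            self.cache[loc] = bytearray(self.disk[loc])
        # Phase 2: update the cached contents with the data to be stored.
        self.cache[loc][offset:offset + len(data)] = data
        self.dirty.add(loc)

    def read(self, loc):
        # Reads are satisfied from the cache, including not-yet-flushed updates.
        if loc not in self.cache:
            self.cache[loc] = bytearray(self.disk[loc])
        return bytes(self.cache[loc])

    def flush(self):
        # Phase 3: copy the updated cached contents back to the disk.
        for loc in sorted(self.dirty):
            self.disk[loc] = bytes(self.cache[loc])
        self.dirty.clear()


disk = {0: b"hello world"}
wb = WriteBackCache(disk)
wb.write(0, 6, b"there")
seen_before_flush = wb.read(0)  # served from the cache
on_disk_before_flush = disk[0]  # the disk still holds the old contents
wb.flush()
```

Between the second and third phases the cache and the disk hold different contents, which is why reads for the updated data must be satisfied from the file system cache during that interval.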
The use of file system caching can be advantageous particularly in connection with programs whose input/output profiles are such that data to be read is likely to be stored in the file system cache. While some programs have such input/output profiles and may benefit from caching of data read from disk storage devices, other programs do not, and caching by the file system for such programs may be a waste of the host computer's memory that is provided for such caching, as well as of processor capacity that may be consumed to perform the caching. In addition, a number of modern mass storage subsystems include large caches in which information is stored during both read and write operations. Modern information transfer systems that can be used to connect host computers to mass storage subsystems, such as FibreChannel, InfiniBand, and the like, can transfer information very rapidly. Taken together, these facts mean that information can be quickly transferred from the mass storage subsystem's cache to the host computer.
The invention provides a new and improved system and method that intelligently uses a host file system cache in a host computer, that is, switching caching by a host computer's file system on or off based on static application input/output profiles and dynamic input/output patterns, and utilizing caching provided by a mass storage subsystem that is connected to the host computer.
In brief summary, the invention provides an arrangement for use in connection with a host computer connected to a mass storage subsystem, the mass storage subsystem storing information for use in connection with processing of at least one program by the host computer. The arrangement comprises a program input/output interface, a mass storage subsystem interface and a file system control. The program input/output interface is configured to receive program input/output read and write requests from a program, each program input/output read and write request initiating an input/output operation in connection with information stored on a mass storage subsystem. The mass storage subsystem interface is configured to facilitate communications with the mass storage subsystem, including transferring a storage subsystem input/output read and write request thereto and receiving information therefrom. The file system control is configured to, in response to a program input/output read or write request received by the program input/output interface, generate a storage subsystem input/output read or write request for transmission by the mass storage subsystem interface to the mass storage subsystem and to transfer information to be transferred during the input/output operation between the program input/output interface and the mass storage subsystem interface, and to selectively cache the information in a file system cache maintained by the host computer.
The file system control can be configured to control caching of the information in the file system cache based on any of a number of criteria, including, for example, static application profiles and dynamic input/output patterns, such as the amount of information to be transferred between the program and the mass storage subsystem during the input/output operation, the type of program that provided the program input/output request, and any of a number of other criteria.
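A decision of this kind can be sketched as a small policy function. The sketch is hypothetical throughout: the names `IORequest`, `should_cache`, `NO_CACHE_PROGRAM_TYPES`, and `LARGE_TRANSFER_BYTES`, the particular program types listed, the 1 MiB threshold, and the choice to bypass the cache for large transfers are all assumptions made for illustration, not criteria stated by the invention.

```python
from dataclasses import dataclass


@dataclass
class IORequest:
    """Hypothetical description of one program input/output request."""
    program_type: str  # static profile of the requesting program
    nbytes: int        # amount of information to be transferred


# Assumed static profile: program types for which host caching is switched off.
NO_CACHE_PROGRAM_TYPES = {"database", "backup"}

# Assumed dynamic criterion: transfers at or above this size bypass the host cache.
LARGE_TRANSFER_BYTES = 1 << 20


def should_cache(req):
    """Return True if the file system cache should be used for this request."""
    if req.program_type in NO_CACHE_PROGRAM_TYPES:
        return False
    if req.nbytes >= LARGE_TRANSFER_BYTES:
        return False
    return True


small_editor_read = should_cache(IORequest("editor", 4096))
database_read = should_cache(IORequest("database", 4096))
large_copy = should_cache(IORequest("editor", 8 << 20))
```

A file system control built along these lines would consult such a function for each request and either route the transfer through the file system cache or pass it directly between the program's buffer and the mass storage subsystem.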
Disabling file system caching in a host computer can enhance input/output throughput and other host performance characteristics since that can eliminate extra processes and expenses that are associated with additional copies of data that may occur if caching is performed both by the mass storage subsystem and the host computer's file system.