1. Field of the Invention
The present invention relates to a technology that predicts which of the data stored in storage devices a computer will access in the near future, and that prefetches the predicted data into a cache memory.
2. Related Background Art
At present, numerous applications use databases (hereinafter called “DBs”), and this has made DB management systems (hereinafter called “DBMS”), which are software that performs a series of processing and management operations on DBs, extremely important. One characteristic of DBs is that they handle huge amounts of data. For this reason, in many systems on which a DBMS operates, storage devices having large-capacity disks are connected to the computer on which the DBMS runs, and the DB data are stored on those disks. As long as data are stored on the disks of the storage devices, any processing that concerns the DB necessarily involves disk accesses. Disk accesses involve mechanical operations such as seeks, which take far more time than the calculations and data transfers that take place in the CPU or memory within a computer. In view of this, mainstream storage devices reduce disk accesses by providing a cache memory within the storage device, retaining data read from the disks in the cache memory, and serving subsequent accesses to the same data from the cache memory. In addition, technology is being developed that predicts, based on a series of data accesses, the data that will be read in the near future and prefetches the predicted data into the cache memory.
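The caching and prefetching behavior described above can be illustrated with a minimal sketch. It is not taken from any cited document: the class name `PrefetchingCache`, the LRU eviction policy, the "two consecutive block numbers" prediction rule, and the parameters `capacity` and `prefetch_depth` are all assumptions chosen for illustration, and the disk is modeled as a plain dictionary.

```python
from collections import OrderedDict


class PrefetchingCache:
    """Toy storage-device cache: LRU retention plus sequential prefetch.

    All names and policies here are illustrative assumptions.
    """

    def __init__(self, disk, capacity=8, prefetch_depth=2):
        self.disk = disk                    # block number -> data (stands in for the disk)
        self.capacity = capacity            # maximum number of cached blocks
        self.prefetch_depth = prefetch_depth
        self.cache = OrderedDict()          # least recently used block first
        self.last_block = None
        self.disk_reads = 0                 # counts slow accesses to the disk

    def _load(self, block):
        """Bring a block into the cache; blocks absent from the disk are skipped."""
        if block in self.cache:
            self.cache.move_to_end(block)   # refresh LRU position
            return
        if block not in self.disk:
            return
        self.disk_reads += 1
        self.cache[block] = self.disk[block]
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used block

    def read(self, block):
        """Return (data, hit). Raises KeyError if the block does not exist."""
        hit = block in self.cache
        self._load(block)
        data = self.cache[block]
        # Assumed prediction rule: two consecutive block numbers suggest a
        # sequential scan, so the following blocks are fetched ahead of time.
        if self.last_block is not None and block == self.last_block + 1:
            for nxt in range(block + 1, block + 1 + self.prefetch_depth):
                self._load(nxt)
        self.last_block = block
        return data, hit
```

With a ten-block disk, reading blocks 0 and 1 misses twice, after which reads of blocks 2 and 3 are served from the cache because they were prefetched.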
In this regard, U.S. Pat. No. 5,317,727 describes a technology that improves the performance of a DBMS by reducing unnecessary accesses and prefetching required data. In this technology, the section that executes a query from a user takes into consideration the execution plan for the query, data access properties, cache memory volume, and I/O load in order to execute prefetching, determine the prefetching volume, and manage the cache (buffer), thereby improving I/O access performance and, in turn, the performance of the DBMS.
Kagehiro Mukai, et al. “Evaluation of Prefetching Mechanism That Uses an Access Plan in Highly Functional Disks.” 11th Data Engineering Workshop DEWS 2000 Collection of Essays, Specialized Committee on Data Engineering Research, Electronic Information Communication Academy, July 2000, Seminar No. 3B-3 describes improving the performance of a DBMS through highly functional storage devices, taking as an example a DB that uses a relational database management system (RDBMS). When an execution plan for query processing in an RDBMS is provided to a storage device as application-level knowledge, the storage device, after reading an index of a certain table in the RDBMS, becomes capable of determining which of the blocks that store the data of that table should be accessed. As a result, the storage device can read the indices consecutively to ascertain the groups of blocks that hold the table data to be accessed and, by effectively scheduling accesses to those blocks, shorten the total data access time. This processing can be executed independently of the computer on which the DBMS is executed, so that there is no need to wait for commands from the computer. Furthermore, when data are divided among a plurality of physical storage devices, the physical storage devices can be accessed in parallel, which can further shorten the execution time of DBMS processing.
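As a rough illustration of the scheduling idea in the preceding paragraph, the sketch below takes index entries that name the blocks to be accessed, groups them by physical storage device, and orders each group by block address so that each device could be read in one pass and the devices could be driven in parallel. The function name `plan_prefetch`, the shape of the index entries, and the `block_location` map are hypothetical; the cited paper's actual mechanism is not reproduced here.

```python
from collections import defaultdict


def plan_prefetch(index_entries, block_location):
    """Build a per-device, address-ordered list of blocks to prefetch.

    index_entries:  iterable of (key, logical_block) pairs read from an index
                    (assumed format).
    block_location: logical_block -> (device_name, physical_block)
                    (assumed format).
    """
    per_device = defaultdict(set)
    for _key, logical_block in index_entries:
        device, physical = block_location[logical_block]
        per_device[device].add(physical)    # a set drops duplicate block requests
    # Ascending physical order approximates a seek-friendly schedule per device.
    return {device: sorted(blocks) for device, blocks in per_device.items()}


# Usage: four index entries touching three distinct blocks on two devices.
entries = [("a", 7), ("b", 2), ("c", 7), ("d", 5)]
location = {2: ("disk0", 40), 5: ("disk1", 10), 7: ("disk0", 12)}
schedule = plan_prefetch(entries, location)
```

Each value in `schedule` is an independent work list, which is what would allow the separate physical devices to be accessed in parallel.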
R. H. Patterson, et al. “Informed Prefetching and Caching.” Proc. of the 15th ACM Symposium on Operating System Principles, December 1995: 79–95 discusses a function of a computer's OS, and a method of controlling it, that prefetches data into a file cache in the computer by using hints issued by applications concerning the files and the regions to be accessed in the near future.
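The hint-driven approach can be sketched as follows. This is only a simplified model under stated assumptions, not the interface described by Patterson et al.: the class `HintedFileCache`, its `hint` and `read` methods, the per-call `max_prefetch` budget, and the `read_block` callable are all hypothetical names introduced for illustration.

```python
from collections import deque


class HintedFileCache:
    """Toy file cache that prefetches blocks an application says it will read."""

    def __init__(self, read_block, max_prefetch=4):
        self.read_block = read_block        # callable (path, block_no) -> data
        self.max_prefetch = max_prefetch    # blocks fetched per prefetch round
        self.cache = {}                     # (path, block_no) -> data
        self.hints = deque()                # hinted blocks not yet fetched

    def hint(self, path, block_numbers):
        """Application discloses blocks it expects to access soon."""
        for block_no in block_numbers:
            self.hints.append((path, block_no))
        self._prefetch()

    def _prefetch(self):
        """Fetch hinted blocks in disclosure order, up to the round budget."""
        issued = 0
        while self.hints and issued < self.max_prefetch:
            key = self.hints.popleft()
            if key not in self.cache:
                self.cache[key] = self.read_block(*key)
                issued += 1

    def read(self, path, block_no):
        """Return (data, hit); a consumed block leaves the cache."""
        key = (path, block_no)
        hit = key in self.cache
        data = self.cache.pop(key) if hit else self.read_block(path, block_no)
        self._prefetch()                    # use the freed budget for pending hints
        return data, hit
```

The design choice worth noting is that prediction is replaced by disclosure: the cache does not guess the access pattern, it only decides how quickly to act on what the application has already announced.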
Meanwhile, the amount of data stored in storage devices, RAID devices foremost among them, has grown significantly in recent years, and the storage capacity of the storage devices themselves has increased; at the same time, the number of storage devices and file servers connected to networks has also risen. As a result, various problems have arisen, such as increasingly complicated management of large-capacity storage regions, increasingly complicated management of information equipment due to dispersed installation locations of storage devices and servers, and concentration of load on certain storage devices. A technology called virtualization is currently being researched and developed in order to solve these problems.
Virtualization technology is divided primarily into three types, as described in The Evaluator Series Virtualization of Disks Storage, WP-0007-1, September 2000 by Evaluator Group, Inc.
The first is a mode in which each server connected to a network shares information that manages storage regions of storage devices. Each server uses its volume manager to access the storage devices.
The second is a mode in which a virtualization server (hereinafter called a “management server”) manages as virtual storage regions all storage regions of storage devices connected to a network. The management server accepts access requests to the storage devices from servers, accesses storage regions of the subordinate storage devices, and sends the results as a reply to the request source server.
The third is a mode in which, as in the second, a management server manages as virtual storage regions all storage regions of storage devices connected to a network. The management server accepts access requests to the storage devices from servers and sends to the request source server, as a reply, position information concerning the storage regions that actually store the data to be accessed. The server then accesses the storage regions of the storage devices based on the position information.
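The difference among the three modes lies in who holds the mapping from virtual to physical storage regions and who performs the actual device access. The sketch below models this under simplifying assumptions: storage devices are dictionaries, the mapping is a dictionary from a virtual block number to a (device, physical block) pair, and the names `ManagementServer`, `read_mode1`, `read_mode2`, and `read_mode3` are hypothetical.

```python
class ManagementServer:
    """Toy virtualization (management) server holding the region mapping."""

    def __init__(self, mapping, devices):
        self.mapping = mapping    # virtual block -> (device name, physical block)
        self.devices = devices    # device name -> {physical block: data}

    def read(self, virtual_block):
        """Second mode: the management server accesses the device itself."""
        device, physical = self.mapping[virtual_block]
        return self.devices[device][physical]

    def locate(self, virtual_block):
        """Third mode: the management server returns only position information."""
        return self.mapping[virtual_block]


def read_mode1(shared_mapping, devices, virtual_block):
    """First mode: each server holds the shared mapping and reads directly."""
    device, physical = shared_mapping[virtual_block]
    return devices[device][physical]


def read_mode2(mgmt, virtual_block):
    """Second mode: data travels through the management server."""
    return mgmt.read(virtual_block)


def read_mode3(mgmt, devices, virtual_block):
    """Third mode: ask for the position, then access the device directly."""
    device, physical = mgmt.locate(virtual_block)
    return devices[device][physical]


# Usage: one virtual volume spread over two storage devices.
devices = {"st0": {100: "row-A"}, "st1": {7: "row-B"}}
mapping = {0: ("st0", 100), 1: ("st1", 7)}
mgmt = ManagementServer(mapping, devices)
```

All three functions return the same data for the same virtual block; they differ only in the path the request and the data take.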
Systems to which a virtualization technology is applied handle enormous amounts of data, and a very high proportion of them therefore use a DBMS. Because the amount of data accessed is also enormous, the disk access performance within the storage devices has a great impact on the performance of the entire system.
The use of cache memory and data prefetching in the prior art described above is technology confined to the storage devices themselves. The technologies in the first through third documents described above improve performance by linking DBMS processing with storage devices, but they are not designed to work in a virtualization environment.