Recently, with advances in information and communication technology and the expansion of its fields of application, the targets to which information processing systems are applied and the processing amounts dealt with in such systems have been increasing. In an information processing system, in order to cope with the increasing processing amount and data volume, there has been employed an architecture in which the information processing system is composed of a plurality of computers interconnected via a bus or a communication network so as to improve system performance.
<Inter-Node Migration of Processing or VMs>
Since the load on a computer varies in accordance with processing content and time, a technique is used in which, in an information processing system made up of a plurality of computers (nodes), a process is migrated from one computer to another depending on the load, so as to accomplish load distribution and improve the utilization efficiency of each computer and the performance of the system as a whole. When, for example, the processing load on a computer increases to close to its performance limit, the computer executing the process is changed to another computer with a lighter load, to improve the utilization efficiency of computer resources and performance.
Process migration may also be utilized in a virtualized system. For example, a virtualization mechanism, such as a hypervisor, is implemented on a server to virtualize hardware resources, such as a CPU (central processing unit) and a memory. A virtual OS (operating system) is run on this virtualization mechanism and an application is executed thereon to implement a virtualization environment including virtual machines (VMs). In the virtualization environment, there are cases in which a VM is migrated to improve performance or the utilization efficiency of computer resources. When a VM is migrated as a unit, such as when the VM that executes a process is changed to another VM, an image (VM image) of the system disk of the VM to be migrated is accessed from the computer that executes the VM after migration, to implement hot migration of the VM. When migrating a VM from one computer (node) to another computer (node), by way of inter-node migration of the VM, status information of the VM to be migrated is copied to, e.g., the computer which is the destination of the migration. The status information of the VM includes total memory information, inclusive of network connection information, and total information relating to the processor (CPU), such as dump information of the registers of the VM.
Data referenced by a process, such as the VM's status information or the VM image, are stored in a storage shared by a plurality of computers.
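The copying of a VM's status information to a destination node, as described above, may be sketched as follows. All names here (VMStatus, MigrationTarget, the VM image path) are hypothetical illustrations, not the structures of any particular hypervisor: the status information travels to the destination node, while the VM image remains accessible on the shared storage by reference.

```python
# A minimal sketch of inter-node VM migration: status information (memory
# image and CPU register dump) is copied to the destination node, while the
# VM image stays on the shared storage and is merely referenced by path.
# All names (VMStatus, MigrationTarget, the path) are hypothetical.

from dataclasses import dataclass, field

@dataclass
class VMStatus:
    memory: bytes        # total memory information, incl. network connection info
    registers: dict      # dump information of the VM's CPU registers

@dataclass
class MigrationTarget:
    vms: dict = field(default_factory=dict)

    def receive(self, vm_id, status, vm_image_path):
        # The destination node resumes the VM from the copied status
        # information and accesses the VM image on the shared storage.
        self.vms[vm_id] = (status, vm_image_path)

source_status = VMStatus(memory=b"\x00" * 16, registers={"pc": 0x1000})
dest = MigrationTarget()
dest.receive("vm1", source_status, "/shared/vm1.img")
```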
<Storage Cache>
It is known that, in an information processing system that includes a plurality of computers (nodes) and has a system configuration in which the multiple nodes share a single storage, accesses to the shared storage may be a limiting factor of the system performance (a bottleneck). It is supposed that, though not limited thereto, the plurality of nodes are connected to the shared storage via a network or a bus. Representative of the bottlenecks are:
    a load due to access requests to the shared storage; and
    a network load that increases with increase in access requests.
As is well known, a cache (a storage cache) that temporarily holds a copy, as well as an update, of a part of the data stored in a storage is used as a technique to reduce access requests to the storage and improve access performance. The storage cache is constituted by a DRAM (dynamic random access memory) or an SSD (solid state drive, for example a NAND-type non-volatile memory or flash memory), as an example, and allows faster access as compared with access from a computer (node) to the storage. By storing in the cache a copy of data that has a high access frequency, access to the data without access to the storage is enabled, or the number of accesses to the storage is reduced. Among the data held in the cache, less frequently accessed data are purged from the cache, while copies of frequently accessed data are retained in the cache. When data for an access request hits in a read access, the data that has so hit is returned as a response to the source of the access request. In a write request, data that has hit in the cache is updated and a write completion response is returned to the source of the write request. In the case of a write-back system, for example, the update of the data in the cache is reflected in the storage afterwards.
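The read-hit, write-hit, and deferred write-back behavior described above can be sketched as follows. This is a minimal illustration, assuming hypothetical names (StorageCache, a dict standing in for the shared storage) and a simplistic first-in replacement policy in place of a real frequency-based one:

```python
# A minimal sketch of a write-back storage cache: reads and writes are
# served from the cache when possible; a dirty page is reflected in the
# storage only when it is purged. Names here are hypothetical.

class StorageCache:
    def __init__(self, storage, capacity):
        self.storage = storage    # shared storage (here: a plain dict)
        self.capacity = capacity  # maximum number of cached pages
        self.cache = {}           # page address -> (data, dirty flag)

    def read(self, addr):
        if addr in self.cache:              # hit: no storage access needed
            return self.cache[addr][0]
        data = self.storage[addr]           # miss: fetch from storage
        self._insert(addr, data, dirty=False)
        return data

    def write(self, addr, data):
        # Write-back: update only the cache and mark the page dirty; the
        # storage is updated later, when the page is purged.
        self._insert(addr, data, dirty=True)

    def _insert(self, addr, data, dirty):
        if addr not in self.cache and len(self.cache) >= self.capacity:
            self._purge()
        self.cache[addr] = (data, dirty)

    def _purge(self):
        # Purge the oldest-inserted page (a stand-in for a real
        # replacement policy), reflecting a dirty page in the storage first.
        victim, (data, dirty) = next(iter(self.cache.items()))
        if dirty:
            self.storage[victim] = data
        del self.cache[victim]
```

For example, after writing a page and then filling the cache, the purged dirty page appears in the backing storage without any storage access having occurred at write time.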
<Cache Coherency>
Even in a system in which a plurality of nodes share a common storage, access performance may be lowered due to deteriorated cache performance caused by increased accesses to a single cache from the plurality of nodes, or due to increased loads on the network caused by those increasing accesses. Thus, a configuration is used in which an individual cache (local cache/private cache) is provided in each of the plurality of nodes (see, e.g., Non-Patent Literature 1). In this case, a mechanism for maintaining cache coherency is implemented for each cache of the plurality of nodes. Cache coherency means data consistency, which should be maintained so that, even in the case where each of the plurality of nodes has updated data, the data accessed is the latest correct data. In general, data stored in the shared storage can be referenced from a plurality of nodes. Assuming that identical data in the shared storage are stored in the respective caches of one node and another node, and the data held in the cache of the one node is updated by a process executed in the one node, it is necessary to reflect the update executed in the one node in the data held in the cache of the other node, or to invalidate the data held in the cache of the other node.
The following describes an example operation of a typical system of a related technique, including a storage shared by a plurality of nodes, each of which has a cache for temporarily holding a copy or an update of data stored in the shared storage, with reference to FIG. 18. A CPU of a node 101-1 writes update data in a cache (local cache) of the node 101-1 (1: update).
The CPU of the node 101-1 notifies a CPU of a node 101-2 of a destination address (a cache line) of the updated data, by way of notification of the update of the data (2: notification of update by address). It is noted that the notification of update by address (2) may be made simultaneously with, or directly before, the update (write) of the data in the cache of the node 101-1.
On reception of the update notification from the node 101-1, if the cache of the node 101-2 holds the data of the address (pre-update data), the node 101-2 invalidates the data (3: invalidate).
The CPU of the node 101-1 writes back the updated data of the cache of the node 101-1 to the shared storage 3 (4: write-back (reflect)).
When the CPU of the node 101-2 makes reference to the data, since the data in the cache of the node 101-2 has been invalidated (with an invalidate flag, not shown, being turned on, as an example), no cache hit occurs in the node 101-2 (5: miss). The CPU of the node 101-2 reads the data that reflects the update by the node 101-1 from the shared storage 3, and registers the data in the cache of the node 101-2 (6: read updated data). It is noted that, if, in storing the updated data from the shared storage 3 in the cache of the node 101-2, there is no sufficient vacant capacity left in the cache of the node 101-2, invalidated data or data with the least referencing frequency is purged from the cache to secure a free space in the cache.
When the node 101-2 subsequently makes reference to the data, the CPU of the node 101-2 references the updated data registered in the cache of the node 101-2 (7: access).
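The sequence of steps 1 through 7 above can be sketched as follows, with two nodes sharing one storage. The names (Node, the dict standing in for the shared storage 3) are hypothetical, and the notification is modeled as a direct method call rather than a network message:

```python
# A minimal sketch of the update/invalidate sequence of FIG. 18:
# 1: update own cache, 2: notify peer by address, 3: peer invalidates,
# 4: write-back, 5: miss, 6: read updated data, 7: access.

class Node:
    def __init__(self, storage):
        self.storage = storage
        self.cache = {}                     # addr -> (data, valid flag)

    def update(self, addr, data, peer):
        self.cache[addr] = (data, True)     # 1: update local cache
        peer.invalidate(addr)               # 2: notification of update by address
        self.storage[addr] = data           # 4: write-back (reflect)

    def invalidate(self, addr):
        if addr in self.cache:              # 3: invalidate held pre-update data
            data, _ = self.cache[addr]
            self.cache[addr] = (data, False)

    def read(self, addr):
        entry = self.cache.get(addr)
        if entry and entry[1]:
            return entry[0]                 # 7: access valid cached data
        data = self.storage[addr]           # 5: miss -> 6: read updated data
        self.cache[addr] = (data, True)     # register in local cache
        return data

storage = {0x10: "old"}
n1, n2 = Node(storage), Node(storage)
n2.read(0x10)                 # node 101-2 caches the pre-update data
n1.update(0x10, "new", n2)    # steps 1, 2, 3 and 4
print(n2.read(0x10))          # steps 5, 6, 7: prints "new"
```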
<Example of the Management Information for the Storage Cache>
FIG. 19 schematically illustrates an example of management information for the cache shown in FIG. 18. The cache management information is managed by a cache controller (not shown in FIG. 18), for example, and is provided in the cache controller or in a memory area in each cache different from the cache memory. Though not limited thereto, the cache management information includes an invalid flag, a page address, a storage location address, an updated flag, a replacement priority, and so forth. The invalid flag is a flag that indicates whether data of each page is invalid (Yes) or not invalid (No). Cache data in a cache entry with the invalid flag set to Yes is neither read nor written (updated), and is preferentially purged from the cache when cache data is replaced. The page address is a page address (logical address) in the entire storage corresponding to a page held in the cache. It is noted that the memory area of a storage cache is usually managed in terms of a page of a fixed page size, such as 4 KB (kilobytes) or 8 KB, as a unit. The location address indicates the address in the cache at which the page (cache record) is stored. The updated flag indicates whether or not the page (cache record) in the cache has been updated after the data was stored in the cache. The updated flag indicates Yes in case the data of the page in the cache has been updated but the update content has not yet been reflected (written back) in the storage, while it indicates No in case the data in the page has not been updated after being stored in the cache, or the update content has already been reflected in the storage. The replacement priority is information indicating the degree of priority of replacement of the corresponding page in the cache. The degree of priority may be so set that, if the value of the replacement priority is positive, the larger the value, the more preferentially the page is replaced; or, conversely, the smaller the value, the more preferentially the page may be replaced. It is noted that, in the cache management information of FIG. 19, the page address may be a path name (file name) in a file system implemented in the storage.
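One entry of such cache management information can be sketched as follows. The field names mirror the description of FIG. 19 but are otherwise assumptions, as is the victim-selection rule, which here prefers invalid entries and then larger replacement-priority values:

```python
# A minimal sketch of one entry of the cache management information
# (invalid flag, page address, location address, updated flag,
# replacement priority); names and the victim rule are assumptions.

from dataclasses import dataclass

PAGE_SIZE = 4 * 1024  # pages managed in a fixed size, e.g. 4 KB

@dataclass
class CacheEntry:
    invalid: bool              # Yes: entry is invalid, purged preferentially
    page_address: int          # page address (logical address) in the storage
    location_address: int      # address in the cache holding the page
    updated: bool              # Yes: updated but not yet written back
    replacement_priority: int  # larger value -> replaced more preferentially

def choose_victim(entries):
    """Pick the entry to purge: invalid entries first, then by priority."""
    return max(entries, key=lambda e: (e.invalid, e.replacement_priority))
```

With this rule, an invalid entry is chosen ahead of any valid entry regardless of the priority values, matching the description that invalid data is preferentially purged.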
<MESI-Protocol>
In case a storage area is shared by a plurality of nodes, a protocol for maintaining cache coherency among caches distributed in the nodes, such as the MESI protocol, is used. In a well-known manner, the MESI protocol takes the following states:
(1) Modified: The cache line (data) is present only in a pertinent cache and has been modified from data in a shared storage (dirty). A cache mechanism must write back this cache line (data) in the shared storage at some time before permitting another CPU to read the corresponding cache line (data) from the shared storage.
(2) Exclusive: The cache line (data) is present only in a pertinent cache but matches with data in the shared storage (clean).
(3) Shared: The same cache line (data) exists in another cache in the system and matches the data in the shared storage.
(4) Invalid: Indicates that this cache line (data) is invalid.
In the foregoing, a CPU cache that holds data of a main memory is to be read as ‘a cache for a shared storage’. For example, if a read access request is issued to a line (data) in the Invalid state, the data must be read from the shared storage, and the line (data) transitions to the Shared or Exclusive state. It is only in the Modified state or in the Exclusive state that a write access to the cache can be performed. If the cache line is in the Shared state, another cache in the system, more precisely, the relevant cache line (data) in such another cache, must be invalidated before writing in the cache. This is normally carried out by a broadcast operation termed ‘Read For Ownership (RFO)’. In the cache, a cache line (data) in a state other than the Modified state may be discarded at any time, changing the cache line to the Invalid state. It is noted that a cache line (data) in the Modified state has to necessarily be written back to the shared storage.
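The transitions just described can be sketched as a simple table for a single cache line. This is only an illustration of the rules stated above; a real MESI controller also reacts to snooped events from other caches, which are omitted here:

```python
# A minimal sketch of MESI state transitions for one cache line, covering
# only the local read, write, and eviction cases described in the text.

MODIFIED, EXCLUSIVE, SHARED, INVALID = "M", "E", "S", "I"

def on_read(state):
    # A read of an Invalid line must fetch from the shared storage and
    # moves to Shared (or Exclusive, if no other cache holds the line).
    if state == INVALID:
        return SHARED           # after reading from the shared storage
    return state                # M, E, S: hit, state unchanged

def on_write(state):
    # Writing requires ownership: from Shared, the copies in other caches
    # must first be invalidated (Read For Ownership, RFO).
    if state == SHARED:
        pass                    # broadcast RFO to invalidate other copies
    return MODIFIED

def on_evict(state):
    # A line in a state other than Modified may be discarded at any time;
    # a Modified (dirty) line must be written back first.
    if state == MODIFIED:
        pass                    # write back to the shared storage
    return INVALID
```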
Non-Patent Literature 1: Papamarcos, M. S.; Patel, J. H. (1984). “A low-overhead coherence solution for multiprocessors with private cache memories”. Proceedings of the 11th Annual International Symposium on Computer Architecture (ISCA '84), p. 348.