A storage system that stores data in a disk array comprising multiple magnetic disk drives is well known.
The storage system comprises disk arrays for storing data and a controller unit for controlling disk drives. The controller unit includes host interface units, disk interface units, cache memory, processors, and switch units connecting them. The storage system is connected to host computers via host interfaces and to disk arrays via disk interfaces.
Within the storage system, the host interfaces provide the connection to the host computers, and the disk interfaces provide the connection to the disk arrays. The cache memory holds part of the data in a disk array in order to accelerate access from the hosts to the storage system. The switches connect the components in the storage system, and the processors control those components.
The cache memory stores data that has been accessed recently by the host computers. If the cache memory capacity becomes insufficient, the least recently accessed data in the cache memory is replaced by more recently accessed data. The LRU (Least Recently Used) algorithm is frequently used for this cache control.
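The LRU replacement described above can be sketched as follows. This is a minimal illustration, not the control logic of any particular storage system; the class name `LRUCache`, the method `access`, and the block identifiers are hypothetical names chosen for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Illustrative block cache with least-recently-used replacement."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        # Keys are block identifiers, ordered from least to most recently used.
        self.blocks = OrderedDict()

    def access(self, block_id):
        """Record an access and return True on a cache hit, False on a miss."""
        if block_id in self.blocks:
            # Hit: mark the block as the most recently used.
            self.blocks.move_to_end(block_id)
            return True
        if len(self.blocks) >= self.capacity:
            # Cache full: evict the least recently used block.
            self.blocks.popitem(last=False)
        self.blocks[block_id] = True
        return False


# A two-block cache: the second access to "A" hits, and the later access to
# "B" misses because "B" was evicted to make room for "C".
cache = LRUCache(capacity_blocks=2)
results = [cache.access(b) for b in ["A", "B", "A", "C", "B"]]
```

In this sketch an `OrderedDict` keeps the blocks in access order, so both the hit path and the eviction path run in constant time.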
In such a storage system, additional resources can normally be installed as needed even after operation has started. For example, if the capacity or the disk performance is insufficient, the problem can be solved by installing additional disk drives. Likewise, if the control processor performance is insufficient, additional processors can be installed. Similarly, if the cache memory capacity is insufficient and full performance cannot be delivered, additional cache memory can be installed.
As mentioned above, there are various methods for improving the storage system performance. The effect of installing additional cache memory, however, varies significantly depending on how the storage system is operated. For example, the following cases can be considered.
Consider a case in which 16 gigabytes of cache memory is installed in the storage system, and the host computer accesses a range of 20 gigabytes sequentially from the head block, returns to the head block after accessing the last block, and repeats the access. In this case, because the cache memory holds only the most recently accessed 16 gigabytes of data, each block has already been replaced by the time the host computer returns to it, and no access from the host computer hits the cache memory. Meanwhile, when the cache memory is expanded from 16 gigabytes to 20 gigabytes, all the data accessed by the host stays in the cache memory, and every access after the first pass is a cache hit. For example, if the cache memory is expanded from 16 gigabytes to 32 gigabytes, the cache hit rate changes from 0% to 100%.
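The sequential case above can be checked with a small simulation. This is a sketch under stated assumptions: one cache unit stands for one gigabyte, replacement is plain LRU, and the hit rate is measured after one warm-up pass. The function name `cyclic_scan_hit_rate` and its parameters are hypothetical.

```python
from collections import OrderedDict


def cyclic_scan_hit_rate(cache_units, range_units, measured_passes=3):
    """Hit rate of an LRU cache under a repeated sequential scan.

    Assumption for illustration: one unit represents one gigabyte.
    """
    cache = OrderedDict()  # ordered from least to most recently used

    def access(unit):
        if unit in cache:
            cache.move_to_end(unit)
            return True
        if len(cache) >= cache_units:
            cache.popitem(last=False)  # evict the least recently used unit
        cache[unit] = True
        return False

    # Warm-up pass: fills the cache and is excluded from the measurement.
    for unit in range(range_units):
        access(unit)

    hits = 0
    for _ in range(measured_passes):
        for unit in range(range_units):
            hits += access(unit)
    return hits / (measured_passes * range_units)


rate_16 = cyclic_scan_hit_rate(cache_units=16, range_units=20)  # 0.0
rate_20 = cyclic_scan_hit_rate(cache_units=20, range_units=20)  # 1.0
rate_32 = cyclic_scan_hit_rate(cache_units=32, range_units=20)  # 1.0
```

With 16 units, each unit is evicted four accesses before the scan returns to it, so the measured hit rate is zero. With 20 or more units nothing is ever evicted.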
As another example, consider a case in which 16 gigabytes of cache memory is installed in the storage system, and the host computer accesses a range of 64 gigabytes randomly without any regularity. In this case, the probability that the accessed data exists in the cache memory is given by the capacity ratio, (16 GB/64 GB)=25%. If the cache memory is 32 gigabytes, the cache hit rate is likewise given by the capacity ratio, (32 GB/64 GB)=50%. That is, if the cache memory is expanded from 16 gigabytes to 32 gigabytes in this example, the cache hit rate changes from 25% to 50%.
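The capacity-ratio estimate for the random case can be compared against a simulation. The sketch below assumes uniformly random accesses, plain LRU replacement, and one unit per gigabyte; the function names `capacity_ratio_hit_rate` and `simulated_random_hit_rate`, the warm-up length, and the seed are illustrative assumptions.

```python
import random
from collections import OrderedDict


def capacity_ratio_hit_rate(cache_units, range_units):
    """Estimate from the text: cache capacity divided by the accessed range."""
    return min(1.0, cache_units / range_units)


def simulated_random_hit_rate(cache_units, range_units,
                              accesses=100_000, seed=0):
    """Measured LRU hit rate under uniformly random accesses (illustrative)."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    cache = OrderedDict()       # ordered from least to most recently used
    warm_up = 10 * range_units  # accesses excluded from the measurement
    hits = 0
    for i in range(warm_up + accesses):
        unit = rng.randrange(range_units)
        if unit in cache:
            cache.move_to_end(unit)
            if i >= warm_up:
                hits += 1
        else:
            if len(cache) >= cache_units:
                cache.popitem(last=False)  # evict the least recently used unit
            cache[unit] = True
    return hits / accesses


estimate_16 = capacity_ratio_hit_rate(16, 64)  # 0.25
estimate_32 = capacity_ratio_hit_rate(32, 64)  # 0.5
sim_16 = simulated_random_hit_rate(16, 64)
sim_32 = simulated_random_hit_rate(32, 64)
```

Under these assumptions the simulated rates should fall close to the 25% and 50% estimates, since every unit is equally likely to be in the cache once it is full.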
As is evident from the two examples above, the effect of installing additional cache memory depends on the access patterns of the host computer. Consequently, the decision on how much additional cache memory to install must be based on an estimate that reflects the access patterns of the host computer.
Regarding this point, Patent Document 1, for example, discloses a mathematical method of estimating the cache hit rate with reference to the access patterns issued by the host computer and the cache memory capacity.
Patent Citation 1: U.S. Pat. No. 7,139,872B1