A storage system is composed of various types of storage devices. The reason is that each storage device has unique characteristics and, therefore, it is efficient to combine the characteristics of the storage devices in order to increase the performance and reliability of the entire system.
In general, high-speed storage is small in capacity while large-capacity storage is low in speed. How these types of storage are combined to provide large-capacity, high-speed, and high-reliability storage is an important issue in the design of a storage system.
High-speed, small-capacity storage includes, for example, a semiconductor memory such as a RAM (Random Access Memory). Low-speed, large-capacity storage includes a hard disk (HDD). In general, data is saved on a hard disk. A memory (RAM) is used as a cache (disk cache) because the storage capacity of the memory is much smaller than that of a disk and because the memory is volatile (data is lost when the power is off). Data is stored on a hard disk and is read from, or written to, the hard disk by specifying an address on the hard disk (block address). The input/output access speed of a hard disk is slower than that of a memory, so processing becomes slow if the hard disk is accessed each time data is required. To address this problem, a memory (disk cache) is provided between the hard disk and a host so that once-accessed data is stored not only on the hard disk but also in the memory. With this configuration, if the same address is accessed again, the data in the memory (cache) can be returned without accessing the hard disk, thus speeding up the access. This is how the cache works.
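The cache behavior described above can be illustrated with a minimal sketch. The class names `BlockDevice` and `DiskCache`, the write-through policy, and the least-recently-used eviction rule are assumptions made for illustration only; the text above does not prescribe any particular policy.

```python
from collections import OrderedDict


class BlockDevice:
    """Hypothetical stand-in for a hard disk addressed by block address."""

    def __init__(self):
        self.blocks = {}
        self.reads = 0  # counts how often the slow device is actually read

    def read(self, addr):
        self.reads += 1
        return self.blocks.get(addr, b"")

    def write(self, addr, data):
        self.blocks[addr] = data


class DiskCache:
    """Memory placed between the host and the disk (illustrative only)."""

    def __init__(self, disk, capacity):
        self.disk = disk
        self.capacity = capacity      # the cache is much smaller than the disk
        self.entries = OrderedDict()  # block address -> data, oldest first

    def _remember(self, addr, data):
        self.entries[addr] = data
        self.entries.move_to_end(addr)
        if len(self.entries) > self.capacity:
            # Assumed policy: evict the least recently used block.
            self.entries.popitem(last=False)

    def read(self, addr):
        if addr in self.entries:
            # Same address accessed again: answer from memory, skip the disk.
            self.entries.move_to_end(addr)
            return self.entries[addr]
        data = self.disk.read(addr)
        self._remember(addr, data)
        return data

    def write(self, addr, data):
        # Assumed write-through: data goes to the disk and also to the cache.
        self.disk.write(addr, data)
        self._remember(addr, data)
```

In this sketch, a second `read` of the same block address is served from memory and does not increase `BlockDevice.reads`. Because the memory is volatile, the copy on the disk remains the one that survives a power loss.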
In addition, many ideas that take the characteristics of a hard disk into account have conventionally been studied and implemented to increase the data access speed; for example, data is prefetched or is stored as a sequential log at write time.
As described above, a storage system has been built on the premise that the capacity of a memory is significantly smaller than that of a hard disk and that the amount of data required for processing is larger than the capacity of the memory.
Recently, however, increases in memory capacity have begun to undermine this premise. For example, among the latest products, even a commodity server can have 100 GB or more of memory (DRAM (Dynamic Random Access Memory)) per unit.
In addition, if the memory of several servers can be shared among clusters by means of the distributed shared memory technology, the memory capacity is on the order of terabytes (TB), which allows enough capacity to be allocated to most processing. Products are available today that take advantage of this situation to perform all processing in memory. Among known storage products that perform all processing in memory are “Oracle Coherence”, an in-memory product from Oracle Corporation, and the open-source system “memcached” (used, for example, in the distributed memory cache system at major websites). Attention has also been paid to an on-memory DB (memory DB) that enables high-speed data processing without disk access during update and search. These products implement high-speed data processing on the premise that the data required for the processing is all in memory.
On the other hand, there has been increasing interest today in key-value data stores (“Amazon Dynamo” from Amazon.com, Inc., “memcached”, etc.). The key-value distributed data store is characterized in that a memory map is not shared, a value is variable-length, a client-server model is applicable, and the distribution method is determined by the client side. The key-value distributed data store is advantageous in that there is no need for storage servers to cooperate with each other and scalable expansion is easy (the key-value data store will be described later in the exemplary embodiment of the present invention).
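The client-side distribution described above can be sketched as follows. The names `KeyValueServer` and `KeyValueClient` are hypothetical, and the hash-modulo placement rule is an assumption chosen for brevity; it is not presented as the distribution method of any product named above.

```python
import hashlib


class KeyValueServer:
    """Hypothetical storage server; it knows nothing about its peers."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value  # values may be of any length

    def get(self, key):
        return self.data.get(key)


class KeyValueClient:
    """The client alone decides which server holds each key."""

    def __init__(self, servers):
        self.servers = servers

    def _server_for(self, key):
        # Assumed placement rule: hash the key, then take it modulo the
        # number of servers. Every client using the same server list
        # computes the same answer without asking the servers.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]

    def put(self, key, value):
        self._server_for(key).put(key, value)

    def get(self, key):
        return self._server_for(key).get(key)
```

Because the placement is computed entirely on the client, the servers in this sketch never communicate with each other. One limitation of the simple modulo rule assumed here is that changing the number of servers moves most keys to a different server.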