Various technologies relating to controlling data access to a storage device and storage system are known.
For example, a data store system composed of a single computer or a plurality of computers is available (e.g., a database system, a file system, or a cache system). In recent years, a distributed storage system has often been applied to such systems. Such a distributed storage system includes a plurality of general-purpose computers connected with one another via a network.
Such a distributed storage system uses memory devices installed on these computers to store and supply data. The memory device may be, for example, a hard disk drive (HDD) or main memory (such as dynamic random-access memory (DRAM)).
In a distributed storage system as described above, software or special hardware determines on which computer the data is to be placed and by which computer the data is to be processed. This architecture is called a shared nothing architecture.
In a storage area network (SAN), a memory device is shared among, for example, a plurality of servers via a network such as Fibre Channel (FC). For example, a data store system may be implemented by using a memory device shared in a SAN.
To implement a system where data is shared among a plurality of computers in a SAN, it is necessary to use software that is based on a shared everything architecture. For example, to implement a file system, the software may be SAN File System or the like. To implement a database system, the software may be, for example, Oracle® RAC (Real Application Clusters®).
A shared everything architecture is typically implemented by using FC or iSCSI (Internet Small Computer System Interface). Because FC and iSCSI involve significant communication delays, memory devices with excellent response performance are rarely used; memory devices with lower response performance, such as HDDs, are mainly used instead.
On the other hand, HDDs offer excellent sequential access performance. For this reason, database software and other software use techniques such as a write-ahead log, which sequentially writes only update information, to compensate for the low performance of a shared memory device.
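As a minimal sketch of the write-ahead logging idea mentioned above (not the implementation of any particular database), the following Python example appends each update sequentially to a log file before applying it, and rebuilds the state by replaying the log. The class name WriteAheadLog, the JSON record format, and the log file path are assumptions made for illustration only.

```python
import json
import os


class WriteAheadLog:
    """Hypothetical append-only log: updates are written sequentially, then applied."""

    def __init__(self, path):
        self.path = path   # log file location (assumed for this sketch)
        self.state = {}    # in-memory view of the data

    def put(self, key, value):
        # Append only the update information to the end of the log.
        record = json.dumps({"key": key, "value": value})
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record to the device before applying
        self.state[key] = value

    def recover(self):
        # Rebuild the state by replaying the log from the beginning.
        self.state = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    rec = json.loads(line)
                    self.state[rec["key"]] = rec["value"]
        return self.state
```

Because every write lands at the end of the file, the device sees a sequential access pattern even when the updated keys are unrelated to one another.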
Recently available configurations include one in which a computer is connected to a high-speed memory device, such as a solid-state drive (SSD), via Peripheral Component Interconnect-Express (PCI-e), which is a high-speed, general-purpose interface. Such a configuration allows a high-speed memory device to be accessed with low latency. Thus, such a configuration is used for applications such as caching for storage on a SAN.
A shared everything architecture can be implemented by using a technology, for example ExpEther®, for sharing a PCI-e device in such a configuration among a plurality of hosts. With this configuration, a shared everything architecture achieves storage sharing with lower latency than the above-described storage on a SAN.
As described above, cluster-based distributed storage or distributed database technologies have been developed on the premise of a shared nothing or shared storage architecture, which is a server-based architecture in the related art. In a server architecture in the related art, a server that accesses resources belonging to another server must go through that other server. On the other hand, a resource disaggregated architecture does not always necessitate going through a server to access resources, because individual resources (memory, storage, and the like) are connected via an interconnected network and thus every resource can be physically shared by each central processing unit (CPU). As seen above, changes in server architectures have transformed distributed storage or distributed database technologies.
For example, for exclusive control of data, distributed systems in the related art use the two-phase or three-phase commit protocol or the Paxos algorithm to communicate between servers so as to implement the exclusive control. Additionally, the aforementioned Oracle RAC uses a function called Cache Fusion to implement the exclusive control. A distributed key-value store uses a hash function to distribute data and determine the responsible nodes, and these nodes manage data in units of exclusion (for example, in units of records). For example, a transaction spanning a plurality of records needs to use the above-mentioned two-phase commit or the like between nodes to implement exclusive control.
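The combination described above, hash-based assignment of responsible nodes plus a two-phase commit for a transaction spanning several records, can be sketched roughly as follows. This is an illustrative in-process model only: the names responsible_node, Node, and two_phase_commit, the SHA-256 placement rule, and the per-record lock table are all assumptions, and node failures and network messaging are not modeled.

```python
import hashlib


def responsible_node(key, num_nodes):
    """Map a key to a node index with a stable hash (hypothetical placement rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class Node:
    """A participant that applies exclusion in units of one record."""

    def __init__(self):
        self.data = {}
        self.locks = {}   # key -> id of the transaction holding the lock
        self.staged = {}  # transaction id -> {key: value} awaiting commit

    def prepare(self, txid, writes):
        # Vote "no" if any record is locked by a different transaction.
        if any(self.locks.get(k, txid) != txid for k in writes):
            return False
        for k in writes:
            self.locks[k] = txid
        self.staged[txid] = dict(writes)
        return True

    def commit(self, txid):
        for k, v in self.staged.pop(txid, {}).items():
            self.data[k] = v
            del self.locks[k]

    def abort(self, txid):
        for k in self.staged.pop(txid, {}):
            del self.locks[k]


def two_phase_commit(nodes, txid, writes):
    """Coordinator: returns True if the transaction committed on all nodes."""
    # Group the writes by the node responsible for each key.
    by_node = {}
    for k, v in writes.items():
        by_node.setdefault(responsible_node(k, len(nodes)), {})[k] = v
    # Phase 1: ask every involved node to prepare (lock and stage).
    prepared = []
    for idx, node_writes in by_node.items():
        if nodes[idx].prepare(txid, node_writes):
            prepared.append(idx)
        else:
            # One "no" vote aborts the transaction on all prepared nodes.
            for p in prepared:
                nodes[p].abort(txid)
            return False
    # Phase 2: every node voted "yes", so commit everywhere.
    for idx in prepared:
        nodes[idx].commit(txid)
    return True
```

Even in this simplified form, each multi-record transaction costs two rounds of coordinator-to-node interaction, which is the server-to-server communication that such designs depend on.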
These are designed on the assumption that communications between servers take longer than communications with resources within a server. A resource disaggregated architecture, by contrast, presumes that the server-to-server (CPU-to-CPU) communication time does not always dominate the server-to-resource communication time; the two are equal, or the server-to-server time is less dominant. Thus, for exclusive control on a system running on a resource disaggregated architecture, such as a distributed storage database system, server-to-server communications need to be reduced, and functions available on the resource side (limited hardware) can be utilized.
PTL 1 discloses a technique for managing an update to data by dividing a database object, such as a table or index, into fragments, duplicating (cloning) each fragment, and using the cloned fragments.
PTL 2 discloses a technique for exclusively performing an updating process from a plurality of computers by using a Compare and Swap (CAS) command in a configuration where a Key-Value Store (KVS) system with a function to handle the CAS command is shared.
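To illustrate in general terms how a CAS command lets several updaters modify a shared key-value store without overwriting one another, the following is a minimal sketch. It is not the method of PTL 2. CasStore is a hypothetical in-memory stand-in for a shared KVS, and its internal lock merely models the atomicity that such a store would provide for the CAS command itself.

```python
import threading


class CasStore:
    """Hypothetical in-memory stand-in for a key-value store with compare-and-swap."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()  # models the atomicity of the store's CAS

    def get(self, key):
        with self._guard:
            return self._data.get(key)

    def compare_and_swap(self, key, expected, new):
        # Write `new` only if the current value still equals `expected`.
        with self._guard:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True


def increment(store, key):
    """Optimistic update: read, compute, and retry until the CAS succeeds."""
    while True:
        current = store.get(key)
        new = (current or 0) + 1
        if store.compare_and_swap(key, current, new):
            return new
```

In this pattern the updaters never exchange messages with each other; a conflicting update is detected on the store side when the CAS fails, and the losing updater simply retries.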