The present invention relates generally to the field of object storage architecture, and more particularly to using checkpoints in an embedded computation infrastructure built into object storage.
Traditional object storage architecture comprises two entities, or node groups: proxy nodes, which distribute the load and handle requests; and storage nodes, which are responsible for writing to the disks/storage subsystems. Traditional object storage architecture serves purely as a storage unit and repository; analyzing the data residing in these storage units (i.e., extracting meaningful information from raw data) requires an additional client or compute node.
Storlet architecture (i.e., an embedded compute infrastructure built into object storage) comprises a software engine present within the nodes. The end user must frame the computation algorithm and deploy, or pass, it to this engine as a normal object PUT operation. Storlet architecture does not require any additional client or compute node to analyze the data; instead, the proxy/storage node itself acts as a compute node and returns the results to the user. Storlet architecture uses virtual machines (VMs) and similar isolated execution environments (e.g., Linux Containers, Docker, KVM, ZeroVM) deployed on the nodes to perform the computation tasks.
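To make the contrast with traditional object storage concrete, the following is a toy in-memory model of a storage node with an embedded compute engine. It is a sketch under stated assumptions: the class and method names (`StorageNode`, `put_object`, `put_storlet`, `get_object`) are hypothetical and are not the API of OpenStack Swift or any real storlet framework, and a plain Python callable stands in for code that a real engine would run inside a VM or container.

```python
from typing import Callable, Dict, Optional

# Hypothetical toy model: these names are illustrative only and do not
# correspond to any real object store or storlet framework API.
Computation = Callable[[bytes], bytes]


class StorageNode:
    """A storage node that can also run user-deployed computations."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._storlets: Dict[str, Computation] = {}

    def put_object(self, name: str, data: bytes) -> None:
        """Ordinary object PUT: persist raw data on this node."""
        self._objects[name] = data

    def put_storlet(self, name: str, fn: Computation) -> None:
        """Deploy a user computation through a PUT-style call.

        A real engine would receive code and execute it in a sandboxed VM
        or container; a Python callable stands in for that here.
        """
        self._storlets[name] = fn

    def get_object(self, name: str, storlet: Optional[str] = None) -> bytes:
        data = self._objects[name]
        if storlet is None:
            # Traditional path: raw bytes go back to the client, which
            # would need its own compute resources to analyze them.
            return data
        # Embedded-compute path: the node runs the computation next to the
        # data and returns only the result.
        return self._storlets[storlet](data)


node = StorageNode()
node.put_object("logs.txt", b"ok\nerror\nok\nerror\nerror\n")
node.put_storlet("count_errors", lambda d: str(d.count(b"error")).encode())
print(node.get_object("logs.txt", storlet="count_errors"))  # b'3'
```

The point the model illustrates is that only the small result (`b'3'`) leaves the node, rather than the raw object being shipped to a separate compute node for analysis.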
Checkpointing is a technique for adding fault tolerance to computing systems. It consists of saving a snapshot of the application's state so that the application can restart from that point in case of failure. This is particularly important for long-running applications executed on failure-prone computing systems. In distributed computing, checkpointing helps tolerate failures that would otherwise force long-running applications to restart from the beginning. The most basic way to implement checkpointing is to stop the application, copy all the required data from memory to reliable storage (e.g., a parallel file system), and then continue execution. Checkpointing implementations should preserve system consistency.
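The basic stop-copy-continue scheme can be sketched in Python as follows. This is a minimal single-process illustration under assumed conditions, not an implementation from this disclosure: the "application" is a loop summing integers, its state is the pair (next index, running total), and `CHECKPOINT_EVERY`, `save_checkpoint`, `load_checkpoint`, and `run` are hypothetical names. A local file stands in for reliable storage such as a parallel file system.

```python
import json
import os
import tempfile
from typing import Optional

# Hypothetical checkpoint interval; the loop below is a stand-in for a
# long-running application whose state is (next index, running total).
CHECKPOINT_EVERY = 100


def save_checkpoint(path: str, state: dict) -> None:
    """Pause, copy in-memory state to storage, then let execution continue.

    The state is written to a temporary file in the same directory and then
    renamed over the old checkpoint, so a crash mid-write leaves the
    previous checkpoint intact (a consistent snapshot).
    """
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path: str) -> Optional[dict]:
    """Return the last saved state, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def run(path: str, n: int, fail_at: Optional[int] = None) -> int:
    """Sum 0..n-1, resuming from the last checkpoint if one exists."""
    state = load_checkpoint(path) or {"i": 0, "total": 0}
    i, total = state["i"], state["total"]
    while i < n:
        if i == fail_at:
            raise RuntimeError(f"simulated failure at i={i}")
        total += i
        i += 1
        if i % CHECKPOINT_EVERY == 0:
            save_checkpoint(path, {"i": i, "total": total})
    return total


with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "app.ckpt")
    try:
        run(ckpt, 1000, fail_at=750)  # crashes partway through
    except RuntimeError:
        pass
    print(load_checkpoint(ckpt))  # {'i': 700, 'total': 244650}
    print(run(ckpt, 1000))  # resumes at i=700 and prints 499500
```

After the simulated failure, the restart redoes only the 50 iterations since the last checkpoint instead of all 750. Writing to a temporary file and renaming it with `os.replace` is one common way to meet the consistency requirement, because readers only ever see a complete old or complete new checkpoint.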