1. Technical Field
This disclosure generally relates to computer storage systems, and more specifically relates to a system and method for dynamically transitioning the file system role of compute nodes for provisioning a storage object with an embedded compute engine or “storlet”.
2. Background Art
Traditional computer file systems organize stored information hierarchically, such as in a tree structure. These traditional systems work well for small collections of data like those on a local hard drive, but they were not designed for the massive volumes of unstructured content that many businesses now collect, store, and access both locally and in the cloud. A newer method of storing information is called object storage. In this approach, information is stored as objects. Each object contains the data (the bits and bytes of documents, movies, images, and so forth), together with metadata that holds user-defined and system-defined tags. The metadata describes the content of the data, how the object is related to other objects, how the data should be handled, replicated, or backed up, etc.
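By way of illustration only, the object layout described above might be sketched as follows. The names `StorageObject`, `FlatObjectStore`, and `find_by_tag` are hypothetical and do not come from any particular object storage product; the flat, ID-keyed namespace is an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StorageObject:
    """Hypothetical object: raw data plus user and system metadata tags."""
    object_id: str
    data: bytes
    user_metadata: Dict[str, str] = field(default_factory=dict)
    system_metadata: Dict[str, str] = field(default_factory=dict)


class FlatObjectStore:
    """Assumed flat namespace keyed by object ID, with no directory tree."""

    def __init__(self) -> None:
        self._objects: Dict[str, StorageObject] = {}

    def put(self, obj: StorageObject) -> None:
        self._objects[obj.object_id] = obj

    def get(self, object_id: str) -> StorageObject:
        return self._objects[object_id]

    def find_by_tag(self, key: str, value: str) -> List[str]:
        # Locate objects through their metadata rather than through a path.
        return [
            obj.object_id
            for obj in self._objects.values()
            if obj.user_metadata.get(key) == value
        ]


store = FlatObjectStore()
store.put(StorageObject(
    object_id="obj-001",
    data=b"\x89PNG...",
    user_metadata={"content": "image"},
    system_metadata={"replicas": "3"},
))
store.put(StorageObject(
    object_id="obj-002",
    data=b"quarterly report",
    user_metadata={"content": "document"},
))
image_ids = store.find_by_tag("content", "image")
```

In this sketch the metadata, not a position in a tree, is what describes the data and how it should be handled.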
Another type of object storage was developed to increase the value of object storage and increase the speed at which the data in the objects can be accessed. This type of object storage has an embedded compute engine within the storage object and is typically called a “storlet”. Thus a storlet is a data storage object with a computational component stored inside. A storlet allows object storage to move the ability to perform computations to the data, instead of the system having to move the data to a compute node or server to carry out the computation. The storlet can be provisioned or deployed on a compute node for execution by the compute node. When the storlet is executed, the efficiency of executing the storlet on the compute node depends on the type of operation the storlet performs and the capabilities and role of the compute node.
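As a minimal sketch of this idea, a storlet may be modeled as data bundled with a function that runs where the data resides. The `Storlet` class, its `execute` method, and the `count_bytes` function are hypothetical names chosen for illustration, and the byte counts stand in for network transfer under the assumption that only the result leaves the node.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Storlet:
    """Hypothetical storlet: stored data plus an embedded computation."""
    data: bytes
    compute: Callable[[bytes], bytes]

    def execute(self) -> bytes:
        # The computation runs next to the data; only the result is returned.
        return self.compute(self.data)


def count_bytes(data: bytes) -> bytes:
    """Example computation: report the size of the stored data."""
    return str(len(data)).encode()


storlet = Storlet(data=b"x" * 1_000_000, compute=count_bytes)

# Without a storlet, the whole object would be moved to a compute server.
bytes_moved_without_storlet = len(storlet.data)

# With a storlet, only the result of the computation is moved.
result = storlet.execute()
bytes_moved_with_storlet = len(result)
```

Here the one-million-byte object stays in place and only a seven-byte result is returned, which is the saving the storlet approach aims for.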
Storage systems may seek to use clustered file systems for object storage. This brings the advantages of a clustered file system such as backup, replication, consistency, locking, better metadata handling, etc. into the object storage architecture. However, problems arise when bringing storlet execution into a clustered file system that uses object storage. These problems include excessive communication between the nodes and excessive interlocking between the nodes in order to maintain consistency.
A clustered file system involves multiple nodes across a network. Owing to the distributed nature of its architecture, some of the nodes participating in the cluster need to perform additional roles or functions (such as admin node, application node, client node, metadata node, token manager node, WAN caching gateway node, quorum node, etc.). By virtue of the role a node performs, some computational operations execute faster on that node than on nodes performing other roles. For example, a storlet with a computational algorithm might involve repetitive operations on each object. In image processing, each object may need to be read and written multiple times in each pass. Each time the storlet container starts accessing the object, the node requests the token manager node to verify whether the object is locked by any other application. This computational algorithm may therefore generate a large amount of token traffic between nodes, because communication is required between the node executing the storlet and the node that handles token management for the storlet. In this example, if the storlet containing this computational algorithm is invoked on a node other than the token manager node, the result is an excessive amount of communication (RPC calls) between the nodes.
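The token traffic in this example can be approximated with a small simulation. This is only a sketch under stated assumptions: one lock check per object access, a lock check counted as an RPC call only when the requesting node is not the token manager node, and hypothetical names (`TokenManager`, `check_lock`, `run_storlet`) that do not correspond to any real clustered file system API.

```python
from typing import List, Set


class TokenManager:
    """Hypothetical token manager hosted on one node of the cluster."""

    def __init__(self, host_node: str) -> None:
        self.host_node = host_node
        self.locked: Set[str] = set()
        self.remote_requests = 0  # lock checks that crossed the network

    def check_lock(self, requesting_node: str, object_id: str) -> bool:
        # Assumption: a check from another node costs one RPC call,
        # while a check made on the host node itself costs none.
        if requesting_node != self.host_node:
            self.remote_requests += 1
        return object_id in self.locked


def run_storlet(node: str, manager: TokenManager,
                object_ids: List[str], passes: int) -> int:
    """Access every object once per pass and return the RPC calls incurred."""
    before = manager.remote_requests
    for _ in range(passes):
        for object_id in object_ids:
            if not manager.check_lock(node, object_id):
                pass  # the object would be read and written here
    return manager.remote_requests - before


objects = [f"image-{i}" for i in range(100)]
manager = TokenManager(host_node="node-A")

rpc_on_other_node = run_storlet("node-B", manager, objects, passes=5)
rpc_on_token_manager = run_storlet("node-A", manager, objects, passes=5)
```

Under these assumptions, running the storlet on a node other than the token manager node incurs one RPC call per object per pass, while running it on the token manager node incurs none.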
Prior art storlet systems and architectures do not have the ability or intelligence to dynamically assign or change the file system role of a particular node to a role that is optimal for fulfilling a specific type of computation algorithm. Problems created by this lack of intelligence include a tremendous increase in disk I/O operations, additional load on the file system, degradation of storage unit performance, and a reduced life span of disk drives.