Technical Field
The present disclosure relates to object stores and, more specifically, to an architecture of an object store deployed in a distributed data processing system.
Background Information
In many current analytics frameworks, distributed data processing systems may be used to process and analyze large datasets. An example of such a framework is Hadoop, which provides data storage services to clients using a distributed file system and data processing services through a cluster of commodity computers or nodes. The distributed file system, e.g., the Hadoop Distributed File System (HDFS), executes on the cluster of nodes to enable client access to the data in the form of logical constructs organized as blocks, e.g., HDFS blocks. Each node of the cluster typically has its own private (i.e., shared-nothing) storage and employs a native file system, such as ext3/4 or XFS. The native file system typically has a plurality of features directed to management of the data in the form of logical constructs organized as files. As a result, the distributed file system may be employed to access the data as blocks, while the native file systems executing on the cluster of nodes may be employed to store and process the blocks as one or more files.
Often it may be desirable to avoid the use of a native file system in certain deployments of distributed data processing systems because many of the features provided by the native file system may not be required. For example, a feature of a native file system is its compliance with the Portable Operating System Interface (POSIX) standard, which requires exposing file handles to clients to enable access, e.g., reading and writing, to files in accordance with a full set of operations. The distributed data processing system may not require POSIX compliance because there may only be a limited set of operations, such as open, read and verify checksum, needed by the distributed file system to access blocks. Thus, the overhead associated with the many features provided by a native file system may not be appropriate for the distributed data processing system deployment.
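By way of illustration only, the limited set of operations noted above (open, read and verify checksum) may be sketched as a narrow, read-only block interface. The sketch below is not taken from the present disclosure or from any Hadoop code: the class name BlockReader, the "blk_<id>" file naming, the use of one local file per block, and the choice of CRC-32 as the checksum are all assumptions made for the example.

```python
import os
import tempfile
import zlib


class BlockReader:
    """Hypothetical read-only block interface: open, read, verify checksum."""

    def __init__(self, root):
        # Directory on the node's private storage (assumption for this sketch).
        self.root = root

    def _path(self, block_id):
        # Hypothetical naming scheme: one local file per block.
        return os.path.join(self.root, f"blk_{block_id}")

    def open(self, block_id):
        # Open the block for reading only; no write or metadata operations.
        return open(self._path(block_id), "rb")

    def read(self, handle, offset, length):
        # Read `length` bytes starting at `offset` within the block.
        handle.seek(offset)
        return handle.read(length)

    def verify_checksum(self, handle, expected_crc):
        # Recompute a CRC-32 over the whole block in 64 KiB chunks
        # and compare it with the expected value.
        handle.seek(0)
        crc = 0
        for chunk in iter(lambda: handle.read(64 * 1024), b""):
            crc = zlib.crc32(chunk, crc)
        return crc == expected_crc


def demo():
    # Write one block to a temporary directory, then access it
    # using only the three operations above.
    with tempfile.TemporaryDirectory() as root:
        payload = b"example block payload"
        with open(os.path.join(root, "blk_42"), "wb") as f:
            f.write(payload)
        reader = BlockReader(root)
        with reader.open(42) as handle:
            data = reader.read(handle, 8, 5)
            ok = reader.verify_checksum(handle, zlib.crc32(payload))
        return data, ok
```

In this sketch the interface exposes three methods, whereas a POSIX-compliant native file system must additionally support operations such as write, rename, truncate and permission management, none of which the block reader uses.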
Accordingly, it may be desirable to provide a storage solution to distributed data processing systems that eliminates the overhead associated with native file systems. In addition, it may be desirable to provide a generic storage solution that may be deployed in distributed data processing systems that employ data management systems, such as distributed file systems and distributed database management systems.