The present disclosure relates generally to cloud computing, and more particularly to a massively scalable object storage system to provide storage for a cloud computing environment with a level of indirection useful for storing large objects as well as different variants of the same object.
Cloud computing is a relatively new technology that enables elasticity, that is, the ability to dynamically adjust compute capacity. Compute capacity can be increased or decreased by adjusting the number of processing units (cores) allocated to a given instance of a processing module (server or node), or by adjusting the overall number of processing modules in a system. Cloud computing systems such as OpenStack abstract the management layer of a cloud and allow clients to implement hypervisor-agnostic processing modules.
As the use of cloud computing has grown, cloud service providers such as Rackspace Hosting Inc. of San Antonio, Tex., have needed to expand file storage capacity substantially and rapidly while keeping such expansions seamless to their users. Conventional file storage systems, and conventional methods for expanding them, suffer from several limitations that can jeopardize the data they store. In addition, known techniques consume substantial resources of the storage system to accomplish expansion while also ensuring data safety. Finally, centralizing data storage brings with it issues of scale. A typical local storage system (such as the hard drive in a computer) may store thousands or millions of individual files for a single user, whereas a cloud-computing-based storage system is designed to serve thousands or millions of different users simultaneously, with a corresponding increase in the number of files stored.
An increasingly common use of cloud computing is computation on so-called “big data”: datasets that are much larger than memory and frequently much larger than the disk space available on any single computer. Current datasets can be so large that they are difficult to store and process, and the volume of data to be stored and processed is only expected to grow over time. Depending on the type of data, such datasets may be terabytes, exabytes, or zettabytes in size. Adding to the complication, efficient processing of a dataset may require random (as opposed to sequential) access. Applications of large-dataset processing include meteorology, genomics, economics, physics, biological and environmental research, Internet search, finance, business informatics, and sociological analysis. Information technology and security organizations may also generate extensive activity logs that require massive amounts of storage.
Clients of such a data storage system often require views of their data in specific formats. For small datasets, it is relatively trivial for either the system or the client to convert the data on the fly. With large datasets, however, on-the-fly conversion is not practical, and even where it is feasible, it consumes additional computing power every time a converted view is produced.
Accordingly, it is desirable to provide an improved scalable object storage system that supports the storage and retrieval of multiple variants of a single object. It is further desirable to apply similar techniques to store large objects as a series of object segments rather than as a contiguous block of sequential data. As described below, indirection can be used in a cloud computing system to provide these and other advantages.
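To make the idea of indirection concrete, the following is a minimal sketch under stated assumptions, not the system's actual design. It models a toy in-memory store in which a logical object name resolves to a manifest listing fixed-size segments, and a (name, variant) pair resolves to a separately stored object. The class name `ToyObjectStore`, its methods (`put_segmented`, `register_variant`, `get`, `read_range`), the key naming scheme, and the fixed-size segmenting policy are all hypothetical illustrations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyObjectStore:
    """Hypothetical in-memory store illustrating one level of indirection."""

    # Physical storage: segment key -> raw bytes.
    blobs: Dict[str, bytes] = field(default_factory=dict)
    # Indirection for large objects: logical name -> (segment size, ordered segment keys).
    manifests: Dict[str, Tuple[int, List[str]]] = field(default_factory=dict)
    # Indirection for variants: (logical name, variant label) -> logical name of the variant.
    variants: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def put_segmented(self, name: str, data: bytes, segment_size: int) -> None:
        """Split data into fixed-size segments and record their keys in a manifest."""
        if segment_size <= 0:
            raise ValueError("segment_size must be positive")
        keys = []
        for i, offset in enumerate(range(0, len(data), segment_size)):
            key = f"{name}/segment/{i:08d}"  # hypothetical key naming scheme
            self.blobs[key] = data[offset:offset + segment_size]
            keys.append(key)
        self.manifests[name] = (segment_size, keys)

    def register_variant(self, name: str, variant: str, variant_name: str) -> None:
        """Point a (name, variant) pair at a separately stored object."""
        self.variants[(name, variant)] = variant_name

    def get(self, name: str, variant: Optional[str] = None) -> bytes:
        """Resolve the variant (if any), then reassemble the object from its segments."""
        if variant is not None:
            name = self.variants[(name, variant)]
        _, keys = self.manifests[name]
        return b"".join(self.blobs[key] for key in keys)

    def read_range(self, name: str, start: int, length: int) -> bytes:
        """Read a byte range by fetching only the segments that overlap it."""
        segment_size, keys = self.manifests[name]
        first = start // segment_size
        last = (start + length - 1) // segment_size
        chunk = b"".join(self.blobs[key] for key in keys[first:last + 1])
        offset = start - first * segment_size
        return chunk[offset:offset + length]
```

For example, a 1,024-byte object stored with 100-byte segments yields a manifest of 11 segment keys, and `read_range("dataset", 250, 120)` touches only segments 2 and 3. Under these assumptions, random access to a large object costs reads proportional to the requested range rather than to the whole object, and serving a pre-stored variant costs one extra lookup instead of an on-the-fly conversion.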