The present disclosure relates generally to cloud computing, and more particularly to a massively scalable object storage system that provides storage for a cloud computing environment, with specific regard to the storage and modification of large files.
Cloud computing is a relatively new technology that enables elasticity, that is, the ability to dynamically adjust compute capacity. Compute capacity can be increased or decreased by adjusting the number of processing units (cores) allocated to a given instance of a processing module (server or node), or by adjusting the overall number of processing modules in a system. Cloud computing systems such as OpenStack abstract the management layer of a cloud and allow clients to implement hypervisor-agnostic processing modules.
As the use of cloud computing has grown, cloud service providers such as Rackspace Hosting Inc. of San Antonio, Tex., have been confronted with the need to expand file storage capabilities greatly and rapidly while keeping such expansions seamless to their users. Conventional file storage systems, and conventional methods of expanding them, suffer from several limitations that can jeopardize the data they store. In addition, known techniques consume substantial resources of the storage system to accomplish expansion while also ensuring data safety. Finally, the centralization of data storage brings with it issues of scale. A typical local storage system (such as the hard drive in a computer) may store thousands or millions of individual files for a single user. A cloud-computing-based storage system is designed to address the needs of thousands or millions of different users simultaneously, with a corresponding increase in the number of files stored.
An increasingly common use of cloud computing is computation on so-called "big data": datasets that are much larger than memory and frequently much larger than the disk space available on any single computer. Current datasets can be so large that they are difficult to store and process, and the volume of data to be stored and processed is only expected to grow over time. Depending on the type of data, a dataset may range from terabytes to exabytes or even zettabytes in size. Adding to the complication, efficient dataset processing may require random (as opposed to sequential) access. Applications of large dataset processing include meteorology, genomics, economics, physics, biological and environmental research, Internet search, finance, business informatics, and sociological analysis. Information technology and security organizations may also generate extensive activity logs requiring massive amounts of storage.
Accordingly, it is desirable to provide an improved scalable object storage system with support for large object processing and storage.