The present invention relates to data storage systems, and more specifically, this invention relates to efficiently moving object data in a storage hierarchy.
Cloud storage consists primarily of object storage at a massive scale. Object storage provides very little control over optimizing the retrieval of large amounts of data. For example, OpenStack Swift provides only for bulk deletion of containers or for accessing data in a single container. This may be a problem because, at the massive scale, such data may be stored hierarchically across multiple storage tiers, such as flash storage, disk storage, and tape storage. For example, data that has not been read for a few hours may be moved from flash storage to disk storage, and data that has not been read for a few weeks may be moved to tape storage. Each of these storage tiers has different latencies, throughput characteristics, and cost points.
Object storage services do not provide a way to specify object movement. For example, object storage services do not provide a way to expedite the bulk movement of data from the lowest-cost storage tier. Furthermore, when an application needs to immediately retrieve a large amount of data, the data may reside on the slowest storage tier, such as tape. Moreover, if the data is on tape as part of a near-line object storage service, the object storage service may hide the mapping of objects to tape, making it impossible for the user of the object storage service to know how to optimally recall those objects. If the application randomly requests a sparse data set spread across many tapes, the recall time for the data set may be orders of magnitude longer than if the data set had been requested optimally.
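By way of illustration only, the difference between a random recall and an ordered recall may be sketched as follows. The sketch assumes a hypothetical `location` table that maps each object to a tape identifier and a position on that tape, which is precisely the mapping a near-line object storage service may hide from its users. The function names `plan_recall` and `count_mounts`, the single-drive assumption, and the sample data are likewise assumptions made for the example and are not part of any particular service.

```python
from collections import defaultdict


def plan_recall(requested, location):
    """Group requested objects by tape and order each group by tape position.

    `location` is a hypothetical mapping: object name -> (tape_id, position).
    """
    by_tape = defaultdict(list)
    for name in requested:
        tape_id, position = location[name]
        by_tape[tape_id].append((position, name))
    # Visit tapes in a stable order; within a tape, read in position order.
    return {
        tape_id: [name for _, name in sorted(items)]
        for tape_id, items in sorted(by_tape.items())
    }


def count_mounts(order, location):
    """Count tape mounts needed to recall objects in `order` with one drive."""
    mounts = 0
    current = None
    for name in order:
        tape_id = location[name][0]
        if tape_id != current:
            mounts += 1
            current = tape_id
    return mounts


# Hypothetical sample data: four objects spread over two tapes.
location = {"a": ("T1", 30), "b": ("T2", 5), "c": ("T1", 10), "d": ("T2", 50)}
requested = ["a", "b", "c", "d"]

plan = plan_recall(requested, location)
planned_order = [name for names in plan.values() for name in names]

print(count_mounts(requested, location))      # 4 mounts in arrival order
print(count_mounts(planned_order, location))  # 2 mounts in planned order
```

In this small example, recalling the objects in arrival order alternates between the two tapes on every request, while the planned order mounts each tape once and reads it front to back.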
Similarly, when reading large numbers of blocks of data out of object storage systems, the read may be performed in a non-optimal way. For example, multiple copies of the data may be laid out on disk in a way that can lengthen seek times and negatively impact performance. Additionally, requests may come in for each block, file, or object individually, and each request may be handled individually with no attempt to re-order requests to minimize read seeks.
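The cost of handling each request individually may be illustrated with a minimal sketch. It assumes a simplified model in which seek cost is proportional to the distance the read head travels between block offsets; the function name `seek_distance` and the sample offsets are hypothetical and chosen only for the example.

```python
def seek_distance(offsets, start=0):
    """Total head travel when blocks are read in the given order.

    Simplified model: cost is the absolute distance between consecutive
    offsets, beginning from the `start` position.
    """
    total = 0
    position = start
    for offset in offsets:
        total += abs(offset - position)
        position = offset
    return total


# Hypothetical block offsets in the order the requests arrive.
arrival = [900, 100, 800, 200, 700]

# Re-ordering the same requests into a single ascending sweep.
reordered = sorted(arrival)

print(seek_distance(arrival))    # 3500
print(seek_distance(reordered))  # 900
```

Under this model, serving the same five requests in one ascending sweep reduces head travel from 3500 to 900 units, which is the kind of saving that is lost when no re-ordering is attempted.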
Finally, for large scale-out storage solutions, load balancing across several nodes may help handle the high bandwidth needs of such services. These services may be designed to be multi-tenant and scale to very large sizes, sometimes with global geographic distribution of a service. Load balancing may require that all data enter through a single point before being distributed to the node actually processing the data.