Field
The disclosed embodiments generally relate to data storage systems. More specifically, the disclosed embodiments relate to the design of a data storage system that uses a remote procedure call (RPC) framework to facilitate efficient out-of-band bulk data transfers.
Related Art
Organizations are presently using cloud-based storage systems to store large volumes of data. These cloud-based storage systems are typically operated by hosting companies that maintain a sizable storage infrastructure, often comprising thousands of servers that are sited in geographically distributed data centers. Customers typically buy or lease storage capacity from these hosting companies. In turn, the hosting companies provision storage resources according to the customers' requirements and enable the customers to access these storage resources.
Cloud-based storage systems often store sets of data items in large data objects called “extents” that can be many megabytes or even gigabytes in size. During operation, cloud-based storage systems often replicate or move these extents among different machines to facilitate fault tolerance or to provide high availability. Unfortunately, existing mechanisms for transferring data among machines, such as a remote procedure call (RPC) framework, are not well suited for transferring large data objects, such as extents. Using an RPC framework to transfer data items between machines typically involves performing a number of RPC-related operations, such as serializing and deserializing data items, making copies of data items, and performing associated encryption and validation operations. These RPC-related operations are relatively easy to perform on small data items, such as integers or short character strings. However, these RPC-related operations can be extremely time-consuming and memory-intensive for large data objects, such as extents. This makes it impractical to transfer such large data objects among machines using an RPC framework.
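By way of illustration only, the following minimal Python sketch simulates the in-band overhead described above. It is not taken from any particular RPC framework: the function name in_band_transfer is hypothetical, and the standard-library pickle and hashlib modules merely stand in for a framework's serialization and validation steps. The sketch counts how many bytes are copied when a payload is serialized and then deserialized, and shows that this count grows with the size of the payload.

```python
import hashlib
import pickle


def in_band_transfer(payload: bytes):
    """Simulate sending `payload` inside an RPC message (hypothetical helper).

    Returns the payload as seen by the receiver and the number of bytes
    copied along the way.
    """
    # Serialization: the whole payload is copied into a wire buffer.
    wire = pickle.dumps({"data": payload})

    # Validation: a checksum pass reads every byte of the wire buffer.
    sender_digest = hashlib.sha256(wire).hexdigest()

    # Receiver side: validate again, then deserialize into a second copy.
    if hashlib.sha256(wire).hexdigest() != sender_digest:
        raise ValueError("checksum mismatch")
    received = pickle.loads(wire)["data"]

    bytes_copied = len(wire) + len(received)
    return received, bytes_copied


if __name__ == "__main__":
    small_item = b"42"                  # e.g., a short string argument
    extent = bytes(8 * 1024 * 1024)     # stand-in for an 8 MiB extent

    for name, item in (("small item", small_item), ("extent", extent)):
        _, copied = in_band_transfer(item)
        print(f"{name}: {len(item)} payload bytes, {copied} bytes copied")
```

In this sketch the small item incurs only a few dozen bytes of copying, whereas the extent is copied at least twice in full, in addition to the two checksum passes over the wire buffer.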
Hence, what is needed is a data storage system that facilitates efficiently transferring large data objects, such as extents, among machines.