Existing cloud services provide developers with storage services that expose simple, data-centric interfaces for storing and retrieving data items. Behind these simple interfaces, the services use complex machinery to ensure that data remains available and persistent in the face of network and node failures. As a result, developers can focus on application functionality without having to reason about complex failure scenarios.
Unfortunately, this simplicity comes at a cost: applications have little or no information about where data items are located in the network. Without this information, applications cannot optimize their execution by moving computation closer to the data items, data items closer to users, or related data items closer to one another. These kinds of optimizations can be crucial for applications executing across different data centers (where network latencies can be very high), as well as within hierarchical data center networks (where bandwidth can be limited).
Current solutions involve guesswork. For example, the cloud service may choose where to store an application's data item by predicting the application's future access patterns from its past behavior. This approach can be expensive and counterproductive, since the application typically has more accurate information than the cloud service about its own future behavior. In addition, without input from the application, the cloud service can optimize only simple aggregates of low-level metrics such as bandwidth usage.