Increasingly, the emerging model of enterprise computing is one in which compute and storage resources are distributed globally. To derive the most benefit from the investment in the infrastructure, these resources are preferably consolidated into a single pool. Users can then simply run an application on the pool without needing to consider how or where the underlying resources are provided. To make the best use of the available resources, however, the system must make efficient allocation decisions, such as deciding where an application runs, where a database is stored, or how much bandwidth on a network is allocated to an application.
Users may generate workloads, where each workload is an application with computational and storage requirements. Each workload may be assigned to a compute server that performs the required computations. The data that a workload accesses may likewise be assigned to storage servers, from which the compute server accesses the data.
However, such environments incur costs. For instance, the cost of running a workload on a server could be measured by the amount of time the application uses the server, the cost of placing a piece of data on a storage server could be measured by the amount of storage space used, and the cost of using a network link could be measured by the amount of bandwidth consumed. The problem then becomes matching workloads with the appropriate resources so as to minimize the total cost.
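As a rough illustration of this cost model, the following Python sketch prices the placement of a workload as compute time, plus storage space, plus bandwidth on the link between its compute server and its storage server, and then picks the cheapest (compute, storage) pair for each workload. All server names, prices, and workload figures are hypothetical. Because this toy model has no capacity limits, each workload can be placed independently; a realistic allocator would add capacity constraints that couple the decisions.

```python
from itertools import product

# Hypothetical prices, assumed for illustration only.
COMPUTE_PRICE = {"c1": 2.0, "c2": 3.0}    # cost per hour of server time
STORAGE_PRICE = {"s1": 0.10, "s2": 0.05}  # cost per GB stored
LINK_PRICE = {                            # cost per GB moved between servers
    ("c1", "s1"): 0.01, ("c1", "s2"): 0.20,
    ("c2", "s1"): 0.20, ("c2", "s2"): 0.01,
}

# Hypothetical workloads: (hours of compute, GB stored, GB transferred).
WORKLOADS = {"w1": (10.0, 100.0, 500.0), "w2": (5.0, 400.0, 50.0)}


def placement_cost(workload, compute, storage):
    """Compute time + storage space + network bandwidth for one placement."""
    hours, stored_gb, moved_gb = WORKLOADS[workload]
    return (hours * COMPUTE_PRICE[compute]
            + stored_gb * STORAGE_PRICE[storage]
            + moved_gb * LINK_PRICE[(compute, storage)])


def cheapest_placements():
    """Pick the cheapest (compute, storage) pair for each workload.

    With no capacity limits, workloads do not interact, so a per-workload
    minimum is also the minimum total cost.
    """
    plan = {}
    for w in WORKLOADS:
        plan[w] = min(product(COMPUTE_PRICE, STORAGE_PRICE),
                      key=lambda pair: placement_cost(w, *pair))
    return plan


plan = cheapest_placements()
total = sum(placement_cost(w, c, s) for w, (c, s) in plan.items())
print(plan, total)
```

Here w1 moves far more data than it stores, so it lands on the pair joined by the cheap link (c1, s1) even though s2 stores data more cheaply: the bandwidth term, not the storage term, drives the choice.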
One approach to allocating resources is the storage configuration approach, as described by Alvarez et al., “Minerva: An Automated Resource Provisioning Tool for Large-Scale Storage Systems,” ACM Transactions on Computer Systems, 2001 and Anderson et al., “Hippodrome: Running Circles Around the Storage Administrator,” Conference on File and Storage Technologies, 2002. The storage configuration approach involves placing data onto storage devices subject to capacity and utilization constraints, while minimizing the cost of the storage devices.
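A minimal sketch of the storage configuration problem, under assumed inputs, is shown below: each data store must be placed on exactly one device, no device may exceed its capacity or its I/O limit (a stand-in for the utilization constraint), and the objective is the total purchase cost of the devices actually used. The device and store figures are hypothetical, and the exhaustive search is only for illustration; it is exponential in the number of stores and is not the search method used by Minerva or Hippodrome. Note that the model contains no compute servers or network links at all.

```python
from itertools import product

# Hypothetical devices: name -> (purchase cost, capacity in GB, max IOPS).
DEVICES = {
    "d1": (1000, 500, 2000),
    "d2": (800, 350, 2000),
    "d3": (2500, 1500, 5000),
}
# Hypothetical data stores: name -> (size in GB, IOPS demand).
STORES = {"db": (300, 1500), "logs": (400, 300), "archive": (900, 100)}


def feasible(assignment):
    """Check the capacity and I/O-load constraints on every device."""
    for dev, (_, capacity, max_iops) in DEVICES.items():
        placed = [STORES[s] for s, d in assignment.items() if d == dev]
        if sum(size for size, _ in placed) > capacity:
            return False
        if sum(iops for _, iops in placed) > max_iops:
            return False
    return True


def cheapest_configuration():
    """Try every store-to-device mapping; keep the cheapest feasible one."""
    names = list(STORES)
    best_cost, best = None, None
    for devices in product(DEVICES, repeat=len(names)):
        assignment = dict(zip(names, devices))
        if not feasible(assignment):
            continue
        # Pay once for each distinct device that holds at least one store.
        cost = sum(DEVICES[d][0] for d in set(devices))
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, assignment
    return best_cost, best


cost, layout = cheapest_configuration()
print(cost, layout)
```

With these figures, all three stores do not fit on d3 (1600 GB > 1500 GB), so the cheapest layout puts the database on the small, inexpensive d2 and the logs and archive on d3.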
However, the storage configuration approach assumes that computation is local to the storage and is assigned separately. In particular, it assumes there is no network latency between computation and storage. The storage configuration approach is therefore not suitable for modeling the behavior of a system whose resources are distributed.
Other works have attempted to solve variants of the data placement problem, such as the file assignment, web object placement, and web cache placement approaches. However, these approaches have many deficiencies, such as an inability to address load-balancing issues, computational inefficiency, a lack of provable guarantees on solution quality and/or performance, and the like.