1. Field of the Invention
The present invention generally relates to computer software, and more particularly to the design and development of application management software for managing parallel communicating applications.
2. Description of the Related Art
Parallel applications running on distributed memory platforms, such as clusters of servers, typically divide the application state among compute nodes, and each node executes alternating rounds of computation and communication. In these configurations, each node performs roughly the same computation, and the application state is partitioned among the nodes in a way that balances the computation load and minimizes inter-node communication. For some problem domains, a static partitioning achieves these goals.
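As an illustration only, the following Python sketch shows one common static scheme for a grid-shaped application state. The rows of a hypothetical grid are divided into contiguous blocks of nearly equal size, which balances computation under the assumption that every row costs the same. Each node then exchanges boundary rows only with its immediate neighbors, which limits inter-node communication. The names `block_partition` and `neighbors` are hypothetical and are not part of the invention.

```python
from typing import List


def block_partition(num_rows: int, num_nodes: int) -> List[range]:
    """Split rows 0..num_rows-1 into contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(num_rows, num_nodes)
    blocks = []
    start = 0
    for node in range(num_nodes):
        # The first `extra` nodes take one additional row so that every row is assigned.
        size = base + (1 if node < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def neighbors(node: int, num_nodes: int) -> List[int]:
    """Nodes that share a block boundary with `node` and therefore exchange boundary rows."""
    return [n for n in (node - 1, node + 1) if 0 <= n < num_nodes]


if __name__ == "__main__":
    # Hypothetical example: 10 rows on 3 nodes gives blocks of 4, 3 and 3 rows.
    for node, rows in enumerate(block_partition(10, 3)):
        print(node, list(rows), "exchanges with", neighbors(node, 3))
```

Such a scheme stays balanced only while the per-row cost remains uniform, which is the assumption that fails when the computational load shifts between nodes at run time.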
However, in many situations static partitioning is a poor choice, because significant fractions of the computational load can shift between cluster nodes during execution. Such shifts are very difficult to predict before the computation starts, so the typical solution is to address them at run time by periodically monitoring the load and rebalancing it between nodes.
To implement this solution, each parallel application is designed to monitor its resource usage and to decide when rebalancing is beneficial. The application then computes a new partitioning and exchanges portions of its state with other nodes. Although this approach can yield optimal load balancing and minimal execution times, implementing it requires substantial effort.
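The monitor, decide, repartition and exchange steps described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the application state is divided into named cells (hypothetical), monitoring has already produced a per-cell cost, rebalancing is triggered when the busiest node exceeds the mean load by a hypothetical `threshold` factor, and the new partitioning is computed with a greedy longest-processing-time heuristic chosen here for illustration. The heuristic ignores communication locality and the current placement, so it may move more state than strictly necessary.

```python
from typing import Dict, List, Tuple

Assignment = Dict[str, int]  # hypothetical: state cell name -> node index


def node_loads(assignment: Assignment, cell_cost: Dict[str, float], num_nodes: int) -> List[float]:
    """Aggregate the monitored per-cell cost into a per-node load."""
    loads = [0.0] * num_nodes
    for cell, node in assignment.items():
        loads[node] += cell_cost[cell]
    return loads


def imbalance(loads: List[float]) -> float:
    """Ratio of the busiest node's load to the mean load; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean > 0 else 1.0


def repartition(cell_cost: Dict[str, float], num_nodes: int) -> Assignment:
    """Greedy heuristic: place the costliest cells first, each on the least-loaded node."""
    loads = [0.0] * num_nodes
    assignment: Assignment = {}
    for cell in sorted(cell_cost, key=lambda c: (-cell_cost[c], c)):
        target = min(range(num_nodes), key=lambda n: loads[n])
        assignment[cell] = target
        loads[target] += cell_cost[cell]
    return assignment


def rebalance_step(assignment: Assignment, cell_cost: Dict[str, float], num_nodes: int,
                   threshold: float = 1.25) -> Tuple[Assignment, List[Tuple[str, int, int]]]:
    """Decide whether to rebalance; return the new assignment and the (cell, from, to) moves."""
    if imbalance(node_loads(assignment, cell_cost, num_nodes)) <= threshold:
        return assignment, []  # imbalance too small to justify exchanging state
    new_assignment = repartition(cell_cost, num_nodes)
    moves = [(cell, assignment[cell], new_assignment[cell])
             for cell in sorted(cell_cost)
             if assignment[cell] != new_assignment[cell]]
    return new_assignment, moves
```

For instance, with cell costs a=4, b=4, c=1 and d=1, initially placed as {a, b} on node 0 and {c, d} on node 1, the node loads are 8 and 2 (imbalance 1.6). The step therefore moves b to node 1 and c to node 0, which leaves both nodes at a load of 5. Each returned move corresponds to a portion of application state that would have to be exchanged between nodes.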
The benefits of dynamically repartitioning a given parallel application are often hard to determine, and hence the effort to architect the application to handle automatic repartitioning and state exchange is not always worthwhile. In these situations, a suboptimal solution that provides load balancing without increasing the complexity of the application is often preferred.
For example, online gaming is an application class that could benefit from automated load balancing and repartitioning, as long as adding such functionality does not place an excessive burden on the developer.
In most online games, a central server maintains the global state of the game world and periodically distributes updates to the clients as the game play proceeds. Players (i.e., clients) communicate their own local state to the server, either periodically or whenever it changes. Popular titles, with a large number of simultaneous clients (e.g., massively multiplayer games), cannot be hosted by a single central server. In these situations, the server-side game application is designed to run on multiple servers.
The ability to repartition the game world dynamically, i.e., to expand to more servers or shrink to fewer servers, is particularly useful for several reasons: the popularity of a title is hard to predict, the number of simultaneous players varies widely during the day, and servers can become overloaded when players move into a part of the game world that is hosted locally on a particular server.
Although beneficial from a resource utilization perspective, dynamic world partitioning adds significant complexity to the design and implementation of online games, thereby increasing their time-to-market and development costs. Other applications that run on multiple nodes and communicate their state as necessary would also benefit from dynamically partitioning their state among nodes. However, dynamically partitioning a game world, a large matrix or a weather map is a difficult task that requires substantial software engineering and testing effort.
Virtual machines may be used for dynamically partitioning a parallel application running on multiple servers. A virtual machine is a software abstraction that is designed to look and act like a computer system's hardware. A modern computer system is composed of layers, beginning with the hardware and including an operating system layer and application programs running on top of the operating system. Virtualization software implements virtual machines by interposing a software layer at various points in the system.
Examples of virtualization layers include hardware level virtualization layers, operating system level virtualization layers and high-level language virtual machines. In the case of a hardware level layer, the virtualization layer is positioned directly on top of the hardware of the computer system and acts as hardware for the system. All of the software written for the actual hardware will run on the virtual machine. Operating system virtualization layers are positioned between the operating system and the application programs that are run on the operating system. The virtual machine, in this case, runs applications that are written for the operating system. In the case of high-level language virtual machines, the layer is positioned as an application program on top of the operating system.
Virtual machines provide several attributes that make them attractive for use in parallel applications. Virtual machines provide a compatible abstraction, so that all software written for the computer system will run on the virtual machine. Also, the virtual machine abstraction can isolate the software running in the virtual machine from other virtual machines and real machines. Finally, for many applications, these benefits outweigh the overhead introduced by adding a layer of software to the computer system.
Commercial virtual machine technology, such as VMware, provides a limited ability to migrate virtual machines. Typically, only the migration mechanisms are provided; the policies governing virtual machine migration are designed and implemented separately, in the data center management software. Previous work on virtual machine migration is motivated by the desire to reduce the number of physical servers used in the data center. As a result, applications that previously ran on individual servers are each assigned to a virtual machine. In these configurations, there is little communication between virtual machines, and the applications were developed independently of the virtual machine migration mechanisms and policies. In contrast, in this invention, virtual machine migration is considered during the design and development of the parallel applications, as a tool for reducing the software engineering effort and, as a result, the financial risks associated with developing, testing and deploying these applications.
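To illustrate the separation between migration mechanism and migration policy, the following Python sketch shows a simple, hypothetical load-balancing policy of the kind that might live in data center management software. The `migrate` callback is a stand-in for whatever mechanism the virtualization platform provides; it is not an actual VMware API. The greedy rule used here (move one virtual machine from the busiest host to the idlest host when doing so narrows the load gap) is a generic heuristic chosen for illustration and is not presented as the policy of this invention.

```python
from typing import Callable, Dict, List, Optional, Tuple

Migration = Tuple[str, str, str]  # hypothetical: (vm, source host, destination host)


def host_loads(placement: Dict[str, str], vm_load: Dict[str, float],
               hosts: List[str]) -> Dict[str, float]:
    """Sum the monitored load of the virtual machines placed on each host."""
    loads = {h: 0.0 for h in hosts}
    for vm, host in placement.items():
        loads[host] += vm_load[vm]
    return loads


def choose_migration(placement: Dict[str, str], vm_load: Dict[str, float],
                     hosts: List[str]) -> Optional[Migration]:
    """Policy: pick one VM to move from the busiest host to the idlest host, if any helps."""
    loads = host_loads(placement, vm_load, hosts)
    src = max(hosts, key=lambda h: loads[h])
    dst = min(hosts, key=lambda h: loads[h])
    gap = loads[src] - loads[dst]
    # Moving load x makes the pair's maximum max(S - x, D + x), which is below S only if 0 < x < gap.
    candidates = [vm for vm, h in sorted(placement.items())
                  if h == src and 0 < vm_load[vm] < gap]
    if not candidates:
        return None
    # Prefer the VM whose load is closest to half the gap, giving the most even split.
    vm = min(candidates, key=lambda v: abs(gap - 2 * vm_load[v]))
    return vm, src, dst


def apply_policy(placement: Dict[str, str], vm_load: Dict[str, float], hosts: List[str],
                 migrate: Callable[[str, str, str], None]) -> Optional[Migration]:
    """Invoke the (hypothetical) migration mechanism chosen by the policy and record the move."""
    decision = choose_migration(placement, vm_load, hosts)
    if decision is not None:
        vm, src, dst = decision
        migrate(vm, src, dst)  # stand-in for the platform's migration mechanism
        placement[vm] = dst
    return decision
```

With virtual machines vm1 (load 5) and vm2 (load 3) on host h1 and vm3 (load 1) on host h2, the policy moves vm2 to h2, leaving loads of 5 and 4; a second invocation finds no move that narrows the gap and returns None.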