Network links that carry traffic between Virtual Machines (VMs) can, at times, become underutilized. Although link underutilization may be caused by various factors, it typically stems from a mismatch between the layout of the physical network and the traffic load offered to the network. To address underutilization, data center operators typically engineer a traffic matrix to make better use of underutilized links; however, engineering traffic through the network according to conventional wisdom generally requires changing the network topology. This is a consequence of the fact that in most networks the sources of load are geographically “pinned.” Therefore, the network itself needs to be responsive to mismatches in load.
For example, Multiprotocol Label Switching (MPLS) is a commonly deployed but complex technology by which networks use short labels instead of long network addresses to direct data between nodes. MPLS permits reactive traffic engineering by creating virtual links (which artificially drive up the mesh density in the network) and advertising those links into the routing system. This permits a portion of the traffic matrix to be directed around an overutilized link by placing the new virtual link such that the traffic traverses less utilized links.
Many conventional mechanisms for managing the utilization of network links can complicate the network. In particular, a network must be engineered both to support normal routing behavior and to allow a data center operator to override that behavior when necessary to optimize the network.
With the evolution of cloud computing, data center operators can manipulate the traffic matrix simply by re-arranging the source or sources of one or more loads. Additionally, it is possible that other classes of networks having a similar capability may emerge in the future. This capability is a consequence of virtualization and the instantiation of software in the VMs. Particularly, one or more VMs can run simultaneously in a computing core in a server, which is often hosted in a data center containing a large array of servers connected by a network. This virtualization of computing provides a unique opportunity to remove complexity from the network in that the sources of load in the data center are not geographically pinned. Further, the data center operator can re-arrange the load sources such that the offered load matches the physical network interconnecting the server array. Such re-arrangement of the load sources negates the need to alter the network topology, and allows data center operators to exploit a single, simplified network fabric.
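The effect of re-arranging load sources on link utilization can be illustrated with a minimal sketch. The topology, server names, link names, demand values, and the `link_loads` helper below are all hypothetical and chosen only for illustration; the sketch assumes fixed paths between servers and shows that re-placing a single VM changes which links carry traffic while the topology stays untouched.

```python
from collections import defaultdict

# Hypothetical topology: three servers on a line, s1 -(A)- s2 -(B)- s3.
# PATHS maps an unordered server pair to the links its traffic crosses.
PATHS = {
    frozenset(["s1", "s2"]): ["A"],
    frozenset(["s2", "s3"]): ["B"],
    frozenset(["s1", "s3"]): ["A", "B"],
}


def link_loads(placement, demands):
    """Sum the demand carried by each link for a given VM placement.

    placement -- dict mapping VM name to the server hosting it
    demands   -- dict mapping (vm_a, vm_b) to a traffic rate (assumed units: Mb/s)
    """
    loads = defaultdict(float)
    for (vm_a, vm_b), rate in demands.items():
        src, dst = placement[vm_a], placement[vm_b]
        if src == dst:
            continue  # co-located VMs exchange traffic without using a link
        for link in PATHS[frozenset([src, dst])]:
            loads[link] += rate
    return dict(loads)


# Illustrative traffic matrix and two placements that differ only in vm1.
demands = {("vm1", "vm2"): 80.0, ("vm2", "vm3"): 10.0}
before = {"vm1": "s1", "vm2": "s3", "vm3": "s3"}
after = {"vm1": "s2", "vm2": "s3", "vm3": "s3"}

print(link_loads(before, demands))  # vm1-vm2 traffic crosses links A and B
print(link_loads(after, demands))   # the same traffic now crosses link B only
```

In this toy case the same traffic matrix loads both links under the first placement and only link B under the second, so link A is relieved without adding virtual links or altering routing.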
More particularly, data center operators can optimize a network by migrating a VM that, given its location, is currently communicating with its peers over one or more congested links to another VM location in which the communication with its peers would be over one or more underutilized links. To accomplish this goal, the data center operators must first connect the two VM sites together. Once the sites are connected, the code and data that constitute the VM “image” are migrated to the replacement VM. This usually requires an iterative process of copying the image of the VM being replaced in “pages” to the replacement VM. However, copying pages of information between VMs can be complicated and time consuming.
Specifically, the VM being replaced must remain “in-service” while the pages of its memory are copied and transferred to the replacement VM. Thus, even while the transfer occurs, the “in-service” VM is typically receiving, generating, and storing new data. To ensure that all the data and code are migrated, a function is needed to determine which code and data have yet to be transferred, and which data and code have been changed or altered since being transferred (i.e., “dirty” pages). Once all the pages are written to the replacement VM, the process begins another iteration to copy the “dirty” pages to the replacement VM. This iterative process continues until both VMs are exact copies of one another, or until the data center operator decides that the replacement VM is “close enough.”
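The iterative copying described above can be sketched as a small simulation. This is an illustrative model only, not a description of any particular hypervisor: the function name `precopy_migrate`, the `dirty_rate` probability, the `close_enough` threshold, and the simplification that the in-service VM writes to its pages only after each round's copy completes are all assumptions made for the example.

```python
import random


def precopy_migrate(source, dirty_rate, max_rounds=10, close_enough=0, seed=0):
    """Simulate iterative copying of memory pages from an in-service VM.

    source       -- dict mapping page id to an integer page version; mutated
                    here to model the VM continuing to write while in service
    dirty_rate   -- assumed probability that a page is rewritten in a round
    max_rounds   -- upper bound on the number of copy iterations
    close_enough -- stop once this many or fewer pages remain dirty
    Returns (replica, rounds, dirty), where dirty holds the pages that still
    differ between the two VMs when the loop stops.
    """
    rng = random.Random(seed)
    replica = {}
    dirty = set(source)  # initially every page has yet to be transferred
    rounds = 0
    while len(dirty) > close_enough and rounds < max_rounds:
        rounds += 1
        # Copy every page that is new or was changed since its last transfer.
        for page in dirty:
            replica[page] = source[page]
        # The VM stays in service, so some pages change again ("dirty" pages).
        dirty = set()
        for page in source:
            if rng.random() < dirty_rate:
                source[page] += 1
                dirty.add(page)
    return replica, rounds, dirty


pages = {page_id: 0 for page_id in range(64)}
replica, rounds, dirty = precopy_migrate(pages, dirty_rate=0.2, close_enough=4)
print(rounds, len(dirty))
```

With a nonzero `dirty_rate`, the loop may never reach an exact copy, which is why the sketch stops either at the `close_enough` threshold or after `max_rounds` iterations; a final pause of the in-service VM would be needed to transfer the remaining dirty pages.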
Alternative procedures such as “spawn and destroy” exist where a new VM being added to a compute pool is configured with characteristics that are identical to an old VM it is replacing. Once the old VM has completed its tasks, it is administratively taken offline and destroyed. This “spawn and destroy” method can be considered to be a variation of VM migration, albeit with a different set of operational issues.
Indeed, the methods currently used to re-arrange the load sources in a network are not without problems. Aside from the issues noted above, conventional methods require knowing where in the network to place the VMs before a service that will use the VMs is instantiated. Placing the VMs in the network in an intelligent or planned manner can be fairly straightforward; however, as seen above, the migration of the traffic communicated by live VMs is not. Specifically, using conventional methods, the migration of a VM (i.e., moving the traffic from one VM to another) can take a very long time. And, even though techniques exist to minimize any actual service outage, the length of time needed for a migration dictates that the migration of a VM is not a procedure to be taken lightly. Further, designing the network both to handle normal routing behavior and to override that behavior increases costs. Additionally, the amount of signaling needed by the network to analyze problems increases proportionally with the amount of manipulation of the traffic matrix needed to override normal routing behavior.
Conventional methods of engineering a traffic matrix and migrating VM traffic could be improved upon given a method and apparatus configured to determine and identify the minimum number of VMs that must be migrated to correct or fix network problems and optimize the network.