This invention relates to the field of network analysis, and in particular to traffic monitoring using a capture system that is embodied on a virtual machine.
The ever-continuing increase in computer processing capabilities has resulted in a resurgence of “Virtual Machines” (VM). In a virtual machine system, such as illustrated in FIG. 1, a single physical/actual machine 110 appears to be multiple machines VM1 120, VM2 130, . . . VMn, that are isolated from each other. In like manner, system resources, such as communication interfaces, memory, and the like, may appear to be allocated solely to each of these virtual machines. This conversion of physical components into an apparent plurality of components is performed by a software layer, typically termed a Virtual Machine Interface 140 between the physical components of the actual machine and the plurality of virtual components. The interface 140 is controlled by a Virtual Machine Manager (VMM) 150 that is preferably configured to be as “thin” as possible, imposing minimal overhead burden on the individual virtual machines.
In the field of network analysis, traffic capture elements generally record the time that each communications event occurs, such as the time that each packet is seen by the traffic capture element. By placing traffic capture elements at a variety of locations within a network, performance parameters, such as propagation delays, congestion delays, processing delays, etc. can be determined. The proximity of a traffic capture element to a particular node will affect the analysis that can be performed with respect to that node. If the traffic capture element is remote from the node, the determined performance parameters will be affected by any elements that are between the traffic capture element and the node of interest, and it may be impossible to distinguish the performance attributable to this node. To provide accurate and precise timing parameters, a traffic capture element will often be collocated with each node of interest, preferably embodied on that same node.
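As a minimal illustrative sketch of how recorded times from two capture locations yield a delay measure, the following assumes that each traffic capture element produces a mapping from a packet identifier to the time, in microseconds, at which it saw that packet. The function name `transit_delays`, the identifiers, and the times are hypothetical and are not taken from this disclosure.

```python
def transit_delays(upstream, downstream):
    """Return the per-packet delay between two capture locations.

    upstream, downstream: dicts mapping a packet identifier to the time
    (integer microseconds) at which that capture element saw the packet.
    Packets seen at only one location are skipped.
    """
    return {
        packet_id: downstream[packet_id] - upstream[packet_id]
        for packet_id in upstream
        if packet_id in downstream
    }


# Hypothetical capture logs from two locations in the network.
upstream_log = {"p1": 1000, "p2": 2000}
downstream_log = {"p1": 1400, "p2": 2900, "p3": 5000}

delays = transit_delays(upstream_log, downstream_log)
```

Any element that lies between the two capture locations contributes to each computed delay, which is why the proximity of a capture element to the node of interest matters.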
Virtual machines are often used for creating a multi-server environment with reduced operational costs, using “server consolidation” techniques. Application servers are often underutilized and idle most of the time, but are deployed to accommodate peak demands. By embodying multiple virtual servers on a single actual machine, server consolidation allows the use of a management console to manage the virtual servers with greater ease than if they were running on individual actual machines, and facilitates load-balancing. The smaller number of physical machines also reduces the power, cooling, and floor space requirements of the servers. Additionally, because the virtual machines are isolated from each other, a failure or compromise in one server will not affect the rest of the servers on the machine. Properly configured, a compromised virtual server cannot, for example, effect a denial-of-service attack by consuming an inordinate share of the resources of the actual machine.
Another advantage of a virtual machine embodiment is transportability and independence from the actual machine. In the multi-server application, this allows for the replacement or upgrade of the actual machine with minimal impact on the service being provided, even if the replacement is substantially different from the original actual machine. The virtual machine manager will generally be customized for each type of actual machine (different types of processors, operating systems, etc.) that will host the virtual machine manager, but once this customization is performed, none of the applications that are running on the virtual machines will need to be customized.
In a virtual machine environment, each virtual machine 120, 130 operates independently of the others, as if it were a single, individual machine. Even though the physical machine 110 is being time-shared among the virtual machines 120, 130, the individual virtual machines 120, 130 are unaware of this time-sharing.
Of particular note, each virtual machine 120, 130 is unaware of the gaps in real time as the actual machine 110 services the other virtual machines. As in the actual machine 110, timing is typically accomplished by counting the ‘ticks’ of a system clock, each tick triggering an interrupt that causes the processor to increment a counter. In a virtual machine system, the actual machine 110 receives the interrupts directly, whereas the virtual machine manager 150 buffers these interrupts and provides them to the virtual machines, via the interface 140, during the intervals that each virtual machine is enabled.
Generally, all interrupts received by the actual machine are provided to the virtual machines, albeit shifted in time by the virtual machine manager. That is, the same number of interrupts is provided to each of the virtual machines as to the actual machine. However, because the individual virtual machines are regularly ‘disconnected’ from the actual machine, the timing interrupts will not reach the virtual machines at the uniform rate at which the actual timing interrupts occur.
FIG. 2 illustrates an example timing diagram for propagating clocking signals in a virtual machine system with two virtual machines VM1, VM2. As illustrated in FIG. 2, the actual machine receives timing interrupts 210 at a constant rate. However, the individual virtual machines VM1, VM2 are selectively enabled 221, 222, and during the period that a virtual machine is not enabled, the virtual machine will not be notified of the interrupt. For example, during the period 220 immediately before VM1 is enabled 221, some interrupts 210 will occur without VM1 being notified. To assure that each of the virtual machines receives all of the interrupts 210, the virtual machine manager includes an interrupt ‘buffer’ that records the occurrence of each interrupt.
When a particular virtual machine is enabled, the timing interrupts that had occurred while this machine was not enabled will be provided to the virtual machine from the virtual machine manager at a faster rate than the rate of actual timing interrupts. Eventually, there will be no interrupts in the buffer, and subsequent interrupts will be provided to the enabled virtual machine as they occur on the actual machine. As illustrated in FIG. 2, when VM1 is enabled, it initially receives interrupts 230 at a rate faster than the rate of interrupts 210 until all of the buffered interrupts are received, then receives interrupts 235 at the rate of the interrupts 210. In this manner, the total number of interrupts provided to the virtual machine VM1 while it is enabled is equal to the total number of interrupts that occurred since it was last enabled. In like manner, when VM2 is enabled 222, it receives interrupts 240 at a faster rate than the actual rate that the interrupts 210 occur, then receives interrupts 245 at the rate that subsequent interrupts 210 occur.
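The buffering and catch-up behavior described with respect to FIG. 2 can be sketched as a small simulation. This is an illustrative model only, not the implementation of any particular virtual machine manager: the function name `simulate_tick_delivery`, the enable schedule, and the catch-up limit of three interrupts per real tick are all assumptions chosen for the example.

```python
def simulate_tick_delivery(schedule, catch_up_per_tick=3):
    """Simulate buffered timer-interrupt delivery to one virtual machine.

    schedule: list of booleans, one per real timer interrupt; True means
        the virtual machine is enabled when that interrupt occurs.
    catch_up_per_tick: assumed maximum number of interrupts (buffered plus
        current) that the manager delivers per real interrupt.
    Returns a list of (real_ticks, vm_ticks) pairs, one per real interrupt.
    """
    pending = 0    # interrupts recorded in the buffer but not yet delivered
    vm_ticks = 0   # interrupts the virtual machine has counted so far
    history = []
    for real_ticks, enabled in enumerate(schedule, start=1):
        pending += 1  # every real interrupt is recorded in the buffer
        if enabled:
            # Deliver buffered interrupts faster than real time until the
            # buffer is empty, then one per real interrupt.
            delivered = min(pending, catch_up_per_tick)
            pending -= delivered
            vm_ticks += delivered
        history.append((real_ticks, vm_ticks))
    return history


# Hypothetical schedule: disabled for four interrupts, then enabled for four.
history = simulate_tick_delivery([False] * 4 + [True] * 4)

# Largest gap between real time and the virtual machine's view of time.
max_lag = max(real - vm for real, vm in history)
```

In this sketch the virtual machine's count eventually matches the real count, but while it is disabled and during catch-up its count lags the real count, which is the artificial time that a capture element on the virtual machine would otherwise record.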
Other techniques may be used to provide timing information to each virtual machine, but in general, any system that hides the fact that gaps are occurring in real time as the actual machine is being time shared among the plurality of virtual machines will introduce an artificial time that differs from the actual time. The introduction of an artificial time to each virtual machine to make it appear that the virtual machine is operating just like an actual machine limits the use of virtual machines for applications that require accurate and precise timing measures.
As noted above, traffic capture elements are often preferably placed at server locations, to monitor the performance of each server and the overall performance of the network with regard to such servers. Ideally, if multiple virtual servers are embodied on an actual machine, the traffic capture element would also be embodied on this same actual machine. Unfortunately, the aforementioned use of artificial timing for each virtual machine makes it infeasible, or impractical, to embody the traffic capture element as a virtual machine on the actual machine.
To obtain accurate timing information, the traffic capture element may be embodied directly on the actual machine, with access to the actual timing system for recording the time of occurrence of communication events. However, such an embodiment will likely affect the overall performance of all of the virtual machines, because it would compete with the virtual machine manager for actual system resources, and would need to have priority over these virtual machines in order to accurately determine the time that the communication event occurs.
The traffic capture element may also be embodied within the virtual machine manager, to more efficiently control this competition. However, the embodiment of a traffic capture element within a virtual machine manager would significantly increase the overhead associated with the virtual machine manager, because a traffic capture element will generally be configured to process each monitored packet to record information that may be required for subsequent traffic analysis.
Embodying the traffic capture element within the actual system, inside or outside the virtual machine manager, will also require customizing the traffic capture element for each type of actual machine that can host the virtual machine manager, thereby losing the aforementioned advantages in reduced development time that could be gained by embodying the traffic capture element on a virtual machine.
It would be advantageous to embody a network capture element on an actual device that is hosting a virtual machine manager without substantially interfering with or burdening the virtual machine manager. It would also be advantageous to embody a network capture element on the actual device without having to customize the network capture element for different types of actual devices.
These advantages, and others, can be realized by embodying the network capture element on a virtual machine while avoiding the timing errors and anomalies associated with virtual machines. A utility function that has minimal impact on the actual device or virtual machine manager is embodied on the actual device, preferably within the virtual machine manager. Both the utility function and the traffic capture element are configured to monitor communication events. To minimize the overhead imposed, the utility function is configured to merely store an identifier of the communication event, and the actual time that the event occurred. The network capture element, on the other hand, performs the more complicated and time consuming tasks of filtering the communications, selectively storing some or all of the data content of the communications, characterizing the data content, and so on. Instead of storing the artificial time that the communication event apparently occurred at the network capture element in the virtual machine, the network capture element uses the identifier of the communication event to retrieve the actual time that the communication event occurred at the utility function on the actual machine.
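The division of labor between the utility function and the capture element can be sketched as follows. This is a minimal sketch under stated assumptions: the class names `TimingUtility` and `CaptureElement`, the use of a truncated SHA-256 digest as the event identifier, and the injected clock functions are hypothetical choices made for illustration and are not specified by this disclosure.

```python
import hashlib


def event_identifier(packet: bytes) -> str:
    # Assumption: a digest of the packet bytes serves as the identifier.
    return hashlib.sha256(packet).hexdigest()[:16]


class TimingUtility:
    """Stand-in for the lightweight utility function on the actual machine."""

    def __init__(self, actual_clock):
        self._actual_clock = actual_clock
        self._actual_times = {}

    def observe(self, packet: bytes) -> None:
        # Store only the identifier and the actual time of the event.
        self._actual_times[event_identifier(packet)] = self._actual_clock()

    def lookup(self, identifier: str):
        # Returns None if the event was never observed.
        return self._actual_times.get(identifier)


class CaptureElement:
    """Stand-in for the capture element running on a virtual machine."""

    def __init__(self, utility, artificial_clock):
        self._utility = utility
        self._artificial_clock = artificial_clock
        self.records = []

    def observe(self, packet: bytes) -> dict:
        identifier = event_identifier(packet)
        record = {
            "id": identifier,
            "length": len(packet),  # heavier per-packet processing goes here
            "artificial_time": self._artificial_clock(),
            # Use the actual time retrieved by identifier from the utility.
            "time": self._utility.lookup(identifier),
        }
        self.records.append(record)
        return record


# Fixed clocks make the example deterministic: the virtual machine's
# artificial time (97.5) differs from the actual time (100.0).
utility = TimingUtility(actual_clock=lambda: 100.0)
capture = CaptureElement(utility, artificial_clock=lambda: 97.5)

packet = b"example packet"
utility.observe(packet)
record = capture.observe(packet)
```

The utility does a single dictionary store per event, while the filtering, storage, and characterization work stays in the capture element on the virtual machine.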
Throughout the drawings, the same reference numerals indicate similar or corresponding features or functions. The drawings are included for illustrative purposes and are not intended to limit the scope of the invention.