Data Center (DC) architecture generally consists of a large number of compute and storage resources that are interconnected through a scalable Layer-2 or Layer-3 infrastructure. In addition to this networking infrastructure running on hardware devices, the DC network includes software networking components (v-switches) running on general purpose compute, and dedicated hardware appliances that supply specific network services such as load balancers, ADCs, firewalls, IPS/IDS systems, etc. The DC infrastructure can be owned by an Enterprise or by a service provider (referred to as a Cloud Service Provider or CSP), and shared by a number of tenants. Compute and storage infrastructure are virtualized in order to allow different tenants to share the same resources. Each tenant can dynamically add/remove resources from the global pool to/from its individual service.
Virtualized services as discussed herein generally describe any type of virtualized compute and/or storage resources capable of being provided to a tenant. Moreover, virtualized services also include access to non-virtual appliances or other devices using virtualized compute/storage resources, data center network infrastructure and so on. The various embodiments are adapted to improve event-related processing within the context of data centers, networks and the like.
Within the context of a typical data center arrangement, a tenant entity such as a bank or other entity has provisioned for it a number of virtual machines (VMs) which are accessed via a Wide Area Network (WAN) using Border Gateway Protocol (BGP). At the same time, thousands of other virtual machines may be provisioned for hundreds or thousands of other tenants. The scale associated with the data center may be enormous. Thousands of virtual machines may be created and/or destroyed each day per tenant demand.
Each of the virtual ports, virtual machines, virtual switches, virtual switch controllers and other objects or entities within the data center (virtual and otherwise) generates event data in response to many different types of conditions.
All of the events produced by an event-sourcing entity are stored for subsequent use, such as for determining root cause problems associated with events or failures of interest. That is, given an event of interest in the past (e.g., a failure of a virtual entity or object of importance to a customer), the events temporally proximate the failure of interest (e.g., +/− some amount of time) are useful in determining a root cause failure of an event of interest in the past.
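The selection of events temporally proximate an event of interest can be illustrated with a minimal sketch. The `Event` type, its field names, the `events_near` function and the use of seconds as the time unit are all hypothetical and chosen only for illustration; the sketch assumes the stored events are already sorted by timestamp.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical stored event record (field names are assumptions)."""
    timestamp: float  # seconds; the unit is an assumption
    source: str       # e.g., an identifier of a virtual port, VM or v-switch
    kind: str         # e.g., "failure", "migrated", "restored"


def events_near(events, t_interest, window):
    """Return events whose timestamp lies within +/- window of t_interest.

    `events` must be sorted by timestamp in ascending order.
    """
    times = [e.timestamp for e in events]
    lo = bisect_left(times, t_interest - window)   # first event inside the window
    hi = bisect_right(times, t_interest + window)  # one past the last event inside
    return events[lo:hi]


if __name__ == "__main__":
    log = [
        Event(0.0, "vm-1", "created"),
        Event(50.0, "vport-7", "down"),
        Event(100.0, "vm-1", "failure"),
        Event(150.0, "vswitch-2", "restored"),
        Event(300.0, "vm-9", "created"),
    ]
    for e in events_near(log, t_interest=100.0, window=50.0):
        print(e)
```

The window bounds are treated as inclusive here, which is a design choice of the sketch rather than something the text prescribes.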
However, the various events must be viewed within the context of the real and instantiated structure of the data center at the time of the occurrence of the events. Thus, given that objects/entities within the data center structure are constantly changing (instantiated, torn down, migrated, failed, restored, etc.), current practice is to store periodic snapshots in time (e.g., every 5 minutes) of the data center structure and use these snapshots to try to identify the root cause failure associated with an event of interest.
Thus, the snapshot of the data center structure closest in time to an event of interest is normally used to identify the root cause failure associated with that event. In some systems, the two snapshots of the data center structure temporally bracketing the event of interest may be used instead.
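The snapshot selection described above can be sketched as follows. This is only an illustration under stated assumptions: snapshots are represented solely by their capture times in a sorted list, and the function names `bracketing_snapshots` and `nearest_snapshot` are hypothetical.

```python
from bisect import bisect_right


def bracketing_snapshots(snapshot_times, t_event):
    """Return (before, after): the capture times of the snapshots that
    temporally bracket t_event. Either may be None if no such snapshot exists.

    `snapshot_times` must be sorted in ascending order.
    """
    i = bisect_right(snapshot_times, t_event)
    before = snapshot_times[i - 1] if i > 0 else None
    after = snapshot_times[i] if i < len(snapshot_times) else None
    return before, after


def nearest_snapshot(snapshot_times, t_event):
    """Return the capture time of the snapshot closest in time to t_event.

    On a tie, the earlier snapshot is returned (an arbitrary choice).
    """
    before, after = bracketing_snapshots(snapshot_times, t_event)
    candidates = [t for t in (before, after) if t is not None]
    if not candidates:
        raise ValueError("no snapshots available")
    return min(candidates, key=lambda t: abs(t - t_event))


if __name__ == "__main__":
    # Snapshots taken every 5 minutes (300 seconds), as in the example above.
    times = [0, 300, 600, 900]
    print(bracketing_snapshots(times, 430))
    print(nearest_snapshot(times, 430))
```

With a five-minute interval, an event can be as much as 150 seconds away from the nearest snapshot, so objects created and destroyed within that gap may not appear in the snapshot that is consulted.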
Unfortunately, maintaining snapshots of the data center structure is enormously costly in terms of resources and may also be imprecise given the rapid changes inherent in a data center. For example, snapshots every five minutes might be too infrequent, while snapshots every two minutes might be too costly. Generally speaking, these techniques are expensive and scale poorly.