1. Field of the Invention
The present invention relates to an improved data processing system and, in particular, to a method and system for coordinating multiple computers or processes. Still more particularly, the present invention provides a method and system for network management.
2. Description of Related Art
Technology expenditures have become a significant portion of operating costs for most enterprises, and businesses are constantly seeking ways to reduce information technology (IT) costs. This has given rise to an increasing number of outsourcing service providers, each promising, often contractually, to deliver reliable service while offloading the costly burdens of staffing, procuring, and maintaining an IT organization. While most service providers started as network pipe providers, they are moving into server outsourcing, application hosting, and desktop management. Enterprises that do not outsource are demanding more accountability from their IT organizations and are demanding that IT be integrated into their business goals. In both cases, “service level agreements” have been employed to contractually guarantee service delivery between an IT organization and its customers. As a result, IT teams now require management solutions that focus on and support “business processes” and “service delivery” rather than just disk space monitoring and network pings.
To succeed, IT solutions now require end-to-end management that includes network connectivity, server maintenance, and application management. The focus of IT organizations has turned to ensuring overall service delivery and not just the “towers” of network, server, desktop, and application. Management systems must fulfill two broad goals: a flexible approach that allows rapid deployment and configuration of new services for the customer, and an ability to support rapid delivery of the management tools themselves. A successful management solution fits into a heterogeneous environment, provides the openness needed to knit together management tools and other types of applications, and offers a consistent approach to managing all of the IT assets.
Beyond these requirements, a successful management approach must also attend to the needs of the IT staff who accomplish these goals: the ability of an IT team to deploy an appropriate set of management tasks to match the delegated responsibilities of the IT staff; the ability of an IT team to navigate the relationships and effects of all of their technology assets, including networks, middleware, and applications; the ability of an IT team to define their roles and responsibilities consistently and securely across the various management tasks; the ability of an IT team to define groups of customers and their services consistently across the various management tasks; and the ability of an IT team to consistently address, partition, and reach the managed devices.
Many service providers have stated the need to scale their capabilities to manage millions of devices. Such numbers are quickly reached when one considers the number of customers in home consumer networks as well as pervasive devices, such as smart mobile phones. Significant bottlenecks appear when typical IT solutions attempt to support more than several thousand devices.
Given networks of this size, a management system must be highly resistant to failure so that service attributes, such as response time, uptime, and throughput, are delivered in accordance with the guarantees in a service level agreement. In addition, a service provider may attempt to support as many customers as possible within a single network management system. The service provider's profit margins may come from its ability to bill multiple customers for the usage of a common network management system.
On the other hand, the service provider must be able to support contractual agreements on an individual basis. Service attributes, such as response time, uptime, and throughput, must be determinable for each customer. In order to do so, a network management system must provide a suite of network management tools that is able to perform device monitoring and discovery for each customer's network while integrating these abilities across a shared network backbone to gather the network management information into the service provider's distributed data processing system. By providing network management for each customer within an integrated system, a robust management system can enable a service provider to enter into quality-of-service (QOS) agreements with customers.
Hence, there is a direct relationship between the ability of a management system to provide network monitoring and discovery functionality and the ability of a service provider using that management system to serve multiple customers from a single management system. Preferably, the management system can replicate services, detect faults within a service, restart services, and reassign work to a replicated service. When a common set of interfaces is implemented across all services, each service developer gains the benefits of system robustness. A well-designed, component-oriented, highly distributed system can easily accept a variety of services on a common infrastructure with built-in fault tolerance and levels of service.
Distributed data processing systems with thousands of nodes are known in the prior art. The nodes can be geographically dispersed, and the overall computing environment can be managed in a distributed manner. The managed environment can be logically separated into a series of loosely connected managed regions, each with its own management server for managing local resources. The management servers coordinate activities across the enterprise and permit remote site management and operation. Local resources within one region can be exported for the use of other regions.
A service provider's management system should have an infrastructure that can accurately measure and report resource consumption at any given resource throughout the system, which can be quite difficult to accomplish in a large, highly distributed computing environment. In order to fulfill quality-of-service guarantees within a network management system consisting of a million or more devices, performance measurements may be required along various network routes throughout the system. Computational resources throughout the system should be controllable so that the management system can obtain accurate resource consumption measurements along particular routes.
Moreover, if a service provider were able to restrict resource consumption from a technical perspective, then the service provider could also restrict resource consumption for broader business purposes. For example, the service provider could contract with some customers to provide a high level of service, which would require it to limit the resources consumed by customers who have not purchased that level of service.
In order to either restrict or allocate bandwidth intelligently, a service provider must accumulate metrics relating a consumer of bandwidth to a description of the operations performed by that consumer that led to the bandwidth consumption. In some cases, an enterprise leases an entire communication link for a period of time, such as an entire fiber optic channel for a specific time period; with a dedicated network, an enterprise can determine for itself how it has consumed network-related resources. In most other cases, though, a communication link is shared, and the service provider needs not only to know which customer has consumed bandwidth but also to have an accurate report of the actual bandwidth consumed, so that it can determine whether it is meeting quality-of-service guarantees for its customers.
In prior art metrics for determining consumption of bandwidth resources, the focus has generally been at the hardware or firmware level, in which bandwidth is measured at a specific time at a specific node or device, either as a data rate, such as bits per second, or as a packet size, such as bits per packet. The types of metrics that a service provider is able to acquire generally lead the service provider to base its sales model on those metrics. Hence, a service provider might charge consumers for a guaranteed data rate or a flat fee for a certain amount of data. In the prior art, though, the service provider's accumulated metrics are tied to the underlying physical structure. The service provider might be able to report that a certain amount of data passed through a specific network node, device, or port over a specific period, but the service provider cannot tell which application originated or consumed the bandwidth. The service provider might report such statistics to its customers, but the customers would be responsible for cross-referencing the report with their own records to determine why the bandwidth was consumed in the manner reported. In other words, the service provider cannot determine and control bandwidth at the application level.
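To make the limitation concrete, the following is a minimal Python sketch of the kind of port-level metric described above. The names `PortSample`, `data_rate_bps`, and `mean_packet_size_bits` are hypothetical, and the sketch assumes that a device exposes a cumulative byte counter per port that does not wrap or reset between samples; it is not drawn from any particular device or protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortSample:
    """Hypothetical snapshot of a cumulative byte counter on one device port."""
    port: str
    timestamp_s: float  # time of the sample, in seconds
    octets: int         # cumulative bytes seen on the port so far


def data_rate_bps(earlier: PortSample, later: PortSample) -> float:
    """Average data rate, in bits per second, between two samples of one port.

    Assumes the counter did not wrap or reset between the two samples.
    """
    if earlier.port != later.port:
        raise ValueError("samples must come from the same port")
    elapsed = later.timestamp_s - earlier.timestamp_s
    if elapsed <= 0:
        raise ValueError("later sample must have a later timestamp")
    return (later.octets - earlier.octets) * 8 / elapsed


def mean_packet_size_bits(octets: int, packets: int) -> float:
    """Average packet size, in bits per packet, over a measurement interval."""
    if packets <= 0:
        raise ValueError("packet count must be positive")
    return octets * 8 / packets
```

For example, samples of 1,000,000 and 2,250,000 octets taken 10 seconds apart on port `eth0` yield an average of 1,000,000 bits per second. The inputs carry only a port name, a time, and counts; nothing in them identifies which application or user generated the traffic.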
In addition, the prior art does not allow service providers to determine and control bandwidth at the user level, above the application level. Currently, some service providers, such as Internet service providers, provide service directly to users. In general, though, these types of service providers are closer to the previously mentioned network-pipe providers; there is a one-to-one correspondence between a user who is receiving service and a network connection to the service provider's facilities. When a single user is connected, the service provider can monitor and control the network connection to determine the amount and characteristics of the data flow to the user. However, if the user configures a private local area network in which multiple devices and users share the single network connection to the service provider, such as a homeowner with multiple devices connected to a home hub/router, the service provider only observes an increase in data traffic; it cannot distinguish data traffic to or from the different devices, cannot control other application-related operations on the devices, and cannot distinguish actions related to the various users of those devices.
In order to maintain quality-of-service guarantees, the service provider requires detailed bandwidth data. Moreover, if the service provider desires to be able to offer and charge for services at much finer granularities than raw bandwidth over certain periods of time, then the service provider must be able to control user-level and application-level operations.
Therefore, it would be advantageous to provide a method and system that measures bandwidth consumption at the application level and the user level. It would be particularly advantageous if the management system within a service provider's network could identify bandwidth consumption at fine granularities, which requires finer-grained bandwidth usage statistics, such as bits-per-user, packets-per-user, bits-per-application, or packets-per-application.
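The finer-grained statistics named above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each traffic record has already been attributed to a user and an application; the `FlowRecord` type and `usage_by` function are illustrative names, not part of any described interface, and how the attribution is obtained is outside the scope of the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical traffic record already attributed to a user and an application."""
    user: str
    application: str
    octets: int   # bytes carried by this record
    packets: int  # packets carried by this record


def usage_by(records: Iterable[FlowRecord], key: str) -> Dict[str, Tuple[int, int]]:
    """Sum (bits, packets) for each distinct value of `key`.

    `key` must be "user" (bits-per-user, packets-per-user) or
    "application" (bits-per-application, packets-per-application).
    """
    if key not in ("user", "application"):
        raise ValueError("key must be 'user' or 'application'")
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for rec in records:
        bucket = totals[getattr(rec, key)]
        bucket[0] += rec.octets * 8  # convert bytes to bits
        bucket[1] += rec.packets
    return {name: (bits, pkts) for name, (bits, pkts) in totals.items()}
```

For instance, given three records sharing one home router link, `FlowRecord("alice", "browser", 1000, 2)`, `FlowRecord("alice", "video", 5000, 5)`, and `FlowRecord("bob", "video", 3000, 3)`, a port counter on that link would report only a 72,000-bit total. Grouping by user gives alice 48,000 bits in 7 packets and bob 24,000 bits in 3 packets; grouping by application gives browser 8,000 bits in 2 packets and video 64,000 bits in 8 packets.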