1. Field of the Invention
The present invention relates to an improved data processing system and, in particular, to a method and system for coordinating multiple computers or processes. Still more particularly, the present invention provides a method and system for network resource management.
2. Description of Related Art
Technology expenditures have become a significant portion of operating costs for most enterprises, and businesses are constantly seeking ways to reduce information technology (IT) costs. This has given rise to an increasing number of outsourcing service providers, each promising, often contractually, to deliver reliable service while offloading the costly burdens of staffing, procuring, and maintaining an IT organization. While most service providers started as network pipe providers, they are moving into server outsourcing, application hosting, and desktop management. Enterprises that do not outsource are demanding more accountability from their IT organizations, as well as demanding that IT be integrated into their business goals. In both cases, “service level agreements” have been employed to contractually guarantee service delivery between an IT organization and its customers. As a result, IT teams now require management solutions that focus on and support “business processes” and “service delivery” rather than just disk space monitoring and network pings.
IT solutions now require end-to-end management that includes network connectivity, server maintenance, and application management in order to succeed. The focus of IT organizations has turned to ensuring overall service delivery and not just the “towers” of network, server, desktop, and application. Management systems must fulfill two broad goals: a flexible approach that allows rapid deployment and configuration of new services for the customer; and an ability to support rapid delivery of the management tools themselves. A successful management solution fits into a heterogeneous environment, provides the openness needed to knit together management tools and other types of applications, and offers a consistent approach to managing all of the IT assets.
With all of these requirements, a successful management approach will also require attention to the needs of the staff within the IT organization to accomplish these goals: the ability of an IT team to deploy an appropriate set of management tasks to match the delegated responsibilities of the IT staff; the ability of an IT team to navigate the relationships and effects of all of their technology assets, including networks, middleware, and applications; the ability of an IT team to define their roles and responsibilities consistently and securely across the various management tasks; the ability of an IT team to define groups of customers and their services consistently across the various management tasks; and the ability of an IT team to consistently address, partition, and reach the managed devices.
Many service providers have stated the need to be able to scale their capabilities to manage millions of devices. When one considers the number of customers in a home consumer network as well as pervasive devices, such as smart mobile phones, these numbers are quickly realized. Significant bottlenecks appear when typical IT solutions attempt to support more than several thousand devices.
Given such network spaces, a management system must be very resistant to failure so that service attributes, such as response time, uptime, and throughput, are delivered in accordance with guarantees in a service level agreement. In addition, a service provider may attempt to support as many customers as possible within a single system. The service provider's profit margins may materialize from the ability to bill the usage of common IT assets to multiple customers.
However, the service provider must be able to support contractual agreements on an individual basis. In order to do so, management systems must be able to support granularity on a shared backbone of equipment and services as well as a set of measurements that apply directly to each customer. By providing this type of granularity, a robust management system can enable a service provider to enter into quality-of-service (QOS) agreements with its customers.
Hence, there is a direct relationship between the ability of a management system to provide certain fault-tolerant functionality and the ability of a service provider using the management system to guarantee different levels of service. Preferably, the management system can replicate services, detect faults within a service, restart services, and reassign work to a replicated service. By implementing a common set of interfaces across all services, service developers gain the benefits of system robustness. A well-designed, component-oriented, highly distributed system can easily accept a variety of services on a common infrastructure with built-in fault-tolerance and levels of service.
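The fault-tolerance cycle described above, replicating a service, detecting a fault, restarting the failed replica, and reassigning its work, can be illustrated with a minimal sketch. All class and method names here are hypothetical and stand in for whatever heartbeat and dispatch mechanisms a real management system would employ:

```python
class Service:
    """A minimal stand-in for a managed service replica."""
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.work = []

    def heartbeat(self):
        # A real system would probe the service over the network.
        return self.alive

class Supervisor:
    """Detects faults via heartbeat, reassigns the failed replica's
    pending work to a healthy replica, then restarts the failed one."""
    def __init__(self, replicas):
        self.replicas = replicas

    def check_and_recover(self):
        healthy = [r for r in self.replicas if r.heartbeat()]
        for r in self.replicas:
            if not r.heartbeat():
                if healthy:
                    # Reassign pending work to a healthy replica.
                    healthy[0].work.extend(r.work)
                r.work = []
                # Restart the failed replica.
                r.alive = True

# Usage: replica "a" fails with two tasks pending; its work
# migrates to replica "b" and "a" is restarted.
a, b = Service("a"), Service("b")
a.work = ["task1", "task2"]
a.alive = False  # simulate a fault
Supervisor([a, b]).check_and_recover()
```

The common interface (here, `heartbeat()` and a `work` queue) is what lets one supervisor manage any service that implements it, which is the sense in which shared interfaces buy robustness for every service developer.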
Distributed data processing systems with thousands of nodes are known in the prior art. The nodes can be geographically dispersed, and the overall computing environment can be managed in a distributed manner. The managed environment can be logically separated into a series of loosely connected managed regions in which each region has its own management server for managing local resources. The management servers coordinate activities across the enterprise and permit remote site management and operation. Local resources within one region can be exported for the use of other regions in a variety of manners.
Managed regions within a highly distributed network may attempt to incorporate fault-tolerance with firewalls that attempt to limit any damage that might be caused by harmful entities. A firewall can prevent certain types of network traffic from reaching devices that reside on the internal protected network. For example, the firewall can examine the frame types or other information of the received data packets to stop certain types of information that has been previously determined to be harmful, such as virus probes, broadcast data, pings, etc. As an additional example, entities that are outside of the internal network and lack the proper authorization may attempt to discover, through various methods, the topology of the internal network and the types of resources that are available on the internal network in order to plan electronic attacks on the network. Firewalls can prevent these types of discovery practices.
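The filtering behavior described above, dropping traffic types previously judged harmful while passing the rest, can be sketched as follows. The packet representation and the set of blocked types are illustrative assumptions, not a description of any particular firewall product:

```python
# Traffic types previously determined to be harmful (illustrative).
BLOCKED_TYPES = {"icmp_echo", "broadcast", "virus_probe"}

def filter_packet(packet):
    """Return True if the packet may pass to the internal network."""
    return packet.get("type") not in BLOCKED_TYPES

packets = [
    {"type": "tcp", "dst": "10.0.0.5"},          # ordinary traffic: passes
    {"type": "icmp_echo", "dst": "10.0.0.5"},    # ping: dropped
    {"type": "broadcast", "dst": "10.0.0.255"},  # broadcast data: dropped
]
allowed = [p for p in packets if filter_packet(p)]
```

A real firewall would inspect frame types and other header fields rather than a dictionary key, but the policy shape is the same: a predetermined deny set applied to every inbound packet.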
While firewalls may prevent certain entities from obtaining information from the protected internal network, firewalls may also present a barrier to the operation of legitimate, useful processes. In order to ensure a predetermined level of service, benevolent processes may need to operate on both the external network and the protected internal network. For example, a customer system is more efficiently managed if the management software can dynamically detect and dynamically configure hardware resources as they are installed, rebooted, etc. Various types of discovery processes, status polling, status gathering, etc., may be used to get information about the customer's large, dynamic, distributed processing system. This information is then used to ensure that QOS guarantees are being fulfilled. However, firewalls might block these system processes, especially discovery processes.
In order to provide more system functionality such that firewalls do not block benevolent data traffic, systems can be built and/or configured in a variety of ways so that secure communication can still be accomplished. A system may comprise static, dedicated pieces of code that operate by using dedicated ports. Each software component communicates with another component by knowing the dedicated port number of the other component. However, memory and other system constraints eventually limit the number of dedicated ports and complicate their management, and the dynamic reconfiguration of port numbers can be quite difficult.
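The static, dedicated-port wiring described above can be sketched as a fixed lookup table that every component must carry. The component names and port numbers below are hypothetical:

```python
# Static wiring table: each component must know its peers' dedicated
# ports at configuration time (names and numbers are illustrative).
DEDICATED_PORTS = {
    "event_service": 9001,
    "discovery_service": 9002,
    "status_collector": 9003,
}

def port_for(component):
    """Look up a peer's dedicated port; fails if the table is stale."""
    try:
        return DEDICATED_PORTS[component]
    except KeyError:
        raise LookupError("no dedicated port configured for " + component)
```

The brittleness is visible in the lookup: adding, removing, or renumbering a component means editing and redistributing this table to every peer, which is why dynamic reconfiguration of port numbers is difficult in such a design.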
In order to fulfill QOS guarantees, a management system needs to provide an infrastructure such that resources are fairly distributed. A requesting application can request and obtain sole control of a target resource, execute a session with another software component that has responsibility for the desired target resource, and then release the target resource. However, the system management software then has the difficulty of assuring that requesting components receive equitable treatment in the sharing of resources, which can be quite difficult to accomplish in a large, distributed computing environment consisting of hundreds of thousands of devices.
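The request/execute/release cycle above, combined with equitable treatment of waiting requesters, can be sketched as a resource that grants sole control to one requester and queues the rest first-come, first-served. The class and requester names are hypothetical:

```python
from collections import deque

class TargetResource:
    """Grants sole control to one requester at a time; waiting
    requesters are served in FIFO order for equitable treatment."""
    def __init__(self):
        self.owner = None
        self.waiting = deque()

    def request(self, requester):
        """Return True if sole control is granted now, else queue."""
        if self.owner is None:
            self.owner = requester
            return True
        self.waiting.append(requester)
        return False

    def release(self, requester):
        """Release control; hand the resource to the next in line."""
        assert self.owner == requester, "only the owner may release"
        self.owner = self.waiting.popleft() if self.waiting else None

# Usage: app1 holds the resource, app2 queues, then inherits it.
res = TargetResource()
res.request("app1")   # app1 obtains sole control
res.request("app2")   # app2 queues behind app1
res.release("app1")   # resource passes to app2 in FIFO order
```

A single FIFO queue is trivially fair on one node; the difficulty the text points to is preserving that fairness when requests arrive at many distributed points of control across hundreds of thousands of devices.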
The distributed computing system can be implemented as a closed system that is relatively assured of being free from mischievous network-related attacks. The target resource can then remain in an “open” state available for all requesters on a first-come, first-served basis, with some type of “honor system” assumed to be followed by devices within the closed system. Many real-time operating systems or embedded real-time processor controllers assume that they operate within this type of closed environment in order to guarantee certain quality-of-service objectives. With the move to more open networks that are interconnected in some manner with the Internet, however, it is becoming increasingly difficult and less desirable for an enterprise to pursue such closed networks. With a system comprising hundreds of thousands of devices, it is unrealistic to assume that the system can remain in a protected, “closed” state.
Meeting QOS objectives in a highly distributed system can be quite difficult. Various resources throughout the distributed system can fail, and the failure of one resource might impact the availability of another resource. In a highly distributed system, the aggregate workload across the entire system may be fairly predictable, but individual workloads change in a very dynamic manner, and network bandwidth and network traffic can be unpredictable.
Therefore, it would be particularly advantageous to provide a method and system that provides access to target resources in a fair yet highly distributed manner. It would be particularly advantageous for the target resources to be dynamically discoverable and flexibly addressable and utilizable.