Peer-to-peer (P2P) communication systems allow computational entities (peers) to establish software connections (virtual channels) between one another. P2P systems therefore allow peers to communicate or share computational tasks and resources without the explicit need for centralized control. P2P can operate in a generalized network having one or more servers: a peer may provide information (publish) to at least one service on the network and/or register (subscribe) with services on that network to receive information published by another peer.
Messaging systems that benefit from the provision of centralized control are also known. Here, all messages are directed from publishers to subscribers, via a central locus where some computation (mediation) is performed on the messages. New messages (digests, for example) are generated from the input messages and sent to appropriate subscribers.
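By way of illustration only, the centralized mediation pattern described above may be sketched as follows. The class name CentralMediator, its methods, and the digest format are hypothetical and chosen for this example; they are not taken from any particular prior art system.

```python
from collections import defaultdict


class CentralMediator:
    """Hypothetical central locus: every message passes through this object."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> subscriber callbacks
        self.inbox = defaultdict(list)        # topic -> messages awaiting mediation

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Publishers never address subscribers directly; they send to the hub.
        self.inbox[topic].append(message)

    def mediate(self):
        # The "computation" here is a trivial digest: a count plus the latest message.
        for topic, messages in self.inbox.items():
            digest = f"{topic}: {len(messages)} message(s); latest: {messages[-1]}"
            for callback in self.subscribers[topic]:
                callback(digest)
        self.inbox.clear()


received = []
hub = CentralMediator()
hub.subscribe("news", received.append)
hub.publish("news", "first item")
hub.publish("news", "second item")
hub.mediate()
```

In this sketch the subscriber receives a single newly generated digest message rather than the two input messages, which is the essential behavior of a mediated system.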
In prior art centralized mediation systems, all message traffic is transmitted through a central network point (locus), where the mediation service resides. Viewed in terms of logical elements, such systems are constructed as a star-shaped architectural model with a central point of control, where mediation tasks are executed. This model is shown in FIG. 1A: each source (publisher) and sink (subscriber) of information has a line of communication that connects to the central mediation hub. In many cases, the sources and sinks represent the same entities operating in different modes, and may not be architecturally distinguishable.
The problems associated with such an architecture are well known. Systems of this kind are prone to suffer from a lack of bandwidth at the point of mediation. Even though the logical star shape may be superimposed upon a physical network that is highly connected, the essential flow of all information through a central point causes an inherent throughput bottleneck, determined by the bandwidth available between this point and the network (see FIG. 1B). Although advances in networking technologies mean that bandwidth availability continues to improve, increasing bandwidth has an inherent financial cost, and in certain scenarios bandwidth can impose a real limitation on the throughput of the overall system. This limitation is manifested as a restriction on either the maximum number of users, or the rate at which each user is able to send and receive information.
Indeed there are many examples of systems where neither P2P architectures nor centralized mediation architectures are wholly satisfactory. Often some logical process is required to act over the sum of messages broadcast within the messaging system. Examples of classes of systems where neither architecture is completely suitable include: a trading system where potential buyers and sellers advertise to each other and mediation is required to ensure a transactional matching of requirements; a mediated news or publishing system where a central authority acts as the editorial control before information is disseminated; a system which is not actively controlled but which requires an ordered log of information flow to be maintained in a central repository; a conversation service which allows a recent context to be presented to a user joining an ongoing conversation; systems for distributing cryptographic keys (the so-called key distribution problem); systems for finding the location of data (state) and services on a distributed network; and systems for locating and communicating with mobile users.
All the examples above have in common a requirement both for peer-to-peer communication, and for a degree of centralized mediation of the flow of information when the communication is viewed as a whole.
Applicant's co-pending U.S. patent application Ser. No. 10/903,156, incorporated herein by reference, describes a distributed mediation network that overcomes many of the disadvantages described above. A distributed mediation network of this type overcomes the problems associated with providing a mediation service at a single server by distributing the service among a number of logically discrete entities. In order to do this, a mediation application must be amenable to logical partitioning into discrete mediation application components. This permits the mediation service to be partitioned into a set of mediation segment services distributed across a resource pool of servers, with each server providing the mediation service for one or more of these segment services. Hereinafter, this approach will be termed distributed mediation.
To properly eliminate the bandwidth problems at every mediation point, it must be possible to balance the load of the mediation service evenly across the available resource pool. In systems that exhibit fluctuating demands over time, the load across the pool must be dynamically balanced. As such, it must be possible to dynamically change the way in which the application is partitioned. It is therefore necessary to be able to move a mediation segment service from one server to another. Moreover, the movement of a segment service must preserve externally observed causality. That is, the ordering of the interactions with each segment service must be preserved in the face of changes to how that segment service is implemented. This requirement is vitally important in many systems in which out-of-order interactions have serious consequences, such as financial systems.
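The ordering requirement may be illustrated by the following minimal sketch, which assumes a simple scheme in which arrivals are held in a first-in, first-out queue while the state of a segment service is transferred. The names SegmentService and move_segment, and the use of a sequence counter, are assumptions made for this example and do not describe the mechanism of the referenced application.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentService:
    """Hypothetical segment service that stamps each interaction with a sequence number."""
    name: str
    host: str
    next_seq: int = 0
    log: list = field(default_factory=list)

    def handle(self, message):
        entry = (self.next_seq, message)
        self.log.append(entry)
        self.next_seq += 1
        return entry


def move_segment(service, new_host, arrivals_during_move):
    """Move a segment to new_host without reordering interactions.

    Assumption: messages arriving during the move are buffered in arrival
    order and replayed on the new host only after the state has been copied.
    """
    pending = list(arrivals_during_move)  # FIFO buffer of in-flight messages
    moved = SegmentService(service.name, new_host, service.next_seq, list(service.log))
    for message in pending:
        moved.handle(message)
    return moved


segment = SegmentService("fx", "server-a")
segment.handle("bid 1")
segment.handle("ask 2")
moved = move_segment(segment, "server-b", ["bid 3", "ask 4"])
```

An external observer reading the log of the moved segment sees the same unbroken sequence that would have resulted had the segment never changed servers.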
In the distributed mediation network, information is classified by content, and mediation requirements are separately served in different processes according to that content-based classification. As demand varies with time over the classification, the corresponding mediation application components may be physically moved to balance both network and computational loads for the whole system. Such load balancing can satisfy the demands placed upon it up to some threshold governed by the sum of the computational, I/O and memory resources of the available servers offering mediation. Beyond this threshold, the quality of service will degrade as the available resources simply cannot handle the load. In order to address this problem, a distributed mediation network will preferably provide mechanisms for the introduction of additional computer resources to the system. Similarly, when excessive resources are available to a mediated application it is preferably possible to remove deployed computational capacity.
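As a hedged illustration of content-based classification and load balancing, the sketch below routes messages to servers according to a content class and then relocates one mediation component from the busiest server to the least loaded one. The "category" field, the function names, and the naive heuristic of moving the lightest segment are all assumptions made for this example; a practical system may use quite different criteria.

```python
from collections import Counter


def classify(message):
    # Assumption: the content class is carried in a "category" field.
    return message["category"]


def server_loads(messages, assignment):
    """Count messages per server, given a segment -> server assignment."""
    loads = {server: 0 for server in sorted(set(assignment.values()))}
    for message in messages:
        loads[assignment[classify(message)]] += 1
    return loads


def rebalance_once(messages, assignment):
    """Naive heuristic: move the lightest segment off the busiest server."""
    segment_load = Counter(classify(m) for m in messages)
    loads = server_loads(messages, assignment)
    busiest = max(loads, key=loads.get)
    idlest = min(loads, key=loads.get)
    candidates = [seg for seg, host in assignment.items() if host == busiest]
    if busiest == idlest or len(candidates) < 2:
        return dict(assignment)  # nothing sensible to move
    segment = min(candidates, key=lambda seg: segment_load[seg])
    new_assignment = dict(assignment)
    new_assignment[segment] = idlest
    return new_assignment


assignment = {"equities": "s1", "bonds": "s1", "fx": "s2"}
messages = (
    [{"category": "equities"}] * 5
    + [{"category": "bonds"}] * 3
    + [{"category": "fx"}]
)
rebalanced = rebalance_once(messages, assignment)
```

With the sample traffic shown, the load on the two servers changes from eight messages and one message to five and four respectively. No such reassignment can help once the total demand exceeds the combined capacity of the pool, which is the threshold referred to above.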
The term ‘autonomic’ has historically been used to refer to the aspect of the nervous system that acts subconsciously to regulate the body, such as the control of breathing rates or the heartbeat. It has recently been used to refer to computer networks that are capable of analogous self-regulation. An autonomic system may be capable of, amongst other things, self-repair, self-configuration, self-monitoring, and self-optimization, all without the need for external input. Indeed, in the autonomic paradigm, any changes that occur autonomically are in fact impossible for the user to detect.
An autonomic computing system consists of autonomic elements. These autonomic elements are logical constructs that monitor some aspect of the system, analyze its output and take specific actions to adjust it, so that the overall system is able to meet specific requirements, often expressed as service level agreements (SLAs). SLAs specify the information technology resources needed by a line of business and the specific applications that they maintain.
Autonomic elements are self-organizing and are able to discover each other, operate independently, negotiate or collaborate as required, and organize themselves such that the emergent stratified management of the system as a whole reflects both the bottom up demand for resources and the top down business-directed application of those resources to achieve specific goals.
It is an object of the present invention to provide autonomic functionality to a distributed mediation network.
Throughout this document, only the term “physical node” refers to physical machines. The terms “node” and “logical node” are used interchangeably to refer to the locus having state properties. In terms of the logical topology of the mediation network, a module provides the functionality of an associated logical node.
Hereinafter, the term “high watermark” is used to indicate a maximum threshold level of traffic that may be handled by a single element or node within the network, while the term “low watermark” indicates the minimum threshold, each element being configured to handle traffic levels in a range between the high watermark and the low watermark.
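The watermark terminology may be illustrated by the following minimal sketch, which classifies a measured traffic level against the two thresholds. The names LoadState and classify_load, and the units of the traffic figures, are hypothetical and used only for this example.

```python
from enum import Enum


class LoadState(Enum):
    UNDERLOADED = "below low watermark"
    NORMAL = "within configured range"
    OVERLOADED = "above high watermark"


def classify_load(traffic, low_watermark, high_watermark):
    """Compare a traffic level with the low and high watermarks of an element."""
    if low_watermark > high_watermark:
        raise ValueError("low watermark must not exceed high watermark")
    if traffic > high_watermark:
        return LoadState.OVERLOADED
    if traffic < low_watermark:
        return LoadState.UNDERLOADED
    return LoadState.NORMAL


# Illustrative figures only, e.g. messages per second.
states = [classify_load(t, low_watermark=20, high_watermark=80) for t in (5, 50, 95)]
```

In this sketch a traffic level exactly equal to either watermark is treated as lying within the configured range; whether the boundaries are inclusive is an assumption of the example.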