1. Technical Field of Example Embodiments of the Invention
This invention relates to communications networks for transmitting data, and in particular networks for conveying data that reports events detected by devices dispersed throughout an environment. Such devices may be used for monitoring natural phenomena such as atmospheric, oceanic or geological phenomena, or animal behaviour, or for monitoring human behaviour such as road traffic, inventory control, or monitoring the activities of vulnerable or untrustworthy people. As the devices are distributed throughout the environment to be monitored, this is known as “pervasive” computing technology.
2. Description of Related Art
Devices to perform such monitoring are readily available, such as sensor networks, RFID tags and biometric scanners. An RFID (Radio Frequency Identification) tag is a small integrated circuit connected to an antenna, which can respond to an interrogating RF signal with simple identifying information. Such tags are used in a number of situations. In an inventory control application, for example in a retail distribution network (factory, warehouse, distribution network, shop), the conventional “barcode” of each object is replaced with an RFID tag. Because of the wireless interface, reading such tags is much easier and faster than passing each item in turn across a conventional barcode scanner. The ability to identify objects at a distance with a scanner provides great benefit to logistics and supply chains. The information can be generated and processed by multiple databases belonging to different departments of a company: production, disposal, material, finance, etc. The variables contained in each database are updated as soon as a tag is scanned. Variables are shared, so that when information is updated in one specific database the corresponding variables must be updated in other locations of the network. Servers therefore need to forward event update information as soon as a variable is changed. However, in very large-scale systems congestion may occur when a large number of readers attempt to send data at the same time.
The deployment of pervasive computing technology allows many new uses for such data, and a future can be envisaged in which such devices collectively generate billions of event notifications per second.
A many-to-many (or “peer-to-peer”) messaging infrastructure is one of the most important foundations for distributed electronic business systems. During recent years event-driven and messaging infrastructures have emerged as a flexible and viable solution to enable communication between web services, distributed databases and other software applications. Such distributed computing systems will be central to advances in a broad range of critical applications, from industrial applications to homeland security. Examples of applications include air traffic control, military applications, health care, and power grid management. Tests have already been made in some of these fields using small-scale scenarios. However, in a large and complex scenario, where factors such as network reliability, load and throughput instability become significant, the maintenance of robust and scalable communications becomes a complex issue. In a situation in which millions of pervasive computing devices and distributed processes need to communicate together, the technology must be able to handle such forms of infrastructure instability.
There is therefore a need for a system that provides scalability in a large scenario, such that reliable communications can be provided between a large number of participants. The characteristics and the traffic load of these applications require group communication protocols that allow multipoint communications among the different processes, such as IP Multicast. When applied to pervasive computing scenarios, traditional IP Multicast protocols, or overlay protocols built in a pure peer-to-peer manner, suffer several scaling problems. One of the more complex problems is the management of the network state held at each participant or at each router. Even if more sophisticated routing protocols are exploited, such as content-based routing, the management of content-based forwarding rules is still a complex issue that affects the scalability of the solution.
Many such systems are inherently unstable, with processors joining and leaving the system in real time. In such systems it is preferable for distributed processors to forward and route messages belonging to other distributed processors, without a centralised server being required. Even under a large traffic load the system should remain stable, and the resources of each node should not be overloaded.
One class of protocol that does not suffer the scalability and reliability problems that have been described above is a diffusion-like protocol that exploits “epidemic” or “gossip” techniques. This class of protocol scales quite well and overcomes the phenomena that cause problems of network reliability and scalability. A protocol exploiting epidemic techniques has a number of important properties: the protocol imposes steady loads on participants, is extremely simple to implement and inexpensive to run, and provides very good scalability properties in terms of membership management.
Such protocols operate in the following manner. Each processor in the system maintains a list of a subset of the full system membership; in practice this list contains the network addresses of other processors that are nearby. At each time interval (not synchronized across the system) each participant selects one (or more) of the processors in its membership list and sends it a message contained in a buffer store. Upon receipt of an epidemic message, depending upon the configuration of the protocol, a processor checks whether the message has already been received or has expired; in either of these cases the message is dropped. Otherwise the message is stored in the receiving processor's own buffer for subsequent distribution to other processors.
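The dissemination cycle described above can be sketched in outline as follows. This is a minimal illustration of the general technique, not an implementation from any cited system; the names (Node, GOSSIP_FANOUT, MAX_HOPS) and the specific expiry rule are illustrative assumptions.

```python
import random

GOSSIP_FANOUT = 3   # peers contacted per round (illustrative value)
MAX_HOPS = 10       # a message older than this is treated as expired

class Node:
    """One participant in an epidemic ("gossip") protocol."""

    def __init__(self, address, membership):
        self.address = address
        self.membership = membership   # subset of the full system membership
        self.buffer = []               # (message_id, payload, hops) awaiting forwarding
        self.seen = set()              # ids of messages already received

    def receive(self, message_id, payload, hops):
        # Drop the message if it is a duplicate or has expired, as the
        # protocol requires; otherwise store it for onward distribution.
        if message_id in self.seen or hops >= MAX_HOPS:
            return
        self.seen.add(message_id)
        self.buffer.append((message_id, payload, hops))

    def gossip_round(self, network):
        # At each (unsynchronised) interval, forward each buffered message
        # to a few randomly chosen members of the local membership list.
        for message_id, payload, hops in list(self.buffer):
            peers = random.sample(self.membership,
                                  min(GOSSIP_FANOUT, len(self.membership)))
            for peer_addr in peers:
                network[peer_addr].receive(message_id, payload, hops + 1)
```

Note that the sending node needs no acknowledgement and keeps no per-receiver state: duplicate suppression happens entirely at the receiver, which is what keeps the per-participant load steady.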
A message can be reliably distributed if it has travelled a certain number of rounds through the network. If we denote the group size as n and assume that a processor which has received an event informs F other processors about the same event, then given a fan-out value F>O(log n), the number of rounds needed to achieve a high probability (e.g. 99%) that all group members are informed is O(log n). (Note that all logarithmic values in this specification are to base 2 unless stated: O(log n) represents a value of the order of log n.) Epidemic algorithms therefore sacrifice a small degree of the reliability of deterministic algorithms in exchange for very good scalability properties. As will be described in more detail later, with reference to FIG. 5, the number of nodes “infected” initially rises exponentially, and then trends asymptotically towards the value n as traffic declines, as a result of the increasing probability of an individual message being delivered to a node that is already infected.
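This logarithmic behaviour can be checked with a small simulation of the idealised model. The sketch below is illustrative only: the uniform random peer selection and the function name rounds_to_infect_all are assumptions of the idealised analysis, not features of any particular protocol.

```python
import random

def rounds_to_infect_all(n, fanout, seed=0):
    """Simulate push gossip: count rounds until all n nodes hold the event.

    Each round, every infected node informs `fanout` peers chosen uniformly
    at random from the whole group (the idealised model of the analysis).
    """
    rng = random.Random(seed)
    infected = {0}                      # node 0 initially holds the event
    rounds = 0
    while len(infected) < n:
        newly = set()
        for _node in infected:
            for peer in rng.sample(range(n), fanout):
                newly.add(peer)
        infected |= newly
        rounds += 1
    return rounds
```

For a group of n=1024 with a modest fan-out, the simulation completes in a number of rounds of the order of log 1024 = 10, consistent with the O(log n) bound stated above: the infected set grows roughly by a factor of (1+F) per round while small, then saturates.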
In order to achieve an acceptably high probability that all members of the group are informed, the amount of information sent on the network must be greater than would be the case using a multicast protocol. This is because the transmission of messages is a one-way process, without any information being available to the transmitting device about whether the other devices need that information. In general, the devices selected will include a number that have no need of that individual message, or have already received it. As can be seen from FIG. 5, the number of such messages is typically many times the number of nodes in the system. For such a protocol to be useful, it is therefore desirable to minimise the number of actual messages sent.
International Patent application WO 01/99348 discloses a method in which a processor combines several individual messages that it has received into a single combined message before onward transmission. It proposes an aggregation process in which the identities of multiple events are incorporated in a single index message. In that patent application the index message is shared and understood by the entire (interested) community. The aggregation of events into indexes is controlled outside of the distribution network in order to optimise the delivery of events to interested receivers.