The present invention relates to message transmission.
Publish and subscribe (pub/sub) is an effective way of disseminating information to multiple users. Pub/sub applications can help to simplify the task of delivering business messages and transactions, in a timely manner, to an audience that is wide, dynamically changing and potentially large.
In a pub/sub system, publishers are not concerned with where their messages are going, and subscribers are not interested in where the messages they receive have come from. Instead, a message broker typically assures the integrity of the message source and manages the distribution of a message according to subscriptions registered in the message broker.
With reference to a pub/sub system (100) as shown in FIG. 1, instead of including a specific destination address in each message, a publisher (105) assigns a topic to a message. A message broker (112) residing on a first computer system (110) comprises a matching engine for matching a topic of a published message with a list of subscribers (120) who have subscribed to receive messages that are published to that topic. In response to a match, the message broker (112) sends the published message to the subscriber (120).
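The topic matching described above can be illustrated with a minimal sketch. It is not taken from the described system: the class name `MessageBroker`, the methods `subscribe` and `publish`, and the topic strings are all hypothetical, and exact string comparison is assumed as the matching rule, whereas a real matching engine may support hierarchical or wildcard topics.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical callback type: receives (topic, message).
Subscriber = Callable[[str, str], None]


class MessageBroker:
    """Toy broker with an exact-match 'matching engine' (an assumption)."""

    def __init__(self) -> None:
        # Maps each topic to the subscribers registered for it.
        self._subscriptions: Dict[str, List[Subscriber]] = defaultdict(list)

    def subscribe(self, topic: str, subscriber: Subscriber) -> None:
        """Register a subscription with the broker."""
        self._subscriptions[topic].append(subscriber)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to every subscriber of the topic.

        Returns the number of subscribers the message was sent to.
        """
        matched = self._subscriptions.get(topic, [])
        for deliver in matched:
            deliver(topic, message)
        return len(matched)


# The publisher names only a topic, never a destination address.
received = []
broker = MessageBroker()
broker.subscribe("stocks/ibm", lambda t, m: received.append(("subscriber_a", t, m)))
broker.subscribe("stocks/ibm", lambda t, m: received.append(("subscriber_b", t, m)))
broker.subscribe("weather", lambda t, m: received.append(("subscriber_c", t, m)))

delivered = broker.publish("stocks/ibm", "price=101")
```

In this sketch the publisher and the subscribers never refer to one another; the only shared element is the topic string held by the broker.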
Typically, in order to provide high availability in such a messaging system, a pair of computer systems (110 and 115) is used. A second (standby) computer system (115) monitors a “heartbeat” signal from the first computer system (110). If the second computer system (115) fails to detect a “heartbeat” signal from the first computer system (110), this may be due to failure of the message broker (112) or another component residing on the first computer system (110). In response to failing to detect a “heartbeat” signal, the second computer system (115) “takes over” from the first computer system (110). For example, the second computer system (115) takes over an IP address associated with the first computer system (110). The second computer system (115) can also restart any failed components on the first computer system (110) (e.g. the message broker (112)).
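The heartbeat monitoring described above can be sketched as follows. This is an illustrative model only: the class `StandbyMonitor`, its timeout parameter, and the recorded action strings are assumptions, and timestamps are passed in explicitly so that the example is deterministic. A real standby system would perform the takeover steps (such as claiming the IP address) through operating system or cluster facilities that are not modelled here.

```python
from typing import List


class StandbyMonitor:
    """Hypothetical model of the standby system watching for heartbeats."""

    def __init__(self, timeout_seconds: float, now: float = 0.0) -> None:
        self.timeout_seconds = timeout_seconds  # assumed detection threshold
        self.last_heartbeat = now
        self.active = False          # True once the standby has taken over
        self.actions: List[str] = []  # record of takeover steps, for illustration

    def record_heartbeat(self, now: float) -> None:
        """Called whenever a heartbeat arrives from the first system."""
        self.last_heartbeat = now

    def check(self, now: float) -> bool:
        """Take over if no heartbeat has been seen within the timeout."""
        silent_for = now - self.last_heartbeat
        if not self.active and silent_for > self.timeout_seconds:
            self._take_over()
        return self.active

    def _take_over(self) -> None:
        self.active = True
        # Placeholders for the steps named in the text.
        self.actions.append("take over IP address of first system")
        self.actions.append("restart failed components")


monitor = StandbyMonitor(timeout_seconds=3.0, now=0.0)
monitor.record_heartbeat(now=1.0)
still_standby = monitor.check(now=2.0)   # 1 s of silence: no takeover
took_over = monitor.check(now=5.5)       # 4.5 s of silence: takeover
```

The timeout value trades detection speed against the risk of acting on a heartbeat that is merely late; the value of 3 seconds used here is arbitrary.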
Such a high availability configuration has a number of drawbacks.
Takeover of the first computer system (110) by the second computer system (115) can cause delays during which messages cannot be processed. To many users, such a delay constitutes an unacceptable outage.
Furthermore, when a heartbeat signal is not detected, it can be uncertain whether this is due to a failed component or to a failure of the heartbeat signal itself.
Thus, if the second computer system (115) takes over from a “healthy” (i.e. not failed) first computer system (110), the takeover itself causes a disruption that is effectively an outage, which is the very problem that high availability sets out to avoid. Furthermore, this can also result in inconsistent and competing systems, with loss of information continuity and high contention for common resources.
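The ambiguity of a missing heartbeat can be made concrete with a small simulation. It is a simplified sketch under stated assumptions: the `Node` type, the `simulate` function, and the rule that the standby takes over whenever no heartbeat arrives are all hypothetical, and the heartbeat path is modelled as a single boolean.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Node:
    name: str
    serving: bool  # True if this system is processing messages


def standby_takes_over(heartbeat_received: bool) -> bool:
    """Assumed policy: the standby acts on the absence of a heartbeat alone."""
    return not heartbeat_received


def simulate(primary_healthy: bool, link_healthy: bool) -> Tuple[bool, bool]:
    """Return (first system serving, second system serving)."""
    primary = Node("110", serving=primary_healthy)
    # A heartbeat arrives only if the sender and the path both work.
    heartbeat_received = primary_healthy and link_healthy
    standby = Node("115", serving=standby_takes_over(heartbeat_received))
    return primary.serving, standby.serving


scenarios = {
    "all healthy": simulate(primary_healthy=True, link_healthy=True),
    "first system failed": simulate(primary_healthy=False, link_healthy=True),
    "heartbeat path failed": simulate(primary_healthy=True, link_healthy=False),
}
```

In the last scenario both systems end up serving at once. The standby observes the same input (no heartbeat) as in the second scenario, so under this policy it cannot tell the two cases apart.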
There is thus a need for an improved mechanism for providing high availability.