In a clustered messaging architecture it is common to have a virtual ‘single’ queue of messages distributed across a cluster of multiple messaging servers. In such an architecture, each server owns a subset of messages for that single queue. Each subset of messages typically exists on a partition of the single queue. It is also possible that only a subset of the clustered servers are running at any one time.
It is important to achieve optimal transmission and/or consumption of messages from a partitioned queue. The invention will, however, be described in terms of consumption of messages for ease of explanation only.
If there are multiple applications (or multiple instances of a single application; the description is intended to encompass both variations), all consuming the same type of messages, with each application being capable of consuming all messages on a single partitioned queue, then there should be an even distribution of applications (application instances) across all messaging servers that own a partition of the queue, i.e. at least one application being connected to each of these servers.
There may, on the other hand, exist multiple applications consuming different types of messages, with each application comprising multiple instances. In this case, there should be an even distribution of an application's instances across all the messaging servers that own a partition of the queue, i.e. at least one instance of the application being connected to each of these servers. In this way it is ensured that messages of the type applicable to the application will get consumed from each queue partition. Note that applications can be consuming according to some property of a message, or they could be selecting a particular type of message based on the value of some header or payload content, i.e. via a filter.
Optimal distribution of application instances typically becomes a problem if there are not many application instances. If there are a large number of application instances (orders of magnitude more than the number of partitions), then their distribution is likely to be reasonably even across the partitions no matter what distribution mechanism is used, for example a simple random choice of partition by the application or a round-robin distribution by a number of distribution points. The fewer instances of the application that are attaching, the less likely it is that using a simple distribution mechanism will result in an even distribution. Particularly where there are only a few application instances (about the same number as there are partitions), it becomes critical to ensure that their distribution is even (ideally at least one application instance attached to each partition).
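The effect of instance count on distribution evenness can be illustrated with a small simulation (a hypothetical sketch for illustration only; the partition counts and uniformly random placement below are assumptions, not the behaviour of any particular messaging product):

```python
import random

def random_placement(num_instances, num_partitions, seed=42):
    """Place each instance on a uniformly random partition and count the result."""
    rng = random.Random(seed)
    counts = [0] * num_partitions
    for _ in range(num_instances):
        counts[rng.randrange(num_partitions)] += 1
    return counts

# Orders of magnitude more instances than partitions: every partition
# is almost certain to receive a substantial share of consumers.
many = random_placement(4000, 4)

# About as many instances as partitions: empty partitions are likely.
# With 4 instances placed randomly on 4 partitions, the chance that every
# partition is covered is only 4!/4^4 = 24/256, roughly 9%.
few = random_placement(4, 4)
```

This quantifies the point above: with few instances, a simple random (or uncoordinated round-robin) mechanism will usually leave at least one partition without a consumer.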
Each application will consume a particular type of message from a single partition and an even distribution therefore means that messages will not be left marooned on a queue partition due to the absence of an application consuming messages of that type.
Correctly coordinating the distribution of clients across multiple servers is not so difficult if all the applications initially go to a single point that can coordinate them. Unfortunately, the reasons for having multiple servers are to prevent a single point of failure and to balance the workload across them, so having all the applications communicate with a single point is an obvious bottleneck. So when there is no single point coordinating all the applications, the ability for any one server to know where all the applications are connected is lost.
A simplistic example is now given in which there are four partitions and four instances of an application attaching:
For a single distribution point using round-robin distribution:

    partition 1: app instance 1
    partition 2: app instance 2
    partition 3: app instance 3
    partition 4: app instance 4
When there are two round-robin distribution points, and application instances 1 and 2 go to the first distribution point while instances 3 and 4 go to the second for distribution, the outcome might be as follows:

    partition 1: app instances 1 and 3
    partition 2: app instances 2 and 4
    partition 3: none
    partition 4: none
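The two outcomes above can be reproduced with a short sketch (hypothetical code; `distribute` and the partition numbering are illustrative assumptions). Each distribution point round-robins over the partitions independently, with no knowledge of the other point's allocations:

```python
from itertools import cycle

PARTITIONS = [1, 2, 3, 4]

def distribute(instance_to_point, num_points):
    """Allocate each instance via its distribution point's own round-robin."""
    rotations = [cycle(PARTITIONS) for _ in range(num_points)]
    placement = {p: [] for p in PARTITIONS}
    for instance, point in instance_to_point:
        placement[next(rotations[point])].append(instance)
    return placement

# Single distribution point: one instance per partition.
single = distribute([("inst1", 0), ("inst2", 0), ("inst3", 0), ("inst4", 0)],
                    num_points=1)

# Two independent points: instances 1 and 2 use the first point,
# instances 3 and 4 the second. Each point restarts its own rotation
# at partition 1, so partitions 3 and 4 end up with no consumers.
double = distribute([("inst1", 0), ("inst2", 0), ("inst3", 1), ("inst4", 1)],
                    num_points=2)
```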
If there is a single distribution point allocating connections to partitions there is still the problem that the distribution point must know the intentions of each application, distinguish it from other applications and recognise other instances of the same application. Generally there are many different applications attaching to such servers, therefore being aware of them all becomes a difficult and potentially costly (performance and resource) operation.
So, using a similar example where there is no knowledge of individual applications, this time with a second application, again with four instances attaching to the same partitions, where the attach order is application 1, application 2, application 1, application 2 etc.:

    partition 1: app 1 instance 1, app 1 instance 3
    partition 2: app 2 instance 1, app 2 instance 3
    partition 3: app 1 instance 2, app 1 instance 4
    partition 4: app 2 instance 2, app 2 instance 4
Here it can be seen that application 1 is only attached to partitions 1 and 3, and application 2 only to partitions 2 and 4. So any messages for application 1 going to partition 2 or 4 will not be processed.
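This application-unaware behaviour can be sketched as a single global round-robin that cannot tell applications apart (hypothetical code; the consumer names and partition numbering are illustrative assumptions matching the example above):

```python
from itertools import cycle

PARTITIONS = [1, 2, 3, 4]

def app_unaware_round_robin(attach_order):
    """Allocate attaching consumers in order, ignoring which app each belongs to."""
    rotation = cycle(PARTITIONS)
    placement = {p: [] for p in PARTITIONS}
    for consumer in attach_order:
        placement[next(rotation)].append(consumer)
    return placement

# The attach order alternates between the two applications.
order = ["app1-inst1", "app2-inst1", "app1-inst2", "app2-inst2",
         "app1-inst3", "app2-inst3", "app1-inst4", "app2-inst4"]
placement = app_unaware_round_robin(order)
# app1 lands only on partitions 1 and 3, app2 only on 2 and 4,
# so app1's messages arriving on partitions 2 and 4 are never consumed.
```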
There are a number of existing ways to solve, to varying degrees, the problem of evenly distributing application instances:
It is known to manually configure each instance of an application to connect to each server. This is very restrictive in that each instance of the application now has a reliance on its configured server being available for it to consume any messages; it also ties the application's configuration tightly to the architecture of the messaging servers. If the architecture changes, the application's configuration also has to change.
Some messaging systems provide a solution whereby any application attempting to connect to a cluster of messaging servers will be directed to one of the servers based on a round-robin distribution by one or more ‘bootstrap’ servers. The downside of this is that the distribution applies to all applications attempting to attach to a server in the cluster, with no recognition of multiple instances of the same application. This can result in a fair distribution of all applications connected to the servers; but, in a worst-case scenario, it is possible for all instances of the same application to be connected to the same server, resulting in only a single partition of the queue being consumed from by that application. Since an application may be singularly responsible for a particular type of message, this could prove problematic. Further, application instances can detach themselves, and the bootstrap servers would not have this knowledge to apply when allocating a new application instance to a queue partition.
It is also known to statically configure a queue partition to only allow a single consuming application to be attached at any one time. If a partitioned queue is configured in such a way, the consuming application can connect to each server in the cluster in turn and attempt to attach to the queue partition local to that server. If they succeed in attaching then they know they are the only consuming application on that partition; if they fail then they attempt to connect to the next server and try again until they find an available partition. This ensures that each partition has at most one instance of the application at any one time. The downside is when there are more instances of the application than available partitions of the queue, some instances of the application will be left without a queue partition from which to consume. It also relies on the fact that there are no other applications consuming from the same queue partitions (i.e. applications that are not instances of the same distributed application). Another downside is that the queue must be statically configured to only allow one consumer per partition.
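The connect-and-probe behaviour described above can be sketched with a small in-memory model (hypothetical code; `Partition`, `PartitionBusy` and `attach_exclusive` are illustrative names, not a real messaging API):

```python
class PartitionBusy(Exception):
    """Raised when a partition already has its single permitted consumer."""

class Partition:
    def __init__(self, name):
        self.name = name
        self.consumer = None  # statically configured: at most one consumer

    def attach_exclusive(self, consumer):
        if self.consumer is not None:
            raise PartitionBusy(self.name)
        self.consumer = consumer
        return self

def find_free_partition(partitions, consumer):
    """Try each server's local partition in turn until one accepts us."""
    for partition in partitions:
        try:
            return partition.attach_exclusive(consumer)
        except PartitionBusy:
            continue  # occupied; try the next server in the cluster
    return None  # more instances than partitions: this instance is left idle

partitions = [Partition("p1"), Partition("p2")]
first = find_free_partition(partitions, "inst1")   # attaches to p1
second = find_free_partition(partitions, "inst2")  # attaches to p2
third = find_free_partition(partitions, "inst3")   # no partition available
```

The `None` result for the third instance illustrates the stated downside: once every partition has its single consumer, surplus instances are left without a partition to consume from.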
It is also known to allow an application to specify, at runtime, whether it requires ‘exclusive access’ to a queue partition or not. This is disadvantageous when there are more instances of the application than available partitions of the queue; in this case some instances of the application will be left without a queue partition to consume from. Further, it also relies on the fact that there are no other applications consuming from the same queue partitions (i.e. applications that are not instances of the same distributed application).