Business application data processing frequently involves a group of related programs, each of which handles a single, well-defined component of the whole application. Often, the programs that make up a business application run in a single operating system environment such as AIX or OS/2 (AIX and OS/2 are trademarks of IBM Corp.) on a single processor. Sometimes they run in multiple, unlike environments, though still on a single processor.
Many businesses go a step further and distribute programs around a data processing network, rather than run them all on one processor. For example, a single application could be distributed between an AIX/6000 environment on a RISC System/6000 processor and an OS/400 environment on an AS/400 processor. (RISC System/6000, OS/400, AS/400 and AIX/6000 are trademarks of IBM Corp.)
There are many advantages in this approach, most of which are related to making better use of resources. It is often a good idea to put a program near the data it is processing, so that network traffic is kept to a minimum. Load balancing, that is, rescheduling and relocating the workload to complete it as efficiently as possible, is another sound reason for distributing an application. Moving an application from one large machine to several smaller machines can also sometimes be a valid reason.
When a single application is distributed, whether to unlike environments on a single processor or to different nodes of a network, the related programs that make up the application need a way to communicate with one another. This can be challenging even when all of the parts are from a single supplier, when there are no variations in the operating systems that are used, when programs are written in a single language and when there is a single communications protocol. When not all of these factors apply, the difficulties in establishing communications between the parts of the application become much greater.
Additionally, one program may only be able to execute while another waits to execute. Even though they may take turns to execute, both programs have to be available to take part in any communications. This conflicts with the desire to be able to run related programs independently of each other.
Three ways in which parts of an application may communicate with each other are conversational communication, calling and messaging. Conversational communication is the most mature and widely used, calling is less widespread and messaging is intensively used in specialised areas.
A conversation is a series of synchronous message exchanges between two partners, both of which remain active for the duration of the conversation. The conversation is analogous to a telephone call.
In calling, a program requests another program to be executed or a procedure to be carried out. The request can be to a local system or to a remote system. The called system runs the requested program or procedure and returns the result in a pre-defined format to the calling program. This is a single two-way exchange of information, analogous to a letter with a reply. Because this process is synchronous, the calling program is suspended while the called program or procedure completes.
Messaging is asynchronous and uses the concept of packets of information that are routed to specified destinations. The routed packets contain information about work to be done, or the results of work which has been done. Queuing is a key component of most messaging implementations and allows great flexibility as to when, where, and how work is accomplished. Programs which use messaging are not logically connected as they would be if they were using calling or conversational communication. They are indirectly associated through one or more queues. A message is communicated by one program placing it on a queue from which the other removes it. After processing, the receiving program may generate a message to be returned to the sending program or forwarded to another program. There is no private, dedicated, logical connection to link the programs.
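The indirect association through queues described above can be illustrated with a minimal in-memory sketch. The `Message` and `QueueManager` classes, their methods and the queue names `ORDERS` and `ORDERS.REPLY` are hypothetical names chosen for illustration; they are not taken from any particular messaging product.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    destination: str   # name of the queue the message is routed to
    body: str          # work to be done, or the result of work done


class QueueManager:
    """Hypothetical in-memory queue manager holding named queues."""

    def __init__(self):
        self.queues = {}

    def put(self, message):
        # Place the message on its destination queue, creating it on first use.
        self.queues.setdefault(message.destination, deque()).append(message)

    def get(self, queue_name):
        # Remove and return the oldest message, or None if the queue is empty.
        queue = self.queues.get(queue_name)
        return queue.popleft() if queue else None


qm = QueueManager()

# The sender places a request and carries on; it never addresses the receiver.
qm.put(Message(destination="ORDERS", body="ship order 17"))

# Later, the receiver removes the request and places a reply on another queue.
request = qm.get("ORDERS")
qm.put(Message(destination="ORDERS.REPLY", body="done: " + request.body))

reply = qm.get("ORDERS.REPLY")
```

Neither program holds a reference to the other. Each knows only the name of a queue, which is what allows the two programs to run at different times.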
In order to implement messaging, support programs known as queue managers are established at each node of the network to manage queued messages. Cross-network communication sessions are established between queue managers rather than between individual programs. If a link between processors fails, the queue manager recovers from the failure. Programs on the affected processors are not brought to a halt by such an event. In fact, they need not even be aware that it has happened. The data contained within a message can be valuable and protection of this data can be essential. For this reason, messages may not be deleted from queues at one end of a link until they are received at the other end of the link.
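The rule that a message is not deleted at one end of a link until it has been received at the other can be sketched as follows. The `Link` class and the `transmit` function are assumptions made for illustration, and the boolean returned by `send` stands in for whatever confirmation of receipt a real protocol would provide.

```python
from collections import deque


class Link:
    """Hypothetical link between two queue managers; it may be down."""

    def __init__(self):
        self.up = True
        self.remote_queue = deque()

    def send(self, message):
        # Returns True only if the remote end received the message.
        if not self.up:
            return False
        self.remote_queue.append(message)
        return True


def transmit(local_queue, link):
    """Send queued messages in order; keep each until receipt is confirmed."""
    sent = 0
    while local_queue:
        if not link.send(local_queue[0]):
            break               # link failed: message stays queued for a retry
        local_queue.popleft()   # delete only after the far end has it
        sent += 1
    return sent


local_queue = deque(["m1", "m2"])
link = Link()

link.up = False
sent_while_down = transmit(local_queue, link)    # nothing is lost
queued_while_down = list(local_queue)

link.up = True
sent_after_recovery = transmit(local_queue, link)
```

The programs that placed `m1` and `m2` on the local queue take no part in the retry, which is why they need not be aware that the link failed.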
Messages may be declared "persistent" (or non-volatile), so that the queue manager can recover them, for example, from disk storage, after a failure of the system on which the queue manager is running. However, this carries with it a significant overhead. For messages that are easily regenerated, or that are useful only if transmitted within a limited time (for example, a request from an automated teller machine for an account balance), this persistence is unnecessary, and so such messages are declared "non-persistent" (or volatile). No actions are taken to recover them after a system failure of the queue manager. In addition, any that are recovered after such a system failure may be discarded by the queue manager.
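The difference between the two declarations can be sketched with a queue manager that writes only persistent messages to a log on disk. The `QueueManager` class, its `put` and `restart` methods and the log file layout are hypothetical; the log is never trimmed here, which a real implementation would have to handle.

```python
import json
import os
import tempfile


class QueueManager:
    """Hypothetical queue manager that logs persistent messages to disk."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.queue = []          # in-memory queue, lost on a system failure

    def put(self, body, persistent):
        if persistent:
            # The extra disk write is the overhead that persistence carries.
            with open(self.log_path, "a", encoding="utf-8") as log:
                log.write(json.dumps(body) + "\n")
                log.flush()
                os.fsync(log.fileno())
        self.queue.append(body)

    def restart(self):
        # After a failure only the logged (persistent) messages can be rebuilt.
        self.queue = []
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                self.queue = [json.loads(line) for line in log]


with tempfile.TemporaryDirectory() as directory:
    qm = QueueManager(os.path.join(directory, "queue.log"))
    qm.put("transfer 100 to account 42", persistent=True)
    qm.put("balance enquiry for account 42", persistent=False)
    before_failure = list(qm.queue)

    qm.restart()                 # simulate a system failure and restart
    after_restart = list(qm.queue)
```

The balance enquiry is simply gone after the restart, which is acceptable because the automated teller machine can issue it again.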
Conventionally, two queues are maintained, one for persistent and one for non-persistent messages. In normal operation, it is necessary for the queue manager to select messages from each queue for processing. This may result in messages relating to the same program, but stored on two separate queues, not being processed in the order in which they were received, and so cause delay in normal processing of messages. It also requires programs to deal with two queues instead of one for logically related data.
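The loss of arrival order with two queues can be seen in a small sketch. The selection policy used here, draining the persistent queue before the non-persistent one, is an assumption chosen for illustration. Other policies reorder messages differently, but none can recover the original order without extra information such as sequence numbers.

```python
from collections import deque

persistent_queue = deque()
non_persistent_queue = deque()


def put(body, persistent):
    # Conventional scheme: the persistence attribute decides the queue.
    (persistent_queue if persistent else non_persistent_queue).append(body)


def drain_persistent_first():
    # Assumed selection policy: empty the persistent queue, then the other.
    processed = []
    for queue in (persistent_queue, non_persistent_queue):
        while queue:
            processed.append(queue.popleft())
    return processed


# Four logically related messages arrive in the order A, B, C, D.
arrival_order = ["A", "B", "C", "D"]
put("A", persistent=False)
put("B", persistent=True)
put("C", persistent=False)
put("D", persistent=True)

processing_order = drain_persistent_first()
```

Under this policy the messages are processed as B, D, A, C rather than in the order in which they were received.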
Several queuing systems (e.g., IBM's Queued Telecommunications Access Method (QTAM), Telecommunications Access Method (TCAM), Information Management System (IMS), and Customer Information Control System (CICS)) provide facilities for queued data. (IBM and CICS are trademarks of IBM Corp.) These systems support persistent and non-persistent queues but not a mixture of persistent and non-persistent messages on the same queue.
Teng describes the IBM DATABASE 2 Buffer Manager in "Managing IBM DATABASE 2 buffers to maximize performance" in IBM Systems Journal, Vol 23, number 2, 1984. (DATABASE 2 is a trademark of IBM Corp.). IBM DATABASE 2 supports recoverable and non-recoverable files on disk but does not support mixing non-persistent and persistent elements within the same file on disk.
Messages that are declared persistent and messages that are declared non-persistent could be stored in the same queue. When the system is restarted after a system failure, the queue would be scanned and each message checked to see whether it was persistent or non-persistent. Those marked non-persistent would be discarded and those marked persistent would be retained in the queue. This method has the disadvantage that the scanning step would take a considerable period of time, because every message on the queue must be examined, and so would delay the system becoming available after a restart.
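The restart scan described above can be sketched as follows. The function name `restart_scan` and the representation of a message as a `(body, persistent)` pair are assumptions made for illustration. The point of the sketch is that the number of messages examined equals the length of the queue, however few of them survive.

```python
def restart_scan(queue):
    """Return the surviving messages and the number of messages examined."""
    survivors = []
    examined = 0
    for body, persistent in queue:
        examined += 1            # every message is checked, kept or not
        if persistent:
            survivors.append((body, persistent))
    return survivors, examined


mixed_queue = [
    ("transfer 100", True),
    ("balance enquiry", False),
    ("transfer 250", True),
    ("rate enquiry", False),
]

survivors, examined = restart_scan(mixed_queue)
```

Because the work grows in proportion to the total number of queued messages, a long queue held on disk translates directly into a long restart.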
It is desirable to provide a mechanism that does not require separate queues for persistent and non-persistent messages, and that does not require the single queue to be scanned when the system is restarted, with the consequent delay in availability of the system.