This invention relates to message queuing, and more particularly to a fast, reliable message queuing system for both client-server and mobile agent applications.
Message queuing is the most fundamental paradigm for communication between applications on different computer systems due to its inherent flexibility in allowing both synchronous and asynchronous processing. The message queuing middleware infrastructure is a very flexible framework for a number of application domains in both general client-server as well as mobile agent computing arenas, to wit work flow computing, object messaging, transactional messaging and data replication services.
It will be appreciated that in many transactional messaging scenarios data is often lost during transmission. Nowhere is this more catastrophic than in the banking industry, in which banking records transmitted from one location to another can be lost due to server failures, transmission line failures or other artifacts. It is incumbent upon the system managers to be able to detect quickly that an error has occurred and to reconstruct the data from a known point where the data was valid.
Establishing that point at which an error has occurred has in the past been accomplished by systems which scan an entire so-called log file to reconstruct the up-to-date state of the system before the crash. Log files are routinely utilized with their associated time stamps to identify messages and the data they contain. However, the scanning of entire log files to ascertain the up-to-date state can require scanning as many as 1,000 log records.
Not only is the scanning of the overall log record an inefficient way to ascertain where an error occurred and to reconstruct files from that point, but systems in the past have also required two disk files, one serving as a data file, and the other serving as a log file.
Moreover, the correlation between the log entries and the data files or sectors is complicated by the fact that in the past, sectors were stored in some indiscriminate order, making the mapping between the log file and the sectors a somewhat time-consuming process.
By way of further background, it will be appreciated that message queuing is used in general to be able to provide a fail-safe storage for data records which are transmitted from one point to another. If, for instance, an error occurs and data is lost at one location, it can be reconstructed at a second location due to the storage inherent in message queuing.
As an example, it is desirable, especially in stock market trades, that any interruption in trading be limited to minutes as opposed to hours. On occasion, however, when system servers go down, recovery can take from two to eight hours depending on the number of trades in the system at that time. There is thus a need to minimize the down time and expense of locating and reconstructing damaged files.
Note that as used herein, the term queue file refers to the physical storage of messages that are in transmission. Queue files may also be viewed as holding cells for uncompleted operations. Basically, what this means is that if the receiver is not there to receive a given message, the message is held in the queue file and is deliverable at a later time. As a result, the queue files offer reliability in the retention of information that is transmitted.
Moreover, in traditional systems, the recovery data is not provided by the queue file itself. Thus, queue files have not been utilized to identify the state of the file when an error or data loss has occurred, and have thus not been used to reconstruct the data file from previously uncorrupted data.
Another example of how message queuing is applied to a real-world application involves how a message queuing infrastructure may support real-time on-line transaction processing using mobile agents. In this example, the customer, for instance, is a bank with geographically dispersed branches. Customer accounts are created and kept at the local branches where the account was opened. For illustrative purposes, this is called the home branch of the account. A copy of each account is also kept at the main office. A read operation on an account can be made from either the local branch or the main office. An update to an account, however, will require that both the home branch copy and the main office copy be updated in a coordinated fashion.
If the update request occurred at the home branch, the local copy must then be updated. This update can trigger an agent which then automatically submits an enqueue request to the queue manager or queue server. This queue manager in turn dequeues the request across a wide area network to another queue manager, which in turn dequeues the update request to the database server for the main office accounts.
A message queue in this example provides asynchronous and reliable processing. Asynchronous processing begins with the agent that is triggered by the database update at one location. The agent submits the update request to the message queue manager in an asynchronous manner, and need not wait for a response. The message queue manager serves as a holding cell for the request so that the requester can continue processing. The message queue manager also provides reliability in this example in that it maintains a copy of the update request in its queue until the recipient of this update request has acknowledged its receipt via a well-known handshaking protocol called the Two Phase Commit protocol, known in the industry as transactional message queuing.
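The retain-until-acknowledged behavior described above can be illustrated with a minimal in-memory sketch. The class and method names (`QueueManager`, `enqueue`, `deliver`, `acknowledge`) are hypothetical and chosen only for illustration; the sketch omits persistence and the full Two Phase Commit exchange, and shows only that the requester is not blocked and that a copy is kept until receipt is confirmed.

```python
from collections import OrderedDict


class QueueManager:
    """Toy in-memory queue manager (hypothetical names, no on-disk storage)."""

    def __init__(self):
        # msg_id -> payload; each copy is kept until it is acknowledged.
        self._pending = OrderedDict()
        self._next_id = 0

    def enqueue(self, payload):
        """Accept a request and return its id without waiting for delivery."""
        msg_id = self._next_id
        self._next_id += 1
        self._pending[msg_id] = payload
        return msg_id

    def deliver(self):
        """Return all (msg_id, payload) pairs still awaiting acknowledgement."""
        return list(self._pending.items())

    def acknowledge(self, msg_id):
        """Drop the retained copy once the recipient confirms receipt."""
        self._pending.pop(msg_id, None)
```

A message that has been delivered but not acknowledged remains in `deliver()` output, so it can be sent again after a failure on the receiving side.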
While these types of message queuing systems have operated reliably in the past, they have relied on a data architecture that uses separate queue data and log record files to store the messages that are appended to a message queue. This architecture prevents rapid repair at the time of a server crash and requires two storage disks, one for data and one for the log records. Moreover, traditional message queuing architectures are generally not optimized for write operations without requiring extra hardware to work efficiently, and are not appropriate for high throughput systems with low message residence times. The separate queue data and log files mentioned above also introduce an extra level of unreliability since there exist two points of potential file corruption and media failure. Additionally, there is usually no means for the message queuing systems administrator to define a priori the amount of work needed to do recovery.
Note, the above systems are commercially available as Digital Equipment Corporation's DECmessageQ, IBM's MQ Series, and Transarc's Encina RQS.
In order to solve the above-noted problems with traditional message queuing, a message queuing system is provided that saves and stores messages and their state in an efficient single file on a single disk to enable rapid recovery from server failures. The single disk, single file storage system into which messages and their states are stored eliminates writes to three different disks, the data disk, the index structure disk and the log disk. The single disk, single file storage is made possible by clustering all information together in a contiguous space on the same disk. The result is that all writes are contained in one sweeping motion of the write head, in which the write head moves in only one direction and only once to find the area where it needs to start writing messages and their states. In order to keep track of the clustered information, a unique Queue Entry Map Table is used which includes control information, message blocks and log records in conjunction with single file disk storage, so that the write head never has to back up to traverse saved data when writing new records. The system also permits locating damaged files without the requirement of scanning entire log files.
In order to find the most recent valid data, a control check point interval system is utilized to find the most recent uncorrupted data. Scanning to find the most recent check point interval permits rapid identification of the last queue. Subsequent scanning of log records after the checkpoint establishes the most up-to-date state of all messages. The above system permits data recovery in an order of magnitude less time than previous systems, while at the same time establishing an efficient forward writing mechanism to prevent the need for searching through unordered sectors.
In one embodiment, a circular wrap-around buffering system is used in which a modification of a previous sector is made by appending a new record at the last sector to indicate that the state of a file has changed. This permits reuse of previous blocks that have been freed and no longer hold valid messages and/or log records.
The present invention thus provides a log-based data architecture for transactional message queuing systems which utilizes a combined on-disk file structure for the message queue data and log records. It is the combined queue data/log record file, in one embodiment, on a single disk, which improves write operation performance and reliability, while at the same time reducing the number of disks used.
As mentioned above, system crash recovery is accelerated through the use of a Queue Entry Map Table which does not require searching through all of the log records to ascertain where the error occurred. The use of the Queue Entry Map Table also permits a priori assigning the number of requirements on a queue data file that results in extensibility and flexibility to system administrators.
Also as mentioned above, the subject system utilizes a circular queue that implies that there is potential wrap around of the queue data file for storage reuse. This requires that a reservation table or free space heap be maintained to ensure that when the queue wraps around, subsequent write operations do not overwrite queue data and/or log records that might still be valid.
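The wrap-around constraint above can be sketched as a forward-moving allocator that consults a reservation table before reusing a block. This is an illustrative sketch only: the class name `CircularBlockAllocator`, the per-block boolean table, and the one-block-at-a-time allocation are assumptions, not details given in the text.

```python
class CircularBlockAllocator:
    """Forward-only block allocator with wrap-around (hypothetical sketch)."""

    def __init__(self, num_blocks):
        # Reservation table: True means the block still holds valid data.
        self.in_use = [False] * num_blocks
        # Index at which the next forward search starts.
        self.head = 0

    def allocate(self):
        """Return the next free block at or after head, wrapping at the end."""
        n = len(self.in_use)
        for step in range(n):
            idx = (self.head + step) % n
            if not self.in_use[idx]:
                self.in_use[idx] = True
                self.head = (idx + 1) % n
                return idx
        raise RuntimeError("queue data file is full")

    def release(self, idx):
        """Mark a block as free once its message or log record is obsolete."""
        self.in_use[idx] = False
```

Blocks that are still reserved are skipped rather than overwritten, so a write after wrap-around lands only on storage that has been explicitly released.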
In one embodiment, the queue data storage architecture consists of a single flat file that is created when a queue manager is first initialized based on a fixed size for the queue. The initial queue creation is based on the system administrator's feel for the peak load on the message queuing system, e.g., the maximum number of expected entries in the message queue at any given point in time. Each message in the queue data file contains a Message Header and a Message Body. The Message Body, which contains the message content, is stored on disk in subsequent contiguous blocks that follow the message header.
In the above embodiment, the queue data file is partitioned into a predefined number of logical segments or sectors which can be extended at run time. Each segment contains a copy of the Queue Entry Map Table or QEMT for short, which is stored at the beginning of each segment. The QEMT contains control information for the queue entries and log record information stored in the entire queue file. Message headers, message bodies, and log records are stored after the QEMT with potential mixing of message data and log record blocks.
As will be appreciated, the QEMT size depends on some expected maximum number of queue entries defined by the user at queue creation time. Since the log record takes up some deterministic number of bytes, the queue data file will consist of mixed data types of log records, message headers, message bodies, and QEMTs.
When a new segment is reached in the queue data file, a new QEM Table is written to disk at the beginning of the new segment, with the message and log records following the QEM Table. Since the smallest on-disk data type is the log record, a segment in the queue data file is defined to consist of blocks, where one block is the size of the log record. This implementation enhancement simplifies development of search algorithms.
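Defining a block as the size of one log record reduces locating data to simple arithmetic, which can be sketched as follows. The `Layout` class, its field names, and the assumption that the QEMT occupies a whole number of blocks at the start of each segment are illustrative choices, not values taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical geometry of a queue data file."""

    block_size: int          # bytes per block; equals the log record size
    blocks_per_segment: int  # total blocks in a segment, QEMT included
    qemt_blocks: int         # blocks occupied by the QEMT (assumed whole)

    def segment_start(self, segment):
        """Byte offset of a segment, which is also where its QEMT begins."""
        return segment * self.blocks_per_segment * self.block_size

    def block_offset(self, segment, block):
        """Byte offset of a data block, counted from 0 after the QEMT."""
        if not 0 <= block < self.blocks_per_segment - self.qemt_blocks:
            raise IndexError("block index outside the segment's data area")
        skipped = self.qemt_blocks + block
        return self.segment_start(segment) + skipped * self.block_size

    def blocks_needed(self, nbytes):
        """Number of whole blocks required to hold nbytes (ceiling division)."""
        return -(-nbytes // self.block_size)
```

With a 64-byte log record, for example, a 65-byte message header plus body would occupy two blocks, and every record in the file starts on a block boundary.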
The state of a transactional message queuing system is captured by the control information contained in a QEMT. The QEMT is defined as a static data structure that multiple threads can operate on, rather than each thread maintaining its own copy.
As a result of the log-based data architecture, the subject invention provides a number of improvements over existing transactional message queuing data architectures. It improves on the performance of the write operation over existing message queuing architectures, which makes message queuing systems based on this invention highly appropriate for high throughput systems with low message residence times such as high speed banking applications. The subject system is also applicable to the underlying reliable messaging infrastructure for the transport of agents over unreliable networks and/or networks with different bandwidths.
Moreover, message data and log record write operations always proceed in the forward direction and both can be stored on the same disk file.
This system also improves the reliability of transactional message queuing systems. In this log-based data architecture, there exists a single place where file corruption can occur versus two potential file corruption scenarios with separate queue data and log record files. Reliability is also improved since fewer disk files are used. A combined queue data/log record file adheres to the Atomicity, Consistency, and Isolation properties of the well-known ACID properties. Also, as will be seen, one can utilize existing RAID technology to do transparent duplicate writes.
The subject system allows the resulting message queuing system to support any method of message data access including First In First Out, Last In First Out or priority-based message data access, while at the same time reducing the amount of time needed for recovery from system crashes. Instead of scanning all data in an entire file for log records in traditional approaches, the subject system only requires that one test a few Queue Entry Map Tables first to determine the most recent checkpoint, and then proceed to scan the log records within that segment.
Moreover, the subject system provides extensibility and flexibility to message queuing systems administration since the invention allows the administrator to control how much work they want to do on system recovery by a priori predefining the number of segments on a queue data file, and subsequently the number of checkpoint intervals, again determined a priori. System administrators can thus pay the overhead cost of writing the checkpoints up front to avoid paying the heavier cost of doing extensive log record scans upon recovery. This tradeoff can be adjusted and fine-tuned to suit the application requirements and domains.
The above advantages flow from the use of a pre-allocated on-disk queue buffer containing queue control information, message data, and transactional log records of message operations. The on-disk queue buffer consists of a number of segments or sectors. Each segment consists of the same predefined number of blocks. At the beginning of each segment is the aforementioned Queue Entry Map Table, which contains control information data regarding the state of the individual queue entries, and pointer offsets to where on disk the messages are physically stored. The Queue Entry Map Table serves as a fixed checkpoint interval for the entire message queuing system. Messages and transactional log records of message operations are stored on the blocks in the segment such that message blocks and log record blocks can be intertwined. Moreover, there is no requirement that the log record for a particular message be stored contiguously to the message.
As a feature of the subject invention, a message data write operation always proceeds in a forward manner for the disk head. Additionally, a message is stored contiguously on disk with no need for pointer traversal. Further, a log record write operation always proceeds in a forward manner for the disk head. Log records are written for change of state in a message operation that follows the Two Phase Commit protocol. Therefore, log records can be written for Prepare, Prepared, Commit, Abort, Acknowledge messages from a remote queue manager.
As another unique feature, the entire queue can be scanned in a single pass. Moreover, on-disk garbage collection is always a linear process. Additionally, there exist a number of Queue Entry Map Tables on the same file, with the unique sequence number of the most recent table being stored on disk on a graceful shutdown of the queue manager.
Importantly, the read operation can follow the First In First Out, Last In First Out, or Priority-based policy such that no special provision is needed to implement any of the three policies.
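One way to see why no special provision is needed is that each policy is only a different selection rule over the same set of queue entries. The sketch below assumes each entry is a tuple of arrival sequence number, priority, and entry id, that a larger number means higher priority, and that ties in priority go to the earlier arrival; none of these conventions is stated in the text.

```python
def next_entry(entries, policy):
    """Pick the id of the next entry to read under the given policy.

    entries: list of (arrival_seq, priority, entry_id) tuples.
    policy:  "fifo", "lifo", or "priority".
    Returns None when there are no entries.
    """
    if not entries:
        return None
    if policy == "fifo":
        # Oldest arrival first.
        return min(entries, key=lambda e: e[0])[2]
    if policy == "lifo":
        # Newest arrival first.
        return max(entries, key=lambda e: e[0])[2]
    if policy == "priority":
        # Highest priority first; earlier arrival wins a tie.
        return max(entries, key=lambda e: (e[1], -e[0]))[2]
    raise ValueError(f"unknown policy: {policy}")
```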
Moreover, the recovery procedure is accelerated by searching only the timestamps of the Queue Entry Map Tables. This is because the most recent Queue Entry Map Table serves as the starting state for the recovery process. Log records following this table are then read sequentially and changes are then made to the in-memory copy of this most recent Queue Entry Map Table to reflect changes made after the last known checkpoint.
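The recovery procedure just described can be sketched in a few lines. The data shapes here are hypothetical: each table is modeled as a dictionary with a `timestamp` and a mapping of entry ids to states, and each log record as a `(timestamp, entry_id, state)` tuple already read from disk in order. A real implementation would read these from the queue data file rather than from lists.

```python
def recover(qemts, log_records):
    """Rebuild the latest queue state from checkpoints and log records.

    qemts:       list of {"timestamp": int, "entries": {entry_id: state}}
    log_records: list of (timestamp, entry_id, state) in on-disk order
    """
    # Step 1: only the table timestamps are compared to find the checkpoint.
    latest = max(qemts, key=lambda table: table["timestamp"])

    # Step 2: work on an in-memory copy of the most recent table.
    state = dict(latest["entries"])

    # Step 3: replay, in order, the log records written after the checkpoint.
    for timestamp, entry_id, new_state in log_records:
        if timestamp > latest["timestamp"]:
            state[entry_id] = new_state
    return state
```

Records older than the chosen checkpoint are skipped because their effect is already reflected in the table, which is why the amount of replay work is bounded by the checkpoint interval.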