FIG. 1 depicts a block diagram of one architecture of a job handling system in the prior art. Job handling systems have many applications such as, for example, multiprocessing computing systems, automatic call distribution in telemarketing centers, and routing of Internet Protocol (IP) packets. The fault tolerance and fault recovery capabilities of this architecture are of particular interest to an understanding of the present invention. The architecture and its fault tolerance and fault recovery capabilities will now be described.
Job handling system 100 comprises processor 101, servers 102-1 through 102-N, wherein N is a positive integer, and queue manager 103, which are connected via the logical links shown.
Processor 101 comprises the hardware and software needed to receive jobs on logical link 110, to queue them for assignment to servers 102-1 through 102-N when necessary, and to act as the interface between queue manager 103 and servers 102-1 through 102-N. The processes performed by processor 101 are described below in detail and with respect to FIGS. 2 through 4.
Each of servers 102-1 through 102-N is an entity that is capable of processing a job that is assigned to it. As is well known to those skilled in the art, each of servers 102-1 through 102-N can be a machine, a person, or a combination of a machine and a person. Each of servers 102-1 through 102-N receives jobs from processor 101 on logical links 111-1 through 111-N, respectively. Furthermore, each of servers 102-1 through 102-N continually notifies processor 101 of its status by transmitting a one-bit idle/busy indicator on logical links 121-1 through 121-N, respectively. The idle/busy indicator expresses whether the server is busy performing a job or is idle and available to process another job.
Queue manager 103 comprises the hardware and software needed to queue information about the jobs queued in processor 101, to monitor the idle/busy indicator for each server on logical link 114, and to assign jobs to servers when they are not busy. Queue manager 103 also transmits a one-bit status indicator to processor 101 on logical link 115 that indicates whether or not queue manager 103 is operating normally.
When queue manager 103 is operating normally, job handling system 100 is in normal state 201, as represented by state diagram 200 in FIG. 2. In contrast, when queue manager 103 crashes, job handling system 100 transitions into failure state 202. The operation of job handling system 100 in the normal state is different from its operation in the failure state.
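The crash transition of state diagram 200 can be sketched as a small state machine driven by the one-bit status indicator that queue manager 103 sends on logical link 115. This is a minimal illustration under stated assumptions, not the prior-art implementation: the names `SystemState` and `next_state` are hypothetical, and only the transition described here (normal to failure on a crash) is modeled; any return path to normal state 201 is deliberately left out.

```python
from enum import Enum


class SystemState(Enum):
    """States of state diagram 200 (FIG. 2); values mirror the reference numerals."""
    NORMAL = 201
    FAILURE = 202


def next_state(current: SystemState, queue_manager_ok: bool) -> SystemState:
    """Return the state after reading the status indicator of queue manager 103.

    queue_manager_ok models the one-bit status indicator on logical link 115
    (True means queue manager 103 is operating normally).
    """
    if current is SystemState.NORMAL and not queue_manager_ok:
        # Queue manager 103 has crashed: enter failure state 202.
        return SystemState.FAILURE
    # Otherwise stay put; recovery to normal state 201 is not modeled here.
    return current


state = SystemState.NORMAL
for bit in (True, True, False, True):
    state = next_state(state, bit)
```

After the loop, `state` is `SystemState.FAILURE`: once the crash is observed, a later "operating normally" bit does not by itself move the sketch back to normal state 201.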
When job handling system 100 is in normal state 201, it performs three salient asynchronous processes (as shown in FIG. 3):
i. Job Queuing Process 301-1: the reception, assignment, and possible queuing of new jobs by processor 101,
ii. Server Status Monitoring Process 302-1: the reception of the idle/busy indicators from servers 102-1 through 102-N at processor 101 and the transmission of the indicators from processor 101 to queue manager 103, and
iii. Server Assignment Process 303-1: the assignment of queued jobs to servers by queue manager 103.
Each of these will be discussed in turn.
Job Queuing Process 301-1 is executed by processor 101 upon entering normal state 201.
At task 311, a job arrives. The job can be anything that can be performed by any of servers 102-1 through 102-N. Each job can be, for example, connection-oriented (e.g., a telephone call, an instant messaging [IM] session request, etc.) or not (e.g., an e-mail, a Hypertext Transfer Protocol [HTTP] service request, an arbitrary IP packet to be routed, etc.).
At task 312, processor 101 determines whether a server is idle, as indicated by the servers' idle/busy indicators. If so, then control passes to task 313; otherwise, control passes to task 314.
At task 313, processor 101 assigns the job to the idle server, at which time the server changes its idle/busy indicator to “busy” until it finishes the job.
At task 314, processor 101 queues the job in queue 132 and transmits a description of the job to queue manager 103 on logical link 112, which, in response, queues the description of the job in queue 133. Queue manager 103 uses the descriptions in queue 133 to infer which jobs are queued in queue 132 and to assign jobs in accordance with the queue discipline.
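Tasks 311 through 314 can be sketched in Python as follows. This is a hypothetical model, not the prior-art code: the classes `Processor` and `QueueManager`, the dictionary of idle/busy indicators, and the `{"job": ...}` description format are assumptions made for illustration, and picking the first idle server found is an assumed selection rule.

```python
from collections import deque


class QueueManager:
    """Hypothetical stand-in for queue manager 103; holds descriptions in queue 133."""

    def __init__(self):
        self.queue_133 = deque()

    def enqueue_description(self, description):
        # Received from processor 101 on logical link 112.
        self.queue_133.append(description)


class Processor:
    """Hypothetical stand-in for processor 101 running Job Queuing Process 301-1."""

    def __init__(self, server_ids, queue_manager):
        # Latest idle/busy indicator per server: True means "idle".
        self.idle = {sid: True for sid in server_ids}
        self.queue_132 = deque()
        self.queue_manager = queue_manager
        self.assignments = {}  # job -> id of the server it was assigned to

    def on_job_arrival(self, job):
        # Task 311: a job arrives.
        # Task 312: consult the idle/busy indicators for an idle server.
        server = next((sid for sid, idle in self.idle.items() if idle), None)
        if server is not None:
            # Task 313: assign the job; the server now reports "busy".
            self.assignments[job] = server
            self.idle[server] = False
        else:
            # Task 314: queue the job in queue 132 and send a description
            # to the queue manager, which queues it in queue 133.
            self.queue_132.append(job)
            self.queue_manager.enqueue_description({"job": job})


# Usage: one server, two arrivals.
qm = QueueManager()
proc = Processor(["102-1"], qm)
proc.on_job_arrival("call-A")  # server 102-1 is idle: assigned (task 313)
proc.on_job_arrival("call-B")  # no idle server: queued (task 314)
```

Keeping the job itself in queue 132 while only a description travels to queue 133 mirrors the split of responsibilities in the text: the processor holds the work, the queue manager holds just enough to decide assignment order.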
Server Status Monitoring Process 302-1 is executed by servers 102-1 through 102-N, processor 101, and queue manager 103 upon entering normal state 201. As shown in FIG. 3, in this process, idle/busy indicators are transmitted from one or more of servers 102-1 through 102-N, received by processor 101, and transmitted from processor 101 to queue manager 103. Although Server Status Monitoring Process 302-1 is depicted in FIG. 3 as “busy waiting”, it will be clear to those skilled in the art how to implement process 302-1 more efficiently.
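One common way to avoid busy waiting is to block on a condition variable that is signaled whenever a server reports a new indicator value. The sketch below uses Python's `threading.Condition`; it is an assumed, illustrative alternative rather than the method depicted in FIG. 3, and the class name `ServerStatusBoard` and the choice to start every server as “busy” are hypothetical.

```python
import threading


class ServerStatusBoard:
    """Hypothetical holder of the latest idle/busy indicator of each server.

    Readers block on a condition variable instead of busy waiting and are
    woken each time a server reports a new indicator value.
    """

    def __init__(self, server_ids):
        self._idle = {sid: False for sid in server_ids}  # assume all start "busy"
        self._cond = threading.Condition()

    def report(self, server_id, idle):
        # A server transmits its one-bit indicator (True means "idle").
        with self._cond:
            self._idle[server_id] = idle
            self._cond.notify_all()

    def wait_for_idle_server(self, timeout=None):
        # Sleep until some server is idle; return its id, or None on timeout.
        with self._cond:
            if not self._cond.wait_for(lambda: any(self._idle.values()), timeout):
                return None
            return next(sid for sid, idle in self._idle.items() if idle)


board = ServerStatusBoard(["102-1", "102-2"])
waiter_result = []
waiter = threading.Thread(
    target=lambda: waiter_result.append(board.wait_for_idle_server(timeout=2.0)))
waiter.start()
board.report("102-2", True)  # wakes the waiting thread
waiter.join()
```

Because `report` calls `notify_all`, a waiting reader consumes no CPU until an indicator actually changes, which is the efficiency gained over a polling loop.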
Server Assignment Process 303-1 is executed by queue manager 103 upon entering normal state 201.
At task 331, queue manager 103 selects the next job in queue 133 to be assigned using the information about the jobs in queue 133 and the queue discipline.
At task 332, queue manager 103 determines whether a server is idle, as indicated by the servers' idle/busy indicators. If so, then control passes to task 333; otherwise, control remains in task 332.
At task 333, queue manager 103 instructs processor 101 via logical link 113 to assign the job to the idle server, at which time the server changes its idle/busy indicator to “busy” until it finishes the job. As part of task 333, queue manager 103 removes the description of the job from queue 133, and processor 101 removes the job from queue 132.
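One pass of tasks 331 through 333 might look like the following Python sketch. It assumes a first-in, first-out queue discipline (the text leaves the discipline open), represents each description in queue 133 by the job identifier itself, and returns instead of looping when no server is idle; the function name and its parameters are hypothetical.

```python
from collections import deque


def assign_next_queued_job(queue_133, queue_132, idle, assignments):
    """Run one pass of Server Assignment Process 303-1 (tasks 331-333).

    Assumes a FIFO queue discipline and that each description in
    queue 133 is simply the job identifier. Returns (job, server) if a
    job was assigned, otherwise None.
    """
    # Task 331: select the next job to be assigned (FIFO assumed).
    if not queue_133:
        return None
    job = queue_133[0]
    # Task 332: find an idle server. The described process stays in task 332
    # until one appears; this sketch returns None so the caller can retry.
    server = next((sid for sid, is_idle in idle.items() if is_idle), None)
    if server is None:
        return None
    # Task 333: have the processor assign the job, then both queues forget
    # it: the description leaves queue 133 and the job leaves queue 132.
    queue_133.popleft()
    queue_132.remove(job)
    assignments[job] = server
    idle[server] = False  # the server now reports "busy"
    return job, server


q133 = deque(["job-1", "job-2"])
q132 = deque(["job-1", "job-2"])
idle = {"102-1": False, "102-2": True}
assigned = {}
first = assign_next_queued_job(q133, q132, idle, assigned)   # ("job-1", "102-2")
second = assign_next_queued_job(q133, q132, idle, assigned)  # None: no idle server left
```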
By performing the three processes in normal state 201, job handling system 100 receives jobs, queues them when necessary, and assigns them to servers in well-known fashion.
When job handling system 100 is in failure state 202, it performs two salient asynchronous processes (as shown in FIG. 4):
i. Job Queuing Process 301-2: the reception and assignment of new jobs by processor 101, and
ii. Server Status Monitoring Process 302-2: the transmission and reception of the idle/busy indicator from servers 102-1 through 102-N to processor 101.
Each of these will be discussed in turn.
Job Queuing Process 301-2 is executed by processor 101 upon entering failure state 202.
At task 411, a job arrives. Task 411 is identical to task 311.
At task 412, processor 101 determines whether a server is idle, as indicated by the servers' idle/busy indicators. If so, then control passes to task 413; otherwise, control passes to task 414. Task 412 is identical to task 312.
At task 413, processor 101 assigns the job to the idle server, at which time the server changes its idle/busy indicator to “busy” until it finishes the job. Task 413 is identical to task 313.
At task 414, processor 101 drops the job because it has no capability for assigning queued jobs. This is in contrast to task 314 in which those jobs that cannot be immediately assigned are queued for later assignment.
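The contrast between task 414 and task 314 can be seen in this Python sketch of Job Queuing Process 301-2. The function name, the idle/busy dictionary, and the `dropped` list used to record lost jobs are hypothetical; the point is only that, with no queue manager available, the branch that formerly queued the job now discards it.

```python
def on_job_arrival_failure_state(job, idle, assignments, dropped):
    """Job Queuing Process 301-2: assign the job if a server is idle, else drop it.

    Returns the id of the server the job was assigned to, or None if dropped.
    """
    # Task 411: a job arrives (same as task 311).
    # Task 412: consult the idle/busy indicators (same as task 312).
    server = next((sid for sid, is_idle in idle.items() if is_idle), None)
    if server is not None:
        # Task 413: assign the job; the server now reports "busy".
        assignments[job] = server
        idle[server] = False
        return server
    # Task 414: with queue manager 103 down there is nowhere to queue
    # the job, so it is dropped (recorded here only for illustration).
    dropped.append(job)
    return None


idle = {"102-1": True}
assignments, dropped = {}, []
on_job_arrival_failure_state("call-A", idle, assignments, dropped)  # assigned to 102-1
on_job_arrival_failure_state("call-B", idle, assignments, dropped)  # dropped
```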
Server Status Monitoring Process 302-2 is executed by servers 102-1 through 102-N and processor 101 upon entering failure state 202. As shown in FIG. 4, in this process, idle/busy indicators are transmitted from one or more of servers 102-1 through 102-N and are received by processor 101. Although Server Status Monitoring Process 302-2 is depicted in FIG. 4 as “busy waiting”, it will be clear to those skilled in the art how to implement process 302-2 more efficiently.
The salient disadvantage of the first architecture is that jobs that cannot be immediately assigned are dropped, and a dropped job might be valuable and difficult or costly to replace.
FIG. 5 depicts a block diagram of a second architecture of a job handling system in the prior art, which has better failure-state capabilities than the architecture depicted in FIG. 1. The salient characteristic of the second architecture is that it comprises two queue managers, a primary unit and a secondary unit, such that the secondary unit backs up and fills in for the primary unit while the primary unit is down (i.e., when the system enters failure state 202).
Job handling system 500 comprises processor 501, servers 502-1 through 502-N, primary queue manager 503-1, and secondary queue manager 503-2, which are connected via the logical links shown.
Processor 501 is similar to processor 101 in job handling system 100, except that it interfaces with two queue managers rather than one, as described below in detail and with respect to FIGS. 2, 6a, 6b, and 7.
Each of servers 502-1 through 502-N is identical to servers 102-1 through 102-N in job handling system 100.
Primary queue manager 503-1 and secondary queue manager 503-2 are identical to each other and to queue manager 103 in job handling system 100.
When primary queue manager 503-1 is operating normally, as indicated to processor 501 by the status indicator on logical link 515-1, job handling system 500 is in normal state 201. In contrast, when primary queue manager 503-1 crashes, as indicated to processor 501 by the same status indicator, job handling system 500 enters failure state 202.
As is described in detail below and with respect to FIGS. 2, 6a, 6b, and 7, the operation of secondary queue manager 503-2 is identical in both the normal state and the failure state, and it is the sole responsibility of processor 501 to invoke secondary queue manager 503-2 (i.e., make secondary queue manager 503-2 active).
When job handling system 500 is in normal state 201, it performs four salient asynchronous processes (as shown in FIGS. 6a and 6b):
i. Job Queuing Process 601-1: the reception, and possible queuing, of new jobs by processor 501,
ii. Server Status Monitoring Process 602-1: the transmission and reception of the idle/busy indicator from servers 502-1 through 502-N through processor 501 to primary queue manager 503-1 and secondary queue manager 503-2,
iii. Server (Primary) Assignment Process 603: the assignment of queued jobs to servers by primary queue manager 503-1, and
iv. Server (Secondary) Assignment Process 604: the assignment of queued jobs to servers by secondary queue manager 503-2.
Each of these will be discussed in turn.
Job Queuing Process 601-1 is executed by processor 501 upon entering normal state 201, and Job Queuing Process 601-1 is identical to Job Queuing Process 301-1, except that processor 501 transmits a description of the job to primary queue manager 503-1 only.
Server Status Monitoring Process 602-1 is executed by servers 502-1 through 502-N, processor 501, primary queue manager 503-1 and secondary queue manager 503-2 upon entering normal state 201.
Server (Primary) Assignment Process 603 is executed by primary queue manager 503-1 upon entering normal state 201. Server (Primary) Assignment Process 603 is identical to Server Assignment Process 303-1.
Server (Secondary) Assignment Process 604 is executed by secondary queue manager 503-2 upon entering normal state 201. Server (Secondary) Assignment Process 604 is identical to Server Assignment Process 303-1. It should be noted that secondary queue manager 503-2 does not actually assign any jobs in normal state 201 because processor 501 does not give secondary queue manager 503-2 any job descriptions to queue.
By performing the four processes in normal state 201, job handling system 500 receives jobs, queues them when necessary, and assigns them to servers in well-known fashion.
When primary queue manager 503-1 crashes, job handling system 500 enters failure state 202 and performs three salient asynchronous processes (as shown in FIG. 7):
i. Job Queuing Process 601-2: the reception, and possible queuing, of new jobs by processor 501,
ii. Server Status Monitoring Process 602-2, and
iii. Server (Secondary) Assignment Process 604.
Each of these will be discussed in turn.
Job Queuing Process 601-2 is executed by processor 501 upon entering failure state 202, and Job Queuing Process 601-2 is identical to Job Queuing Process 601-1, except that processor 501 transmits a description of the job to secondary queue manager 503-2 only.
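The difference between Job Queuing Processes 601-1 and 601-2 reduces to which queue manager receives the description of a queued job. The following Python sketch makes that choice from the status indicator of primary queue manager 503-1; the names `QueueManagerStub` and `route_description`, and the `{"job": ...}` description format, are hypothetical.

```python
from collections import deque


class QueueManagerStub:
    """Hypothetical stand-in for queue manager 503-1 or 503-2."""

    def __init__(self, name):
        self.name = name
        self.descriptions = deque()


def route_description(job, primary_ok, primary, secondary):
    """Send the description of a queued job to exactly one queue manager.

    primary_ok models the status indicator of primary queue manager 503-1
    reported on 515-1: True selects Job Queuing Process 601-1 (normal
    state 201), False selects Job Queuing Process 601-2 (failure state 202).
    Returns the name of the queue manager that received the description.
    """
    target = primary if primary_ok else secondary
    target.descriptions.append({"job": job})
    return target.name


primary = QueueManagerStub("503-1")
secondary = QueueManagerStub("503-2")
route_description("call-A", True, primary, secondary)   # -> "503-1"
route_description("call-B", False, primary, secondary)  # -> "503-2"
```

The sketch models only where new descriptions are sent; it does not model what becomes of descriptions already held by primary queue manager 503-1 when it crashes, which the text does not describe.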
Server Status Monitoring Process 602-2 is executed by servers 502-1 through 502-N, processor 501, and secondary queue manager 503-2 upon entering failure state 202.
Server (Secondary) Assignment Process 604 is executed by secondary queue manager 503-2 upon booting up and does not change when job handling system 500 enters failure state 202.
By performing the three processes in failure state 202, job handling system 500 receives jobs, queues them when necessary, and assigns them to servers in well-known fashion.
The advantage of job handling system 500 over job handling system 100 is that job handling system 500 is fault tolerant in that it continues to function smoothly in the event of the failure of its primary queue manager.
When it is time to transition job handling system 500 back into normal state 201, the accepted industry practice is to wait until few jobs are in queue 532, such as late at night, and then re-boot job handling system 500. This has the disadvantage that any jobs in queue 532 at that time are dropped. Sometimes those jobs are valuable and difficult or costly to replace. In such cases, there is a need for a method of smoothly transitioning job handling system 500 back into normal state 201 without losing the jobs queued in queue 532.