Multiprocessing systems have been known for some time. Various types of multiprocessing systems exist, including parallel processing systems and a variety of forms of computing systems designed for on-line transaction processing.
On-line transaction processing is generally contrasted with batch processing and real time processing. Batch processing involves queueing up a plurality of jobs, with each job begun only after completion of the prior job and completed before the next job begins, and with virtually no interaction with the user during processing. If access to a database is required, the database is loaded and unloaded with the job. The elapsed time between placing the job in the queue and receiving a response can vary widely, but in most instances exceeds a few minutes, so that a user cannot reasonably submit the job and wait for a response without doing intervening work. Until the late 1970's most commercial computer system architectures were intended primarily for batch processing. Batch processing systems have found particular use in scientific applications.
Real time processing systems represent a small share of the commercial market, and are used primarily in manufacturing applications where a stimulus or request must be acted on extremely quickly, such as in milliseconds. Typical applications for real time processing systems involve process control for monitoring and controlling highly automated chemical or manufacturing processes.
On-line transaction processing systems, on the other hand, involve far greater interaction with a plurality of individuals, each typically operating a terminal and each using the system to perform some function, such as updating a database, as part of a larger task and requiring a predictable response within an acceptable time. Such systems typically involve large databases, large volumes of daily on-line updates, and extensive terminal handling facilities. Frequently only the current version of a database will be contained within the system, without paper backup.
Computer system architectures designed specifically for on-line transaction processing were introduced in the late 1970's, although more conventional batch systems are frequently offered in non-batch configurations for use in on-line transaction processing. Over time, on-line transaction processing has come to impose several requirements on the processing system. Those requirements include substantially continuous availability of the system, expandability (usually in a modular form), data integrity even in the event of a component failure, and ease of use.
The requirements for substantially continuous availability of the system and data integrity, taken together, are generally referred to as "fault tolerance". A commercially acceptable on-line transaction processing system must therefore offer, as one of its attributes, fault tolerance. However, the term fault tolerance may still be the subject of confusion since it can apply to both hardware and software, hardware only, or software only; in addition, fault tolerance can mean tolerance to only one component failure, or to multiple component failures. In the current state of the art, fault tolerance is generally taken to mean the ability to survive the failure of a single hardware component, or "single hardware fault tolerance".
It may be readily appreciated that fault tolerance could not be provided in a single processor system, since failure of the processor would equate to failure of the whole system. As a result, fault tolerant systems involve multiple processors. However, not all fault tolerant systems need be suited to on-line transaction processing.
Fault tolerant multiprocessor systems range from so-called "cold", "warm" and "hot" backup systems to distributed, concurrent on-line transaction processing systems such as that described in U.S. Pat. No. 4,228,496. Cold, warm and hot backup systems are used primarily with batch processing systems, and involve a primary computer performing the desired tasks while a second computer stands by at a varying stage of utilization. When the primary computer fails, the system operator performs a varying range of steps to transfer the tasks formerly performed on the failed primary system onto the substantially idle backup system. This form of fault tolerant design is usually prohibitively expensive, offers little protection against data corruption, and presents generally unacceptable delays for on-line use.
Fault tolerant distributed processing systems have included systems using a lock-stepped redundant hardware approach, initially developed for military and aerospace applications and currently marketed in a somewhat modified form by Stratus Computer, as well as systems using a combination of hardware and software to achieve fault tolerance, such as that described in the aforementioned '496 patent. Another approach using a combination of hardware and software to achieve fault tolerance was formerly marketed by Synapse Computer, and involved providing a single additional processor as a hot backup for all other processors in the multiprocessor system.
The redundant hardware approach suffers from a number of limitations, including in particular difficulties in maintaining the requisite tightly coupled relationship between the various system elements, and limitations in software development and flexibility.
While the system described in U.S. Pat. No. 4,228,496 provided many improvements in the field of distributed fault tolerant computing, that system also suffers from limitations relating to the overhead required for handling of transaction-based operations. With regard to the overhead required for handling transactions, the system described in the '496 patent appears to require continued communications between primary and backup processors to ensure that the status of the transaction at key stages, called checkpoints, is communicated from the primary to the backup processor. This relatively continuous checkpointing imposes an undesirable overhead requirement. Moreover, depending upon the application being run by the system, the overhead requirement can become an extreme burden on the system.
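By way of illustration only, the general primary-to-backup checkpointing pattern discussed above can be sketched as follows. The sketch is a hypothetical simplification and is not taken from the '496 patent: the names `Backup`, `run_on_primary`, `STAGES` and `fail_after` are assumptions introduced here, and each checkpoint is modeled as a single message whose count stands in for the communication overhead.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Hypothetical backup processor holding the latest checkpoint per transaction."""
    checkpoints: dict = field(default_factory=dict)
    messages_received: int = 0

    def receive_checkpoint(self, txn_id, stage, state):
        # Each checkpoint is one primary-to-backup message; the running
        # count is a stand-in for the checkpointing overhead.
        self.checkpoints[txn_id] = (stage, copy.deepcopy(state))
        self.messages_received += 1

    def resume(self, txn_id, stages):
        # After a primary failure, return the saved state and the stages
        # that follow the last checkpointed stage.
        stage, state = self.checkpoints[txn_id]
        remaining = stages[stages.index(stage) + 1:]
        return state, remaining


def run_on_primary(txn_id, stages, backup, fail_after=None):
    """Run stages in order, checkpointing to the backup after each stage.

    fail_after (an assumption of this sketch) names the stage after which
    the primary is taken to fail. Returns the final state, or None on failure.
    """
    state = {"done": []}
    for stage in stages:
        state["done"].append(stage)
        backup.receive_checkpoint(txn_id, stage, state)
        if stage == fail_after:
            return None  # primary fails; backup holds the last checkpoint
    return state


# Usage: the primary fails after the second of three stages.
STAGES = ["read", "update", "commit"]
backup = Backup()
result = run_on_primary("t1", STAGES, backup, fail_after="update")
state, remaining = backup.resume("t1", STAGES)
```

In this sketch one message is sent per stage of every transaction, whether or not a failure ever occurs, which is the kind of relatively continuous overhead noted above.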
The system described in the '496 patent also suffers from the limitation of requiring application programs to be compatible with or written for specially developed software. Such specially developed software in many instances requires programmers to learn new programming languages and unnecessarily limits the ease with which applications can be developed for or ported to the system. It has become well recognized that one of the major stumbling blocks to the use of more efficient systems for transaction processing has been the cost of rewriting the customer's application programs for use on a fault tolerant transaction processing system, and these costs are greatly magnified when an entirely new language must be learned.
As a result, there has been a need for a distributed multiprocessing system capable of fault tolerant operation with simplified handling of transaction-based operations.
There has also been a need for a loosely coupled distributed multiprocessing system capable of fault tolerant operation using conventional operating systems.