It is well known that multiple processors may be coupled together by a computer network to form a multi-processor computer system which operates more effectively than the individual uncoupled processors. In a system of this type, data and program code may be transferred over the network from one processor to another to make effective use of a shared resource, such as a database, or to distribute and balance a workload among all of the processors.
Systems of this type, for the most part, are currently limited by the bandwidth of the interconnecting network. That is to say, the desired message traffic generally exceeds the capacity of the communications network. This limitation manifests itself as processing delays for the tasks running on the multiprocessor system. For example, under normal operating conditions, one processor may be waiting for a message from another processor while the other processor is waiting for the network to be able to send the message. Conditions of this type may significantly reduce the apparent performance of a computer system.
A computer system which includes multiple transaction processors that may perform operations on a single shared database is especially susceptible to delays of this kind. To prevent data corruption resulting from multiple simultaneous accesses to a single data item or set of data items, systems of this type generally include a concurrency control mechanism which may make frequent use of the interconnecting network.
In a typical concurrency control mechanism, each record in the common database is assigned its own lock entity. Access to the data in a record is controlled through its corresponding lock entity. In order to read or write data in a particular record, a transaction executing on one of the transaction processors first "procures" the lock and then changes its state to indicate its type of access. When the transaction is complete, the lock entity is "released". While the lock is procured by one task, its changed state prevents other processors from procuring it and thus from accessing the protected record.
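The procure and release steps described above may be illustrated by the following minimal sketch. It is not taken from any particular lock manager; the names `LockEntity`, `LockState`, `procure` and `release` are hypothetical, and the sketch assumes a single address space and a simple exclusive lock regardless of the type of access.

```python
from enum import Enum


class LockState(Enum):
    """Hypothetical lock states; the state records the type of access."""
    FREE = "free"
    READ = "read"
    WRITE = "write"


class LockEntity:
    """One lock entity guarding one record (illustrative only)."""

    def __init__(self):
        self.state = LockState.FREE
        self.holder = None

    def procure(self, task, access):
        # A lock that is not free cannot be procured by another task.
        if self.state is not LockState.FREE:
            return False
        # Changing the state marks the record as in use for this access type.
        self.state = access
        self.holder = task
        return True

    def release(self, task):
        # Only the task that procured the lock may release it.
        if self.holder != task:
            raise ValueError("lock is not held by this task")
        self.state = LockState.FREE
        self.holder = None


# One lock entity per record in the (assumed) shared database.
locks = {"record-1": LockEntity(), "record-2": LockEntity()}
```

In this sketch a second task that calls `procure` on a held lock simply receives `False`; a practical lock manager would typically queue the request instead.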
The "procuring" and "releasing" of lock entities occurs according to a fixed protocol implemented by lock management software which is accessible to all of the coupled transaction processors. While this lock management software generally operates efficiently, it may, in some instances, produce excessive processing delays of the type described above.
Delays of this type occur when one processor is waiting to procure a lock that another processor is ready to release, but network traffic unduly delays either the request for the lock entity from the procuring processor or the notification from the releasing processor that the lock is available. These delays may hold up other requests for the lock entity, creating more processing delays.
Much of the waiting delay in interprocessor communication of this type is eliminated when optical fiber networks of relatively high bandwidth are used to connect the transaction processors. Communications networks of this type are expected to have data transmission speeds on the order of 1 gigabit per second (Gbps). Since the network propagation delay does not change significantly, this additional bandwidth increases the amount of data that may be in transit on the network at any given time. Thus, a network of this type may substantially eliminate any waiting period for access to the network.
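The relationship between bandwidth and the amount of data in transit may be illustrated with a short calculation: the number of bits in flight is the transmission speed multiplied by the propagation delay. In the sketch below, the 50 microsecond delay and the 10 Mbps comparison network are assumed values chosen only for illustration, and the function name `bits_in_transit` is hypothetical.

```python
def bits_in_transit(bandwidth_bps, propagation_delay_us):
    """Bits on the network at once: bandwidth times propagation delay.

    Integer arithmetic is used so that the results are exact.
    """
    return bandwidth_bps * propagation_delay_us // 1_000_000


ASSUMED_DELAY_US = 50  # assumed propagation delay, in microseconds

slow_network = bits_in_transit(10_000_000, ASSUMED_DELAY_US)     # assumed 10 Mbps network
fast_network = bits_in_transit(1_000_000_000, ASSUMED_DELAY_US)  # 1 Gbps network

print(slow_network, fast_network)
```

Under these assumptions, the same propagation delay holds one hundred times as many bits at 1 Gbps as at 10 Mbps.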
However, even with an interconnection network having a high transmission speed, there may be delays caused by the interprocessor communications protocol. Delays of this type may occur, for example, when only one processor may perform a function, such as lock management or transaction allocation, that is used by all of the processors on the network. Since only one processor is performing this function, requests from the other processors may be delayed while earlier requests are processed.
The most common solution to problems of this type is to divide the function among all of the processors. One way in which this has been done is to partition the data structure used by the function among the different processors, allowing each processor to execute the function on its portion of the data. In a lock management system, for example, the lock entities which are used to control access to the various records may be divided among the processors.
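The partitioning of lock entities among processors may be sketched as follows. The modulo assignment rule, the integer record identifiers, and the names `owning_processor` and `partition_locks` are assumptions made for illustration; any deterministic mapping from record to processor would serve the same purpose.

```python
NUM_PROCESSORS = 4  # assumed number of coupled processors


def owning_processor(record_id, num_processors=NUM_PROCESSORS):
    """Return the processor that manages the lock entity for a record.

    A simple modulo rule is assumed here; it is one possible mapping.
    """
    return record_id % num_processors


def partition_locks(record_ids, num_processors=NUM_PROCESSORS):
    """Divide the lock entities for the given records among the processors."""
    partitions = {p: [] for p in range(num_processors)}
    for record_id in record_ids:
        partitions[owning_processor(record_id, num_processors)].append(record_id)
    return partitions


# Each processor executes the lock function only on its own portion.
print(partition_locks(range(8)))
```

With a mapping of this kind, a request for a lock is sent only to the processor that owns the corresponding lock entity, so that no single processor handles every request.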
Another way to solve the bottleneck problem is to establish a memory area, containing the data structure, which is shared by all of the processors. In this instance, the function may be performed by any of the processors, which directly access the common memory area only when necessary. Of course, in a system of this type, some sort of concurrency control would be used to prevent data corruption in the shared memory area. This concurrency mechanism on top of the locking concurrency mechanism may exacerbate the processing delays.
U.S. Pat. No. 4,399,504 to Obermark et al. relates to a lock management protocol in which a data structure (lock table) that includes multiple lock entities is passed from one processor to another and copied into the local memory of each processor when it is received. During the time interval between receiving and transmitting the lock table, programs running on the processor may procure available lock entities and release any currently held lock entities.
U.S. Pat. No. 4,412,285 to Neches et al. relates to a tree-structured data communications network in which a semaphore facility is used to effect communication among the processors connected by the network. Concurrency control (locking) is local to each processor for data that is under exclusive control of the processor.
U.S. Pat. No. 4,480,304 to Carr et al. relates to a recovery scheme for a distributed locking system that preserves the integrity of a database controlled by the locking system across system failures.
U.S. Pat. No. 4,692,918 to Elliot et al. relates to a reliable message transmission scheme between multiple interconnected processors. The processors are connected by two local area networks. If one fails, the other is used as a backup. This patent also refers to the use of broadcast media for connecting processors.
U.S. Pat. No. 4,716,528 to Crus et al. relates to a hierarchical locking system. This system assumes that a transaction uses a block of records. At the lowest level, the transaction utilizes a separate lock for each record. However, if the number of records in a block of records accessed by a process exceeds a threshold, all of the individual record locks are replaced by one lock for the entire block. Thus, the process may access any record in the block by procuring the one lock.
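The threshold behavior described above may be sketched as follows. This is an illustrative sketch only and is not the method of the referenced patent; the class name `EscalatingLocks`, its methods, and the single-process view it takes are assumptions.

```python
class EscalatingLocks:
    """Tracks the locks held by one process and escalates at a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.record_locks = {}    # block id -> set of locked record ids
        self.block_locks = set()  # block ids held as a single block lock

    def lock_record(self, block, record):
        """Lock one record and return the granularity of the lock now held."""
        # Once the block lock is held, any record in the block is covered.
        if block in self.block_locks:
            return "block"
        held = self.record_locks.setdefault(block, set())
        held.add(record)
        # When the count exceeds the threshold, replace all of the
        # individual record locks with one lock for the entire block.
        if len(held) > self.threshold:
            del self.record_locks[block]
            self.block_locks.add(block)
            return "block"
        return "record"


locks = EscalatingLocks(threshold=2)
for record in (1, 2, 3, 4):
    print(record, locks.lock_record("block-A", record))
```

In this sketch the third record locked in a block exceeds the assumed threshold of two, so the record locks are dropped and later accesses to the block need no further locks.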
U.S. Pat. No. 4,656,666 to Piekenbrock discloses a method of utilizing a loop of electromagnetic energy between the Earth and an object in space as a memory resource. In this patent, data is loaded into and retrieved from the loop of electromagnetic energy. However, the access time may be relatively long, since it is the delay between the Earth and the satellite that provides the memory resource. In addition, corruption of the data due to electromagnetic interference is addressed only in passing.