1. Field of the Invention
The present invention pertains to the field of high-speed digital data processors and, more particularly, to communication between processors in a multiprocessor system.
2. Background Information
Interprocessor communication is an important factor in the design of effective multiprocessor data processing systems for multitasking applications. System processors must be able to execute independent tasks of different jobs as well as related tasks of a single job. To facilitate this, processors of a multiprocessor system must be interconnected in some fashion so as to permit programs to exchange data and synchronize activities.
Synchronization and data transfers between independently executing processors typically are coordinated through the use of controlled access message boxes. A single bit semaphore is used to prevent simultaneous access to the same message box. In operation, a processor tests the state of the semaphore bit. If the semaphore bit is set, the message box is currently "owned" by another processor. The requesting processor must then wait until the semaphore is cleared, at which time it sets the semaphore and can access the message box.
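The test-then-set protocol described above can be sketched as a small single-threaded model. The class and method names (MessageBox, try_acquire, release) are hypothetical and chosen only for illustration; in real hardware the test and the set happen as one indivisible operation, which this sketch does not attempt to reproduce.

```python
class MessageBox:
    """Message box guarded by a single-bit semaphore (illustrative model only)."""

    def __init__(self):
        self.semaphore = 0   # 0 = free, 1 = owned by some processor
        self.owner = None    # bookkeeping for the model, not part of the protocol
        self.data = None

    def try_acquire(self, cpu_id):
        """Test the semaphore bit; set it and return True only if it was clear."""
        if self.semaphore:
            return False     # box is owned; the caller must wait and retry
        self.semaphore = 1
        self.owner = cpu_id
        return True

    def release(self, cpu_id):
        """Clear the semaphore so that a waiting processor can claim the box."""
        if self.owner != cpu_id:
            raise PermissionError("only the owning processor may release the box")
        self.semaphore = 0
        self.owner = None


box = MessageBox()
assert box.try_acquire(0)        # CPU 0 finds the bit clear and sets it
assert not box.try_acquire(1)    # CPU 1 finds the bit set and must wait
box.data = "message from CPU 0"
box.release(0)
assert box.try_acquire(1)        # CPU 1 now owns the box
```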
A typical approach to interprocessor communication in prior art machines was to use main memory as the location of the message boxes and their associated semaphore bits. This "loosely coupled" approach minimizes interprocessor communication links at the cost of increasing the overhead for communications. However, as the number of processors in a multiprocessing system increases, processors begin to contend for limited resources. For instance, accessing a "global" loop count stored in main memory and used to track iterations of a process executed by a number of different processors is relatively simple when there are only two or three processors. But in a loosely coupled system with a larger number of processors, each processor's access to the global loop count contends with other processors' accesses to data in memory. These contentions delay all memory requests.
A different approach was disclosed in Chen et al., U.S. Pat. No. 4,636,942, and in Pribnow, U.S. Pat. No. 4,754,398, both of which patents are hereby incorporated herein by reference. The above documents disclose "tightly coupled" communication schemes using dedicated "shared" registers for storing data to be transferred and dedicated semaphores for protection of that data. Shared registers are organized to provide N+1 "clusters" where N equals the number of processors in the system. Clusters are used to restrict access to sets of shared registers. Processors are assigned to a cluster as part of task initialization and can access only those shared registers that reside in their cluster. A semaphore register in each cluster synchronizes access to cluster registers by processors assigned to the same cluster.
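The cluster restriction described above can be illustrated with a minimal sketch. The names (SharedRegisterFile, assign, read, write) and the figure of eight registers per cluster are assumptions made for the example and are not taken from the cited patents; only the N+1 cluster count and the rule that a processor reaches only its own cluster come from the text.

```python
class SharedRegisterFile:
    """Shared registers grouped into N + 1 clusters for N processors (sketch)."""

    def __init__(self, num_cpus, regs_per_cluster=8):  # register count is assumed
        self.clusters = [[0] * regs_per_cluster for _ in range(num_cpus + 1)]
        self.assignment = {}                            # cpu_id -> cluster index

    def assign(self, cpu_id, cluster):
        """Assign a processor to a cluster, as done at task initialization."""
        if not 0 <= cluster < len(self.clusters):
            raise ValueError("no such cluster")
        self.assignment[cpu_id] = cluster

    def _cluster_of(self, cpu_id):
        if cpu_id not in self.assignment:
            raise PermissionError("processor is not assigned to a cluster")
        return self.clusters[self.assignment[cpu_id]]

    def write(self, cpu_id, reg, value):
        self._cluster_of(cpu_id)[reg] = value   # reaches only the CPU's own cluster

    def read(self, cpu_id, reg):
        return self._cluster_of(cpu_id)[reg]


regs = SharedRegisterFile(num_cpus=4)
assert len(regs.clusters) == 5       # N + 1 clusters for N = 4 processors
regs.assign(0, 1)
regs.assign(1, 1)
regs.assign(2, 2)
regs.write(0, 3, 99)
assert regs.read(1, 3) == 99         # same cluster: the value is visible
assert regs.read(2, 3) == 0          # different cluster: the value is not visible
```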
Tightly coupled communication schemes reduce communication overhead by separating interprocessor communication from the accesses to memory that occur as part of the processing of a task. However, even in tightly coupled systems, communication overhead increases as a function of the number of processors in a system. This increased overhead directly impacts system performance in multitasking applications. A large number of processors contending for a piece of data (such as a global loop count) can tie up even a dedicated communications path due to increased message traffic. This has been recognized and steps have been proposed to streamline communications in a tightly coupled system.
U.S. Pat. No. 4,754,398 discloses a method for reducing interprocessor communication traffic incurred in executing semaphore operations in a tightly coupled system. A copy of a cluster's global semaphore register is kept in a local semaphore register placed in close proximity to each processor in the cluster. Operations on a cluster's global semaphore register are mirrored in operations on the local semaphore registers associated with that cluster. The use of a local semaphore register reduces the delay between the issuance of a semaphore test command and the determination of the state of that semaphore.
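The mirroring idea can be sketched as follows. This is an illustrative model under stated assumptions, not the circuit of the cited patent: the class name ClusterSemaphores and its methods are hypothetical, and the update of the local copies is modeled as a simple loop.

```python
class ClusterSemaphores:
    """Global semaphore register with one mirrored local copy per processor."""

    def __init__(self, cpu_ids):
        self.global_reg = 0
        self.local = {cpu: 0 for cpu in cpu_ids}   # local copy near each CPU

    def _mirror(self):
        # Every change to the global register is repeated in each local copy.
        for cpu in self.local:
            self.local[cpu] = self.global_reg

    def set_bit(self, bit):
        self.global_reg |= 1 << bit
        self._mirror()

    def clear_bit(self, bit):
        self.global_reg &= ~(1 << bit)
        self._mirror()

    def test(self, cpu_id, bit):
        # The test consults only the processor's local copy, so it does not
        # need a round trip to the shared circuitry.
        return bool((self.local[cpu_id] >> bit) & 1)


sems = ClusterSemaphores(cpu_ids=[0, 1, 2])
sems.set_bit(5)
assert all(sems.test(cpu, 5) for cpu in (0, 1, 2))
sems.clear_bit(5)
assert not any(sems.test(cpu, 5) for cpu in (0, 1, 2))
```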
Commonly owned, copending application Ser. No. 07/308,401, now pending, by the present inventor goes a step further by streamlining the local semaphore testing and by replacing the shared real time clock circuit with distributed local real time circuits. That application also extends the tightly coupled design to a system of eight processors. It is hereby incorporated by reference.
In the above system the shared semaphore and information register circuit is partitioned such that one byte of the 64 bit interprocessor communication system is located on each processor board. The bytes are distributed such that the least significant byte of each information register resides on CPU0 and the most significant byte on CPU7. Interprocessor communication commands are a single byte in length; these commands are replicated at the source so as to send the same command byte to each shared circuit in the system.
Global semaphore registers for the above system are distributed among the processors. Since each semaphore register is only 32 bits wide, the least significant byte of each semaphore register is kept on CPU4 and the most significant byte is kept on CPU7.
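The byte-wise partitioning of the information and semaphore registers can be sketched as below. The text states only where the least and most significant bytes reside; the sketch assumes the intermediate bytes follow in ascending order on the boards in between. All function names are hypothetical.

```python
def slice_information_register(value):
    """Split a 64-bit value into {cpu: byte}, least significant byte on CPU0."""
    if not 0 <= value < 1 << 64:
        raise ValueError("value does not fit in 64 bits")
    return {cpu: (value >> (8 * cpu)) & 0xFF for cpu in range(8)}


def slice_semaphore_register(value):
    """Split a 32-bit value into {cpu: byte}, least significant byte on CPU4."""
    if not 0 <= value < 1 << 32:
        raise ValueError("value does not fit in 32 bits")
    return {4 + i: (value >> (8 * i)) & 0xFF for i in range(4)}


def reassemble(slices):
    """Rebuild a register value from its per-board bytes."""
    base = min(slices)   # board that holds the least significant byte
    return sum(byte << (8 * (cpu - base)) for cpu, byte in slices.items())


def replicate_command(command_byte, num_boards=8):
    """Send the same one-byte command to the shared circuit on every board."""
    return [command_byte & 0xFF] * num_boards


info = slice_information_register(0x0123456789ABCDEF)
assert info[0] == 0xEF and info[7] == 0x01
sema = slice_semaphore_register(0xAABBCCDD)
assert sema == {4: 0xDD, 5: 0xCC, 6: 0xBB, 7: 0xAA}
assert reassemble(info) == 0x0123456789ABCDEF
assert reassemble(sema) == 0xAABBCCDD
```

Because every board holds one byte of each register, a command must reach all boards before the full register can be read or written, which is why the command byte is replicated at the source.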
A local control circuit is placed on each processor board. This circuit receives an interprocessor communication instruction from the processor on the board and determines when to issue the instruction to the shared communication circuitry. In addition, the control circuit knows the cluster that the processor is assigned to and keeps a copy of the semaphore register associated with that cluster in its local semaphore register.
By software convention, a CPU wishing to access a shared information register must gain control of the semaphore associated with that register. First, the CPU issues a Test_and_Set instruction on the semaphore. If the bit is set, the local circuit halts the CPU until the bit clears and there are no other higher priority interprocessor communication requests. The local circuit then allows issue of the Test_and_Set instruction and the proper semaphore is set in the shared semaphore register and in each local semaphore register assigned to that cluster.
Once the semaphore bit is set, the CPU can access its associated information register by issuing a Shared_Register_Read or Shared_Register_Write instruction. Upon completion of the necessary operations on the shared register, the CPU clears the semaphore bit in the shared semaphore register, and the corresponding bit in each local semaphore register assigned to that cluster is cleared. While the semaphore bit is set no other processor can access the associated information register.
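The software convention above, applied to a global loop count, can be sketched as a sequence of four interprocessor communication operations. This is a simplified model under stated assumptions: the Cluster class, its method names, the operation counter, and the register count are hypothetical, and a busy semaphore is reported by a return value, whereas the hardware described above would hold the CPU until the bit clears.

```python
class Cluster:
    """One cluster's semaphore bits and shared information registers (sketch)."""

    def __init__(self, num_registers=8):   # register count is an assumption
        self.semaphores = 0
        self.registers = [0] * num_registers
        self.ops = 0                        # communication operations issued

    def test_and_set(self, bit):
        """Set the bit and return True if it was clear; otherwise return False."""
        self.ops += 1
        mask = 1 << bit
        if self.semaphores & mask:
            return False
        self.semaphores |= mask
        return True

    def shared_register_read(self, reg):
        self.ops += 1
        return self.registers[reg]

    def shared_register_write(self, reg, value):
        self.ops += 1
        self.registers[reg] = value

    def clear(self, bit):
        self.ops += 1
        self.semaphores &= ~(1 << bit)


def fetch_and_increment(cluster, sem_bit, reg):
    """Claim the semaphore, read and bump the loop count, then release."""
    if not cluster.test_and_set(sem_bit):
        return None                         # another processor owns the register
    count = cluster.shared_register_read(reg)
    cluster.shared_register_write(reg, count + 1)
    cluster.clear(sem_bit)
    return count


cluster = Cluster()
assert fetch_and_increment(cluster, sem_bit=0, reg=2) == 0
assert fetch_and_increment(cluster, sem_bit=0, reg=2) == 1
assert cluster.ops == 8                     # four operations per update
```

In this model each update of the loop count costs four operations, and any other processor that tries to claim the same semaphore between the first and the last of them is turned away.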
As the number of processors increases, the methods disclosed to date are no longer adequate. The steps required to access and control global variables such as loop counts stored in shared registers add a significant burden to communications overhead. While one processor holds the semaphore and carries out these steps, access to these registers by other processors in the cluster is not permitted. Processors requiring access to the loop count must wait until the semaphore bit is cleared. This has the potential to waste a considerable amount of CPU time.
It is clear that further changes are necessary in the design of a tightly coupled communication circuit to achieve reduced message traffic.