Most digital computers incorporate a single central processing unit (CPU) which is responsible for executing the instructions in a program to produce some result. These uniprocessor systems generally have a very simple model of processor execution. That is, the CPU executes instructions so that it appears to a programmer that the hardware is executing the instructions one at a time and in the order in which they appear in the particular program being executed. This is called "program order" execution, and this method of executing instructions is in wide use today in personal computers and similar systems.
A single CPU computer executing instructions in program order has inherent instruction throughput limitations. Two of the biggest limitations occur because a CPU can be made to run only as fast as semiconductor technology allows, and because a particular instruction often cannot be executed immediately. To overcome the first limitation, computers have begun to utilize multiple parallel CPUs, with each CPU executing a part of a program. The second limitation has been addressed, in part, by allowing a CPU to "suspend" an instruction which cannot be completed immediately and to begin executing a subsequent instruction to save time. The result is that while a CPU will start instructions in program order, the instructions may complete out of order. While allowing a CPU to complete instructions out of order is an advantage, this method of executing instructions creates other significant problems.
In order for multiple CPU computer systems to work effectively, the CPUs must be able to communicate back and forth. Commonly, CPUs communicate through the use of shared memory that is accessible to each. One example of how shared memory can be used for communications is a "mailbox" scheme, which will be used here to illustrate the present invention. In multiple CPU computers, often one CPU is performing an operation where the result of the operation will be used by another CPU. One way the CPUs can communicate is by using a "mailbox" with a "flag". That is, a location in memory is designated as the mailbox where a first CPU puts data for a second CPU to use. The second CPU reads a specified memory location to look for a particular bit pattern (the flag) which tells the second CPU that valid data is available in the mailbox for it to use. To understand the present invention, an understanding of how the mailbox process works is useful.
FIGS. 1 and 2 illustrate a mailbox communication technique between two CPUs. FIG. 1 illustrates a section of program code 101 that processor 1 will execute, with two instructions detailed. The first instruction 103 causes processor 1 to store data in the mailbox location, while the second instruction 105 causes processor 1 to set the mailbox flag. A section of program code 107 that processor 2 will execute is also shown, with four instructions detailed. The first instruction 109 causes processor 2 to load the mail flag, and the second instruction 111 causes processor 2 to test the flag to see if it is set (indicating there is valid data in the mailbox). Branch instruction 113 causes processor 2 to loop back and reload the mail flag (instruction 109) if the flag was not set (as determined by instruction 111). If the flag was set, then processor 2 will continue past the branch instruction 113 and execute the next instruction 115, which causes processor 2 to load the data in the mailbox.
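For illustration only, the two code sections of FIG. 1 can be sketched in Python with two threads standing in for the two processors. The names `memory`, `processor_1`, `processor_2` and `run_mailbox` are hypothetical and do not come from the figures, and the dictionary is merely a stand-in for shared memory. The sketch assumes CPython, where the two stores become visible to the other thread in the order they are executed, so it behaves like the in-order case.

```python
import threading
import time

# Hypothetical stand-in for shared memory: one mailbox word, one flag word.
memory = {"mailbox": None, "flag": 0}

def processor_1():
    memory["mailbox"] = "payload"      # instruction 103: store data in the mailbox
    memory["flag"] = 1                 # instruction 105: set the mail flag

def processor_2(result):
    while True:
        flag = memory["flag"]          # instruction 109: load the mail flag
        if flag == 1:                  # instruction 111: test the flag
            break                      # flag set: continue past branch 113
        time.sleep(0)                  # yield, then branch back to 109 (instruction 113)
    result.append(memory["mailbox"])   # instruction 115: load the mailbox data

def run_mailbox():
    """Run both code sections once and return what processor 2 loaded."""
    memory["mailbox"], memory["flag"] = None, 0
    result = []
    consumer = threading.Thread(target=processor_2, args=(result,))
    producer = threading.Thread(target=processor_1)
    consumer.start()                   # processor 2 starts polling first
    producer.start()
    consumer.join()
    producer.join()
    return result[0]
```

Calling `run_mailbox()` returns the value that processor 2 read from the mailbox after it observed the flag.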
An example of the execution of the instructions shown in FIG. 1 by processors 1 and 2 is illustrated in FIG. 2. These are "sequentially-consistent" CPUs, which means they execute all instructions in program order and no later operation is performed until all prior operations are complete. Processor 1 stores data in the mailbox (instruction 103) at time T4 and then stores the mail flag (instruction 105) at time T5. During the period indicated by times T0-T3, processor 1 would be executing other instructions not illustrated. In this example, processor 2 loads the mail flag (instruction 109) at time T0 and checks to see if the flag is set (instruction 111) at time T1. The branch instruction (113) is executed at time T2 and, since the flag has not yet been set by processor 1, the branch instruction causes processor 2 to branch to the load mail flag instruction 109 and reload the mail flag at time T3. The flag is checked again at time T4 and, since the flag is still not set, the branch instruction causes processor 2 to branch back again to the load mail flag instruction, which is executed at time T6. The flag is rechecked at time T7 and, since the flag was set by processor 1, the branch instruction executed at time T8 does not cause processor 2 to branch, so processor 2 now loads the data in the mailbox (instruction 115) at time T9.
In this example, processor 2 is assured of loading the proper mailbox data, as processor 2 does not load from the mailbox until processor 1 has stored the mailbox data and set the flag. If the CPUs do not maintain sequential consistency, that is, if a CPU can complete a subsequent instruction before all prior instructions are executed, the mailbox communication technique can fail.
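The in-order timeline described for FIG. 2 can be replayed with a small toy model. This is a sketch under stated assumptions, not a description of any real hardware: processor 2 executes exactly one instruction per time slot, processor 1's stores take effect at the end of the slot in which they are issued, and the function name `simulate_in_order`, its parameters, and the "stale"/"valid" markers are all hypothetical.

```python
def simulate_in_order(store_data_at=4, store_flag_at=5, max_steps=50):
    """Step a toy two-processor machine one time slot at a time."""
    memory = {"mailbox": "stale", "flag": 0}
    trace = []             # (time slot, instruction number) for processor 2
    state = "load_flag"    # next instruction processor 2 will execute
    flag_reg = 0           # register holding the most recently loaded flag
    flag_is_set = False    # outcome of the most recent flag test
    for t in range(max_steps):
        # Processor 2: strictly one instruction per slot, in program order.
        if state == "load_flag":
            flag_reg = memory["flag"]
            trace.append((t, 109))
            state = "test"
        elif state == "test":
            flag_is_set = (flag_reg == 1)
            trace.append((t, 111))
            state = "branch"
        elif state == "branch":
            trace.append((t, 113))
            state = "load_mailbox" if flag_is_set else "load_flag"
        else:  # state == "load_mailbox"
            trace.append((t, 115))
            return memory["mailbox"], trace
        # Processor 1: stores take effect at the end of their slot (assumption).
        if t == store_data_at:
            memory["mailbox"] = "valid"    # instruction 103
        if t == store_flag_at:
            memory["flag"] = 1             # instruction 105
    raise RuntimeError("flag was never observed as set")
```

With the default arguments, the flag loads fall in slots 0, 3 and 6 and the mailbox load falls in slot 9 and returns "valid", which follows the sequence described above.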
A simple example of the mailbox communication technique failing is illustrated in FIG. 3, which shows the instructions detailed in FIG. 1 being executed by CPUs which do not always execute instructions in a sequentially consistent way. Processor 1 performs two operations. The first operation 103 is to store data in the mailbox during time T4, and the second operation 105 is to store the mail flag during time T5. Processor 2 loads the mail flag at time T0 (109) and then checks to see if the mail flag is set (111) during time T1. However, the mail flag will not be stored by processor 1 until T5, so the flag test performed by processor 2 will fail and the branch instruction 113 executed at time T2 will cause processor 2 to loop back to the mail flag load. If the processor cannot immediately load the mail flag, for example because it is not in cache but in main memory, the processor suspends execution of the mail flag load. However, since processor 2 can perform the subsequent load of the mailbox data 115, it does so (the CPU "speculates" that this instruction will need to be performed later and therefore performs the load to save time). The CPU completes the reload of the mail flag at time T4 and retests the flag at time T5. On this second try, processor 2 will find the flag set, and therefore the branch instruction executed at time T6 will not cause processor 2 to branch. Since processor 2 already completed the load of data from the mailbox during T3, the processor will not redo the load (it assumes its speculation was correct) but will continue on and execute the subsequent instruction 117 in the program. However, processor 2 will have read invalid data from the mailbox, because the speculative load during T3 took place before processor 1 stored the data at T4. This is an unacceptable result.
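The failure can be reproduced in a toy model that starts after processor 2's first, unsuccessful flag test. This is only a sketch: the function name `simulate_speculative_load` and its parameters are hypothetical, processor 2 is assumed to act before processor 1's stores take effect within a slot, and the slot in which the suspended flag reload completes (`reload_done_at`) is an assumed value chosen so that it falls after processor 1's flag store. It is not taken from FIG. 3.

```python
def simulate_speculative_load(speculate_at=3, reload_done_at=6,
                              store_data_at=4, store_flag_at=5):
    """Toy model of processor 2 loading the mailbox before the flag reload completes."""
    memory = {"mailbox": "stale", "flag": 0}
    speculated = None      # value captured by the early mailbox load
    flag_reg = 0           # register written when the flag reload completes
    for t in range(reload_done_at + 1):
        # Processor 2 acts first in each slot (assumption).
        if t == speculate_at:
            speculated = memory["mailbox"]   # instruction 115, performed early
        if t == reload_done_at:
            flag_reg = memory["flag"]        # suspended instruction 109 completes
        # Processor 1's stores take effect at the end of their slot.
        if t == store_data_at:
            memory["mailbox"] = "valid"      # instruction 103
        if t == store_flag_at:
            memory["flag"] = 1               # instruction 105
    if flag_reg != 1:
        raise RuntimeError("flag not set; processor 2 would loop again")
    # Test 111 passes and branch 113 falls through: the processor keeps the
    # speculated value instead of redoing the load.
    return speculated, memory["mailbox"]
```

With the default arguments the function returns `("stale", "valid")`: processor 2 proceeds with the stale value even though the mailbox holds valid data by the time the flag is seen.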
To allow CPUs to execute instructions in a non-sequential way while preventing inter-CPU communication failures, special instructions called "barrier" instructions have been used. These instructions force a CPU to finish prior instructions before proceeding past the barrier instruction. A computer programmer uses these instructions to ensure that areas of program code that need to be executed in program order are executed in program order by a CPU. In this way, a CPU will only be forced to execute instructions in program order in relatively small parts of a program, leaving the CPU free to execute the rest of the program instructions with maximum efficiency.
FIGS. 4A and 4B illustrate the use of a barrier instruction to prevent the problem illustrated in FIG. 3 from occurring. FIG. 4A lists the portions of a program each CPU will execute and is the same as discussed before, with the addition of a barrier instruction 401 placed before the "load data in the mailbox" instruction 115. Processor 1 performs operations 103 and 105 during times T4 and T5 respectively. Processor 2 performs the same operations in T0-T2 as discussed in association with FIG. 3. However, a barrier instruction 401 has been added after the branch instruction 113 and before the load data in the mailbox instruction 115. Therefore, after processor 2 tests the flag and finds the flag is not set, the barrier instruction prevents processor 2 from "speculating" and loading the data in the mailbox, forcing the processor to complete the flag load and test operation before continuing. Even if the reload of the mail flag was delayed and the processor could have loaded the mailbox data, the barrier instruction forces the processor to complete all pending instructions before continuing. The loading of data from the mailbox is finally accomplished at time T12, at which time the mailbox data is valid. Using the barrier instruction has prevented processor 2 from loading invalid data from the mailbox.
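The effect of the barrier can be sketched with a toy model that runs processor 2 with and without it. All names here (`load_mailbox_value`, `use_barrier`, `reload_done_at`) are hypothetical, the model starts at the slot following the first failed flag test, and the slot numbers are assumptions of the model; it does not attempt to reproduce the timing of FIG. 4B.

```python
def load_mailbox_value(use_barrier, reload_done_at=6,
                       store_data_at=4, store_flag_at=5, max_steps=50):
    """Return the value processor 2 ends up using for the mailbox data."""
    memory = {"mailbox": "stale", "flag": 0}
    flag_seen = False       # True once a completed flag load returned 1
    mailbox_value = None    # value obtained by instruction 115
    for t in range(3, max_steps):
        # With the barrier, the mailbox load may not issue until the flag
        # load and test have completed and found the flag set.
        may_load = flag_seen or not use_barrier
        if mailbox_value is None and may_load:
            mailbox_value = memory["mailbox"]        # instruction 115
        # The delayed flag reload completes at reload_done_at (assumption);
        # after that, processor 2 keeps polling until the flag is set.
        if t >= reload_done_at and not flag_seen:
            flag_seen = (memory["flag"] == 1)        # instructions 109/111/113
        # Processor 1's stores take effect at the end of their slot.
        if t == store_data_at:
            memory["mailbox"] = "valid"              # instruction 103
        if t == store_flag_at:
            memory["flag"] = 1                       # instruction 105
        if flag_seen and mailbox_value is not None:
            return mailbox_value
    raise RuntimeError("flag was never observed as set")
```

In this model `load_mailbox_value(use_barrier=False)` returns "stale", while `load_mailbox_value(use_barrier=True)` returns "valid", because the barrier holds the mailbox load back until the flag has been observed.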
While the problem with non-sequential execution has been illustrated with respect to "load" operations, the same problem is present with "store" operations. The use of a barrier instruction will also prevent problems with non-sequential "store" operations. However, while a programmer may write a new program utilizing "barrier" instructions, the vast majority of program code in use today is old code that does not utilize "barrier" instructions. If one CPU is executing new code (having barrier instructions) and another CPU is executing old code (having no ability to ensure sequential execution), then the mailbox communication technique will fail. Both CPUs must execute communications instructions in sequential order.
Often CPUs that can execute instructions in a non-sequential way can be forced to execute all instructions in sequential order. So if the CPU executing the "old" code is forced to execute the entire program in sequential order (start and complete instructions sequentially), successful inter-CPU communication will be assured. But since inter-CPU communication is generally an infrequently performed function in a program, forcing the CPU to execute all instructions sequentially exacts a high performance price to assure the success of an infrequent task.
What is needed in the industry is a computer that selectively executes load and store instructions in an ordered way when required and without the use of special barrier instructions. In this way, old programs can be effectively utilized while, at the same time, gaining most of the performance advantages obtained by allowing a CPU to complete instructions in a non-sequential way.