The invention generally relates to architectures of electronic computers providing speculative execution.
In an electronic computer, a processing unit communicates with electronic memory holding data and program instructions. Typically, the processing unit executes the program instructions to load data from the memory, perform an operation on that data, and store the result of that operation back to the memory. During the operations, the data and result are temporarily held in registers internal to the processor. Registers are limited in number and cannot practically be used for long-term storage of data.
The time required to load data from the memory and to store the data to the memory slows execution of the program. For this reason, it is known to construct cache memories that allow high bandwidth communication of limited amounts of data between the cache and processor. The cache is refreshed from the main memory as needed according to techniques well known in the art.
It is also known in the art to increase the speed of program execution by using multiple processing units to simultaneously execute instructions or to execute instructions in an order differing from normal program order. Computers using this technique are termed instruction level parallel ("ILP") processors as may be distinguished from systems in which independent programs may be assigned different processors, for example.
In an ILP processor, the processor fetches multiple instructions in an instruction "window" and an allocation circuit allocates those instructions to separate processing units. The separate processing units may read data from memory and perform arithmetic or logical operations on that data. A retirement circuit then collects the results generated by the independent processing units and "retires" the instructions executed by those processing units by writing final results to memory. The retirement circuitry resolves mis-speculation, that is, situations where instructions were executed out-of-order but were in fact dependent on an earlier instruction in the execution order and therefore may have produced an erroneous result.
There are two types of dependency between instructions that may cause mis-speculation. "Control dependency" is the dependency exhibited by instructions after conditional branch or jump instructions on how the branch or jump is resolved. Instructions immediately after a branch may correctly execute only if the branch is not taken. "Data dependency" is the dependency of later instructions that use data on earlier instructions that create the data. These later data-using instructions may correctly execute only if the earlier instructions creating the data do not change the data or have completed the change of data. A dependency is "unambiguous" if it necessarily produces an error when the dependent instruction is executed before the instruction on which it is dependent.
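By way of illustration only, the data dependency described above can be sketched in a few lines of Python. The sketch is a software analogy under stated assumptions, not a description of any processor circuit: the names `run`, `memory`, and `regs` are hypothetical, and the two "instructions" are a STORE that writes a location and a LOAD that reads the same location.

```python
def run(order):
    """Execute a two-instruction program in the given order (hypothetical model)."""
    memory = {"x": 1}   # location "x" holds a stale value before the STORE
    regs = {}

    def store():
        # Data-producing instruction: writes the new value to memory.
        memory["x"] = 5

    def load():
        # Data-consuming instruction: reads the same location into a register.
        regs["r1"] = memory["x"]

    program = {"store": store, "load": load}
    for name in order:
        program[name]()
    return regs["r1"]


in_order = run(["store", "load"])      # program order: LOAD sees the stored value
out_of_order = run(["load", "store"])  # LOAD hoisted above the STORE: stale value
```

In this model the out-of-order result differs from the in-order result, which is the kind of erroneous result that would have to be squashed.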
Control and data dependencies limit the ability of an ILP processor to execute normally sequential instructions simultaneously or out-of-order. If the dependency is unambiguous, the results of an out-of-order execution of a dependent instruction must be discarded ("squashed") and the instruction executed at a later time (for data dependence) or other instructions executed instead (for control dependence). Squashing instructions wastes time and to some extent defeats the advantages to be gained from parallel processing. In order to avoid the problems associated with squashing, the ILP processor can guess or "speculate" how any dependency will be resolved or whether it is relevant. This speculation need not be perfect, as it will ultimately be assessed by the retirement circuit, but it should be as accurate as possible. U.S. Pat. No. 5,781,752 issued Jul. 14, 1998 and entitled: Table Based Data Speculation Circuit for Parallel Processing Computer, assigned to the same assignee as the present invention and hereby incorporated by reference, discusses a method of determining likely data dependencies between given instructions in a program so that mis-speculation may be avoided.
The present inventors have recognized that the mechanism of detecting data dependency disclosed in the above-referenced patent may be used to implicitly link data-consuming instructions that rely for their data on a common preceding instruction. In this way, time-consuming memory transfers may be postponed or eliminated. Knowledge of data dependencies is used to pass data directly between instructions via special registers without waiting for the completion of memory LOADs. The same structure allows the identification of memory STORE operations which provide the data to the data-consuming LOADs so that data may be passed directly from these STORE instructions.
Because the method of identifying data dependence is probabilistic, actual memory LOADs must be performed to confirm the predicted flow of data directly between instructions. However, in an ILP processor, data may be speculatively transferred between instructions to be squashed later if the speculation proves wrong according to standard ILP techniques.
Specifically, the invention provides a memory bypass circuit for use in an electronic processor executing data-producing instructions producing data to be stored into a memory and data-consuming instructions loading data from the memory. The memory bypass circuit includes a dependency table linking, in "read-read" dependencies, first data-consuming instructions to second data-consuming instructions according to a probability that both data-consuming instructions read the same memory location. A storage register stores the data received by a first data-consuming instruction identified to a read-read dependency of the dependency table. A table reviewing circuit, responsive to a second data-consuming instruction, reviews the dependency table and, when the second data-consuming instruction is part of a read-read dependency with the first data-consuming instruction, reads the data for that read-read dependency from the storage register.
Thus, it is one object of the invention to explicitly link data between two data-consuming instructions so as to provide a direct path of inter-instruction communication that bypasses standard memory and even cache memory. By establishing a direct inter-instruction communication pathway, the speed of communication is improved.
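As a minimal software sketch of the read-read bypass summarized above, and not a description of the circuit itself, the following Python model assumes that instructions are identified by a program counter value, that the dependency table is a simple mapping, and that the confirming memory LOAD happens immediately. The class name `ReadReadBypass` and its fields `links`, `storage`, and `first_loads` are hypothetical names introduced only for this illustration.

```python
class ReadReadBypass:
    """Toy model: a second LOAD takes its data from a register filled by a first LOAD."""

    def __init__(self, memory, links):
        self.memory = memory                    # address -> value
        self.links = links                      # second-LOAD PC -> first-LOAD PC
        self.storage = {}                       # first-LOAD PC -> data it received
        self.first_loads = set(links.values())  # LOADs whose data should be kept

    def load(self, pc, address):
        """Return (value, bypassed) for the LOAD at `pc` reading `address`."""
        first_pc = self.links.get(pc)
        if first_pc is not None and first_pc in self.storage:
            speculative = self.storage[first_pc]  # data passed directly
            actual = self.memory[address]         # confirming memory LOAD
            if speculative == actual:
                return speculative, True
            return actual, False                  # mis-speculation: squash, use memory
        value = self.memory[address]
        if pc in self.first_loads:
            self.storage[pc] = value              # keep data for linked LOADs
        return value, False


memory = {0x10: 7, 0x20: 9}
bypass = ReadReadBypass(memory, links={0x404: 0x400})
first = bypass.load(0x400, 0x10)    # first LOAD reads memory and fills the register
second = bypass.load(0x404, 0x10)   # linked LOAD, same address: bypass confirmed
third = bypass.load(0x404, 0x20)    # linked LOAD, different address: squashed
```

In hardware the benefit would come from using the speculative value before the confirming LOAD completes; the sequential model above shows only the data path and the check.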
The dependency table may hold a prediction value associated with each read-read dependency, and the table reviewing circuit may respond to a given data-consuming instruction that is part of a dependency by reading the data for that read-read dependency from the storage register only if the prediction value is above a predetermined threshold value. A dependency detection circuit detecting a data dependence between data-consuming instructions generates the prediction value.
Thus, it is another object of the invention to provide for linkage of data-consuming instructions through the use of a probabilistic technique that examines after-the-fact detections of data dependence to predict future data dependencies. In this way no explicit linkage of the instructions need be done by the programmer and instructions whose dependence varies dynamically may be accommodated.
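One way to picture a prediction value compared against a threshold is a saturating up/down counter, sketched below in Python. This is an assumption made for illustration: the text above states only that a dependency detection circuit generates the prediction value, and does not specify how it is updated. The class `PredictionEntry`, its default `threshold` and `maximum`, and the decrement on a missed dependence are all hypothetical.

```python
class PredictionEntry:
    """Hypothetical prediction value for one dependency-table entry."""

    def __init__(self, threshold=2, maximum=3):
        self.value = 0
        self.threshold = threshold  # bypass only when value is above this
        self.maximum = maximum      # counter saturates here

    def update(self, dependence_detected):
        # Assumed policy: count up on a detected dependence, down otherwise.
        if dependence_detected:
            self.value = min(self.value + 1, self.maximum)
        else:
            self.value = max(self.value - 1, 0)

    def use_bypass(self):
        return self.value > self.threshold


entry = PredictionEntry()
history = []
for detected in [True, True, True, False]:
    entry.update(detected)
    history.append(entry.use_bypass())
```

With these assumed parameters the entry permits the bypass only after three consecutive detections and withdraws it after a single miss; other thresholds would trade coverage against mis-speculation.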
The dependency table may further link, in "read-write" dependencies, particular data-producing instructions to particular data-consuming instructions according to a probability that the data-producing instruction produces data used by the data-consuming instruction. In this case, the storage register further stores the data produced by the data-producing instruction identified to the read-write dependency of the dependency table, and the table reviewing circuitry is responsive to a given data-consuming instruction to review the dependency table and, when the given data-consuming instruction is part of a read-write dependency, to read the data for that dependency from the storage register.
Thus it is a further object of the invention to permit the same mechanism used for data flow in read-read dependencies to provide for direct data flow from data-producing instructions to the data-consuming instructions linked to them by a read-write data dependency.
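The read-write case can be sketched in the same illustrative style. In the Python model below, a STORE identified in the table deposits its data in a storage register, and a linked LOAD takes the data from that register rather than from memory. The names `ReadWriteBypass`, `links`, `storage`, and `tracked_stores` are hypothetical, and the confirming memory LOAD is omitted for brevity.

```python
class ReadWriteBypass:
    """Toy model: a LOAD takes its data from a register filled by a linked STORE."""

    def __init__(self, links):
        self.links = links                         # LOAD PC -> STORE PC
        self.storage = {}                          # STORE PC -> data it produced
        self.tracked_stores = set(links.values())  # STOREs whose data should be kept

    def store(self, pc, address, value, memory):
        memory[address] = value                    # the ordinary memory STORE
        if pc in self.tracked_stores:
            self.storage[pc] = value               # copy kept for linked LOADs

    def load(self, pc, address, memory):
        store_pc = self.links.get(pc)
        if store_pc in self.storage:
            return self.storage[store_pc], "storage register"
        return memory[address], "memory"


memory = {}
rw_bypass = ReadWriteBypass(links={0x504: 0x500})
rw_bypass.store(0x500, 0x30, 42, memory)
forwarded = rw_bypass.load(0x504, 0x30, memory)  # linked LOAD
unlinked = rw_bypass.load(0x508, 0x30, memory)   # LOAD with no table entry
```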
The table reviewing circuitry may be responsive to a given data-consuming instruction to first review the dependency table for read-write dependencies and only second to review the dependency table for read-read dependencies.
Thus it is another object of the invention to capture the benefit of read-read data dependencies when corresponding read-write dependencies are too remote to be detected.
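The review order just described, read-write first and read-read second, may be illustrated by the following Python sketch. It assumes for simplicity that the two kinds of dependency are held in two separate mappings keyed by the program counter of the data-consuming instruction; the function `find_source` and both mapping names are hypothetical.

```python
def find_source(load_pc, read_write_links, read_read_links):
    """Pick where a LOAD should take its data from (hypothetical model)."""
    # First review: is there a producing STORE linked to this LOAD?
    store_pc = read_write_links.get(load_pc)
    if store_pc is not None:
        return ("read-write", store_pc)
    # Second review: is there an earlier LOAD linked to this LOAD?
    first_load_pc = read_read_links.get(load_pc)
    if first_load_pc is not None:
        return ("read-read", first_load_pc)
    # No dependency recorded: fall back to an ordinary memory LOAD.
    return ("memory", None)


read_write_links = {0x604: 0x600}
read_read_links = {0x604: 0x5F0, 0x608: 0x5F0}

both = find_source(0x604, read_write_links, read_read_links)
read_read_only = find_source(0x608, read_write_links, read_read_links)
neither = find_source(0x60C, read_write_links, read_read_links)
```

A LOAD that appears in both mappings is served by its read-write entry, while a LOAD whose producing STORE was never recorded can still be served by a read-read entry.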
The foregoing and other objects and advantages of the invention will appear from the following description. In this description, references are made to the accompanying drawings which form a part hereof and in which there is shown by way of illustration a preferred embodiment of the invention. Such embodiment does not necessarily represent the full scope of the invention, however, and reference must be made therefore to the claims for interpreting the scope of the invention.