1. Field of the Invention
The present invention generally relates to stored program digital computers and, more particularly, to apparatus for allowing out-of-sequence fetching of operands while preserving the appearance of in-sequence fetching to the processor in a computer architecture requiring in-sequence fetching. The invention is particularly useful in a pipelined multiprocessor system and allows the achievement of higher processor performance for a given processor hardware design.
2. Description of the Prior Art
In certain processor designs, a performance advantage can be derived from performing certain operations out-of-sequence. Often the programmer cannot take advantage of such rearrangements of operations because of limitations in the semantics of the instruction set and because observable order restrictions prevent unmonitored rearrangement of operations from yielding correct results.
In tightly coupled multiprocessor systems, the possibility of deadlock or erroneous results based on arbitrary out-of-sequence action is well established. The IBM S/370 architecture, for example, requires that operations be done in program sequence or appear to be done in sequence. In particular, operand fetches should appear to be done in sequence, and current processors obey this rule by actually doing them in sequence.
It is already standard practice in pipelined processors to check for a certain kind of out-of-sequence condition which can occur when a fetch is made shortly after a store instruction. In this case, it is not that the fetch is made out-of-sequence but, rather, that the completion of the store occurs late in the pipeline. Hence, the result of the store may not yet be reflected in the memory or cache when the subsequent fetch is made. This condition is checked by saving the addresses of started but not yet completed stores and comparing subsequent fetch addresses with these store addresses, a mechanism called OPERAND STORE COMPARE. Fetches that compare are held up until the store is complete.
OPERAND STORE COMPARE applies over a limited time interval, determined by the length of the pipeline. When a store reaches the point in the pipeline where it is completed, the address of that store is removed from the set of addresses held by the OPERAND STORE COMPARE mechanism.
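The OPERAND STORE COMPARE behavior described above can be illustrated with a minimal software sketch. It is not the hardware mechanism itself: the class and method names are hypothetical, and comparing exact addresses (rather than some larger unit such as a doubleword or cache line) is an assumption made only to keep the example short.

```python
class OperandStoreCompare:
    """Illustrative model: tracks stores that have started but not completed."""

    def __init__(self):
        # Addresses of in-flight stores, oldest first (hypothetical structure).
        self.pending = []

    def start_store(self, address):
        # A store enters the pipeline: remember its address.
        self.pending.append(address)

    def complete_store(self, address):
        # The store reaches the completion point: drop its (oldest) entry.
        self.pending.remove(address)

    def fetch_must_wait(self, address):
        # A fetch that "compares" with a pending store is held up.
        return address in self.pending


osc = OperandStoreCompare()
osc.start_store(0x100)
held = osc.fetch_must_wait(0x100)      # fetch to the same address is held
free = osc.fetch_must_wait(0x200)      # fetch to another address proceeds
osc.complete_store(0x100)
released = not osc.fetch_must_wait(0x100)  # hold is lifted after completion
```

In this sketch the list of pending addresses never grows beyond the number of stores in flight, which mirrors the point that the comparison applies only over an interval bounded by the pipeline length.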
In the prior art, U.S. Pat. No. 4,484,267 to Fletcher describes a hybrid cache where some cache lines are handled as store-in, i.e., main memory update deferred, and the other lines are handled as store-thru, i.e., main memory immediately updated. Hitherto, caches have been either all store-in or all store-thru. How each line is handled is determined dynamically and marked by a flag bit in the cache directory. The bit is reset, for example, by a cross-interrogation hit.
U.S. Pat. No. 4,189,770 to Gannon et al. describes a bypass arrangement whereby, on a cache miss for a variable field length operand, the operand is sent directly to the Instruction Unit without waiting for the entire line to be transferred from the main memory to the cache. Gannon et al. are not concerned with the operation of the cache and teach nothing about out-of-sequence fetches.
U.S. Pat. No. 4,435,759 to Baum et al. describes a monitoring system for capturing and recording events in the operation of a processor. Among the events captured are the addresses of instructions and cache misses.
U.S. Pat. No. 4,400,770 to Chan et al. is concerned with means for detecting and handling cache synonyms. Cache synonyms arise because caches are, for speed and convenience, typically addressed in a way that is partly direct and partly associative. Caches are divided into sets of lines, e.g., four lines per set, and the sets are directly addressed, as is a conventional memory. Selection of the appropriate line within the set is done associatively, that is, by matching the given address with the line addresses stored in the cache directory. In the S/370 architecture, which has address translation, the untranslated low order bits (12 bits) of the given address are used to directly select the set, while some portion of the translated high order bits is used to associatively select the line within the set. However, as the number of sets in the cache is increased to make the cache bigger, there will not be enough untranslated low order bits to select the set, and it will be necessary to use some of the translated high order bits. Now there can be synonyms; i.e., two or more translated addresses that can actually lead to the same real memory location. Such synonyms need to be detected and handled for a number of reasons. Chan et al. teach how to find these synonyms. On a cache miss, Chan et al. provide a means for generating all the possible synonyms, i.e., trying all possible combinations of the translated bits, and for checking the cache directory for the presence of any of the synonyms. Any of several actions may be taken on the detection of a synonym.
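The idea of trying all combinations of the translated bits can be sketched as follows. This is an illustration only, not the method of Chan et al.: the function name and the default parameters (128-byte lines, 128 sets) are assumptions chosen for the example; only the 12 untranslated low order bits come from the text above.

```python
def candidate_sets(address, line_bits=7, page_bits=12, index_bits=7):
    """Return every set index in which a synonym of `address` could reside.

    line_bits  -- log2 of the line size in bytes (assumed: 128-byte lines)
    page_bits  -- number of untranslated low order address bits (12 in the text)
    index_bits -- log2 of the number of sets (assumed: 128 sets)
    """
    # Untranslated bits available for set selection, above the line offset.
    fixed_bits = page_bits - line_bits
    # Set-index bits that must come from the translated part of the address.
    extra_bits = index_bits - fixed_bits
    assert extra_bits >= 0, "sketch assumes the index needs all untranslated bits"

    # Portion of the set index that is the same for every synonym.
    low = (address >> line_bits) & ((1 << fixed_bits) - 1)

    # Try every combination of the translated index bits.
    return [(high << fixed_bits) | low for high in range(1 << extra_bits)]


sets_to_check = candidate_sets(0x1234)  # → [4, 36, 68, 100]
```

With these assumed parameters two index bits are translated, so four sets must be examined in the directory on a miss; when the index fits entirely within the untranslated bits, only one set is a candidate and no synonyms can arise.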