This invention deals with a novel process and novel electronic circuits in a processor for significantly reducing the execution time of programs without increasing processor instruction execution rate. A fetch history table (FHT) stores recent branch history of program execution and is used by a processor to direct the path of future execution of the program. The invention enables any valid FHT entry to control the outgating for execution of any sequence of instructions in aligned sectors in an associated row of an aligned instruction cache (AIC) without the conventional branch instruction overhead. This invention utilizes a novel "sector distribution table" (SDT) for quickly locating a next-to-be executed aligned segment of instructions in the associated AIC row for outgating to the processor's execution pipeline under control of novel FHT entries in novel types of FHT sets. The inventive process enables all FHT entries to have complete flexibility in specifying any sequence of the valid sectors in the associated AIC row.
The prior art is the same as cited in the incorporated specification Ser. No. 09/235,474.
The incorporated specification discloses novel circuits and novel processes for using the novel circuits. The novel circuits and processes include and use a fetch history table (FHT) containing novel FHT entries grouped into novel FHT sets for controlling the processor execution of instructions stored in aligned sectors of an Aligned Instruction Cache (AIC). Each row in the AIC includes a plurality of aligned sectors, each storing all, or a part of, a basic block of instructions ending in a branch instruction. Each valid FHT entry specifies a previously-executed sequence of sectors stored in an AIC row associated with the FHT set. The novel form of each valid FHT entry allows the FHT entry to be selected by a prediction vector during an FHT cycle, and to be used to control future re-execution of its represented sequence to avoid conventional branch instruction overhead and time loss previously occurring in the processor execution of branch instructions.
The incorporated specification provides "AIC cycles". Each "AIC cycle" starts with a determination of an AIC hit or miss, and FHT entries are not allowed to control program execution during those AIC cycles which have an AIC miss. If an "AIC cycle" starts with an AIC miss, a FHT entry is generated during the "AIC cycle" using conventional branch instruction execution. On the other hand, the subject invention provides novel "FHT cycles" and does not use "AIC cycles". Each "FHT cycle" having a FHT hit is used to control program execution, even when an AIC miss occurs within the "FHT cycle".
An AIC miss occurs when no row in the AIC begins with an instruction currently predicted to be executed by the program. Then, one or more variable-length basic blocks of instructions are fetched from the storage hierarchy of the computer system, and all or part of the fetched basic block(s) are stored into fixed-size aligned sectors in the AIC row associated with the currently predicted instruction. The associated AIC row is selected by hashing the address of the currently predicted instruction to generate an AIC index which locates the associated AIC row in the AIC. The fetched blocks are stored in execution order in the left-to-right sequence of the aligned sectors in the associated AIC row. Since all aligned sectors in the AIC have the same size, any sector may store an entire basic block if the block size does not exceed the storage space in the sector. A basic block exceeding the size of a sector fills the sector, and its remaining part is stored into the next one or more sectors in the same AIC row. When a fetched block overflows the remaining sector(s) in the associated AIC row, the block overflow may be stored into one or more sectors in another AIC row selected by hashing the address of the first instruction to be stored in the first sector overflowing into that AIC row. The branch instruction ending the basic block is stored in the last sector of the block, and the sectors storing any prior part(s) of the block do not contain any branch instruction. Thus at any time, any AIC sector may store a branch instruction ending a basic block, and at any other time the same AIC sector may not be storing any branch instruction.
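By way of illustration only, the sector-filling rule described above may be sketched as follows. The function name `pack_blocks`, the tuple layout, and the measurement of block sizes in instructions are assumptions made for this sketch and are not taken from the specification. Hashing of the overflow into another AIC row is not modeled; the overflow pieces are simply returned.

```python
def pack_blocks(block_sizes, sector_capacity, sectors_per_row):
    """Split basic blocks into fixed-size sector pieces, in execution order.

    Each piece is (block_index, instruction_count, holds_branch).
    Only the last piece of a block holds the block-ending branch.
    Returns (pieces stored in the row, pieces that overflow the row).
    """
    pieces = []
    for block_index, size in enumerate(block_sizes):
        remaining = size
        while remaining > 0:
            count = min(remaining, sector_capacity)
            remaining -= count
            # The branch instruction ends the block, so it sits in the final piece.
            pieces.append((block_index, count, remaining == 0))
    return pieces[:sectors_per_row], pieces[sectors_per_row:]


row, overflow = pack_blocks([3, 10, 2], sector_capacity=8, sectors_per_row=4)
```

In this sketch the 10-instruction block occupies two consecutive sectors, and only the second of those sectors is marked as holding a branch instruction.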
The incorporated specification groups the FHT entries into FHT sets, and each FHT set is associated with a respective AIC row by being located in the FHT at an FHT index directly calculated from the AIC index. Each of the valid FHT entries in any FHT set specifies a different execution sequence of the sectors in the associated AIC row. However, in the incorporated specification, each valid FHT entry in each FHT set specifies an execution sequence starting with the first (leftmost) sector in the associated AIC row (which is not done in the subject specification).
FHT cycles are used by the inventive process to control program execution. Each FHT cycle has either a FHT hit on a valid FHT entry in the associated FHT set, or an FHT miss when no valid FHT entry is found in the associated FHT set. A FHT hit uses the FHT entry having the hit to control outgating to the processor execution pipeline of a sequence of aligned sectors in the associated AIC row, and the outgated sequence may have any sector order as long as the first sector of the sequence is the first sector in the associated AIC row. A FHT miss does not find any FHT entry in the associated FHT set, and temporarily reverts to conventional branch instruction processing for the program during which a FHT entry is generated to represent the instruction sequence using conventional branch instruction processing. An AIC miss causes a FHT miss, but an AIC hit may not prevent a FHT miss.
Each FHT cycle starts with a prediction operation using a "next instruction address" provided during the immediately prior FHT cycle either in a hit FHT entry, or in a generated FHT entry provided in response to a FHT miss. The first FHT cycle for a program uses the program's entry instruction address. The prediction operation uses the "next instruction address" to provide a "prediction vector". Bits in the "prediction vector" respectively predict a sequence of "taken" and/or "not taken" states occurring for the branch instructions in the sequence of aligned sectors predicted for outgating during the current FHT cycle. The prediction vector may be obtained from a recording made of "m" number of branch states immediately following the last execution of the instruction at the same address as the "next instruction address" provided for the current FHT cycle.
The "next instruction address" (used in the current FHT cycle) is hashed to obtain an AIC index, which locates both an associated AIC row and an associated FHT set. The associated FHT set contains either the next hit FHT entry or the next generated FHT entry, depending on whether the current FHT cycle gets an FHT hit or miss. An AIC hit is obtained if the associated AIC row is located at the AIC index hashed from the "next instruction address" of the current FHT cycle. An AIC miss is obtained if the associated AIC row at the hashed AIC index does not begin with the instruction located at the "next instruction address" provided for the current FHT cycle.
In response to an AIC miss, the basic blocks of instructions (next needed for execution) are fetched from the computer storage hierarchy starting at the memory address of the "next instruction address" of the current prediction. The fetched basic blocks are loaded in execution order into the aligned sectors from left-to-right in the associated AIC row.
The hashed AIC index is used to locate and access the associated FHT set. (This use of the AIC index to associate a FHT set to an AIC row causes problems, which are avoided by the subject invention.) A FHT miss occurs when the "next memory address" field in any FHT entry of the associated FHT set does not match the currently predicted next instruction address. (The currently predicted memory address is currently loaded in the processor's Instruction Fetch Address Register, IFAR.)
An AIC miss also causes a FHT miss, and all FHT entries in the associated FHT set are invalidated. For an AIC hit having an FHT miss, any invalid FHT entry in the associated FHT set may be selected for replacement. If all FHT entries in the FHT set are valid, a LRU (least recently used) FHT entry in the set may be selected for replacement.
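By way of illustration only, the replacement choice described above may be sketched as follows. The class `FhtEntry`, its `lru_age` field (a larger value meaning less recently used), and the function name are assumptions made for this sketch; the specification does not define how the LRU field is encoded.

```python
from dataclasses import dataclass


@dataclass
class FhtEntry:
    valid: bool
    lru_age: int  # hypothetical encoding: larger value = less recently used


def select_replacement(fht_set):
    """Return the index of the FHT entry to overwrite after an FHT miss."""
    # Any invalid entry may be chosen; this sketch takes the first one found.
    for index, entry in enumerate(fht_set):
        if not entry.valid:
            return index
    # All entries valid: fall back to the least recently used entry.
    return max(range(len(fht_set)), key=lambda i: fht_set[i].lru_age)


victim = select_replacement([FhtEntry(True, 2), FhtEntry(True, 7), FhtEntry(True, 1)])
```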
The first-generated FHT entry in its associated FHT set is generated in response to an AIC miss while the sectors in the associated AIC row are being loaded with the instructions of fetched basic block(s). This first-generated FHT entry specifies the left-to-right sequence of sectors in the associated AIC row. (Note that the left-to-right sequence of sectors in any AIC row may represent any execution order for basic blocks fetched from anywhere in the storage hierarchy.)
Thus, the first FHT entry in each FHT set is generated in response to both an AIC miss and an FHT miss. However, the second and later FHT entries in any FHT set are each generated in response to an AIC hit and an FHT miss for the current FHT cycle.
Therefore, an FHT hit requires: 1) one or more FHT entries in the FHT set to be valid; 2) a match between the "next instruction address" from the last FHT cycle (which is also called the "next IFAR address") and the memory address of the first instruction in the associated AIC row (it is the first instruction in the first (leftmost) sector in the associated AIC row); and 3) a match between a bit-state sequence in the current prediction vector and a sub-field state sequence in a "branches outcomes" field in the hit FHT entry (indicating a sequence of branch taken and/or not taken states).
Although the disclosed embodiment in the incorporated specification requires each valid FHT entry to specify a different execution sequence in its FHT set, nevertheless each of these different sequences is constrained to begin with the same AIC sector, which is the first sector in the associated AIC row.
The subject invention adds new circuits and new processes to those disclosed in the incorporated specification to perform predictive processing without constraints occurring in the incorporated specification.
The subject invention's circuits and processes enable a greater variation in the sequence patterns of the sector histories executed for the AIC rows than the circuits and processes disclosed in the incorporated specification. This greater variation of sequence histories enables an increase in the average instruction execution rate for a program, even when no change is made in the processor's instruction execution rate, or in the size of the FHT or AIC. The subject invention operates using novel "FHT cycles", and does not use the "AIC cycles" disclosed for the invention in the incorporated specification.
A speedup in program execution rate is obtainable by the subject invention due to the greater variation in sequence patterns available to the program execution, caused by an increase in the FHT hit rate and reduction in the FHT miss rate.
The increase in the FHT hit rate increases the percentage of time that a processor spends using fast predictive instruction processing, and reduces the percentage of time that the processor spends using the slower conventional branch instruction processing. Predictive execution is faster because it eliminates the overhead time needed by conventional branch instruction processing in the program, such as determining branch-target instruction addresses and accessing branch target instructions in the computer storage hierarchy.
The fastest predictive execution performed by this invention occurs while its FHT cycles are continuously having FHT hits and AIC hits to provide a steady stream of instructions from the AIC to the processor execution pipeline without any overhead for conventional branch instruction processing.
Each FHT miss stops predictive processing and returns the processor to slower conventional instruction processing during which this invention generates a new FHT entry for defining the instruction execution sequence immediately following the FHT miss. This invention allows any number of FHT entries (theoretically up to the total number of FHT sets in the FHT) to be associated with any AIC row. The subject invention allows the FHT entries in the same FHT set to be associated with different AIC rows. This differs from the incorporated specification's embodiment in which each FHT entry in the same FHT set is associated with the same AIC row. This difference allows the invention to avoid the constraints in the incorporated specification's embodiment, in which the number of FHT entries in each FHT set is the maximum number of FHT entries which may be associated with any AIC row.
The order of operations in the process of the subject invention is different from the order of operations in the process of the incorporated specification. In the incorporated specification, the AIC hit/miss determination is made before the FHT hit/miss determination, while in the subject invention the AIC hit/miss determination is made after the FHT hit/miss determination. This change in sequence of operations by this invention is important to obtaining the advantages of the subject invention over the incorporated specification.
The process of this invention may be characterized as performing "FHT cycle" iterations. Each FHT cycle starts with a branch prediction provided by a branch prediction unit in the system. Each branch prediction utilizes a "next memory address" received from the prior FHT cycle iteration. The "next IFAR address" locates the next instruction which begins the execution of the current FHT cycle and begins the next basic block in the executing program. At the end of each FHT cycle, the "next memory address" is obtained and provided to the prediction unit for making a prediction used by the next FHT cycle. Each "next memory address" begins a next basic block in the program execution and is herein called the "next IFAR address" because it is loaded into the IFAR (instruction fetch address register) of the processor. The branch prediction unit receives the "next IFAR address" for generating a "branches outcomes prediction vector" (prediction vector). The prediction unit provides each prediction comprising a "next IFAR address" and a prediction vector for use by the next FHT cycle for making a FHT hit or FHT miss determination.
The first FHT cycle for a program loads the program-entry memory address into the IFAR as the first "next IFAR address", which is provided to the prediction unit. The prediction unit uses the first "next IFAR address" to generate the first "outcomes prediction vector" which is used during the first FHT cycle to determine a FHT hit or FHT miss. At the end of the first FHT cycle, the "next IFAR address" is provided to the branch prediction unit for making a prediction for the next FHT cycle.
During each FHT cycle, either a FHT hit or FHT miss occurs. An FHT hit causes FHT predictive processing to be used during the FHT cycle, during which a sequence of AIC sectors is outputted from an AIC row and sent to the processor's instruction execution pipeline, assuming there is an AIC hit. A FHT miss causes the FHT cycle to use conventional instruction processing while generating a new FHT entry to represent the execution sequence conventionally obtained during the FHT cycle.
This invention operates fastest when successive FHT hits and AIC hits are occurring in consecutive FHT cycles, wherein each FHT cycle uses a short primary process to continuously loop.
If a FHT cycle has a FHT hit, an FHT entry provides the "next IFAR address" for the next FHT cycle. However, if a FHT cycle has a FHT miss, the "next IFAR address" is provided by conventional branch instruction processing initiated by the FHT miss for executing a sequence of instructions, from which a new FHT entry is generated, and at the end of this FHT cycle a target address of the last instruction in the sequence is provided to the prediction unit as the "next IFAR address" for the next FHT cycle.
After a FHT miss in a FHT cycle, the generation of a new FHT entry overlaps the instruction processing for the FHT cycle (including instruction fetching from computer memory for an AIC miss, or segment location in a hit AIC row for an AIC hit). The overlapped processing time for generating the new FHT entry should not be substantially longer than the conventional branch instruction processing time without generating the new FHT entry. During FHT misses with AIC hits, it is important that a sequence of segments be found quickly in the selected AIC row regardless of the order of the segments in the sequence. A Sector Distribution Table (SDT) is provided herein to minimize the time needed for locating a sector in an AIC row required by the sequence being determined for a new FHT entry being generated for a FHT miss.
Each outcomes prediction vector contains m number of bits, which respectively represent the branch states of a sequence of m number of branch instructions executed by the program. The first bit in the m bit sequence of each prediction vector represents the taken or not-taken branch state of the branch instruction ending a basic block having its first instruction located by the "next IFAR address" received from the previous FHT cycle. Each of the m bits in the prediction vector is set to either a zero or one state to indicate either the taken or not taken state for a sequence of m branch instructions consecutively executed after the instruction located by the "next IFAR address".
The vector generation process in the branch prediction unit may use a branch-state recording made during a previous execution of the program. The branch-state recording includes an indication of the taken or not taken state previously executed for each branch instruction in the execution sequence of the program. For example, each branch instruction representation for a program execution may contain a taken or not taken state indication. The "next IFAR address" provided by the last FHT cycle may provide a locating index in the branch-state recording to locate a sequence of m basic blocks (containing the sequence of m number of branch state indications ending m number of basic blocks). These m number of consecutive branch state indications are marked-out in the recording. The current prediction vector is then generated by respectively setting each of "m" number of sequential vector bits to either a zero or one state to represent the corresponding branch state indication in the marked out sequence in the recording.
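By way of illustration only, the vector generation described above may be sketched as follows. The recording is modeled here as a list of (basic block start address, taken) pairs in execution order; that layout, the function name, and the choice to return fewer than m bits when the recording ends early are assumptions made for this sketch. The encoding of 1 for taken and 0 for not taken follows the bit convention stated elsewhere in this specification.

```python
def prediction_vector(recording, next_ifar_address, m):
    """Build up to m prediction bits from a branch-state recording.

    recording: list of (block_start_address, taken) in execution order.
    Returns None when the address was never recorded.
    """
    # Search backwards so the most recent execution of the address is used.
    for index in range(len(recording) - 1, -1, -1):
        if recording[index][0] == next_ifar_address:
            marked_out = recording[index:index + m]
            return [1 if taken else 0 for _, taken in marked_out]
    return None


recording = [(0x100, True), (0x200, False), (0x100, False), (0x300, True), (0x400, True)]
vector = prediction_vector(recording, 0x100, 3)
```

Here the most recent occurrence of address 0x100 is the third record, so the marked-out sequence is the third through fifth records.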
Although there are m prediction bits in each prediction vector provided by the prediction unit, the prediction bits in the vector are used sequentially by the FHT cycles, and any cycle may consume from zero prediction bits to all m prediction bits in the current prediction vector. The number of prediction bits used in any FHT cycle is equal to the number of branch indications in the "arrangement" field of the current FHT entry; e.g. 0, 1, 2 and 3 are each branch indications. This variability in the number of vector bits used per FHT cycle depends on the number of no-branch indications in the "arrangement" field, since all no-branch indications in the "arrangement" field are skipped by the prediction vector during the matching process. The vector bits are consumed from left-to-right in the current prediction vector, and any unconsumed vector bits become the initial vector bit(s) in the next m bit predicted vector. When all sub-fields in the "arrangement" field contain no-branch indications (e.g. asterisks, *), none of the prediction bits are consumed in the FHT cycle, and the same vector bits are provided as the prediction vector for the next FHT cycle. An opposite example is when all sub-fields in the "arrangement" field contain branch instruction indications (0 or 1) in each of its sub-fields 0, 1, 2 and 3, and then the number of prediction bits consumed by the FHT cycle is equal to the total number of sectors in the AIC row. If an end-indicator exists in the "arrangement" field, the number of prediction bits consumed by the FHT cycle is equal to the number of sub-fields in the "arrangement" field containing branch-instruction indications up to the end indication.
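By way of illustration only, the counting of consumed prediction bits may be sketched as follows. The sub-field symbols are assumptions made for this sketch: "0" and "1" stand for branch indications, "*" for a no-branch indication, and "E" is a hypothetical marker for the end indicator, whose actual encoding is not given here.

```python
NO_BRANCH = "*"
END = "E"  # hypothetical end-indicator symbol, assumed for this sketch


def bits_consumed(subfields):
    """Count branch indications up to (not including) any end indicator."""
    count = 0
    for symbol in subfields:
        if symbol == END:
            break
        if symbol != NO_BRANCH:
            count += 1
    return count


def unconsumed_bits(vector, subfields):
    """Bits left over, which lead the prediction vector of the next FHT cycle."""
    return vector[bits_consumed(subfields):]


leftover = unconsumed_bits([1, 0, 1, 1], ["1", "*", "E", "0"])
```

With four sectors per AIC row, a cycle in this sketch consumes between zero bits (all sub-fields "*") and four bits (all sub-fields "0" or "1").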
This invention ingeniously divides each "next IFAR address" (provided for the prediction of each FHT cycle) into a set of novel special fields which are used in the operation of this invention. These special fields include an "address tag" field, an "IFAR set number" field, and an "IFAR sector number" field, which are used in the preferred embodiment for quickly locating a hit FHT entry. The "IFAR set number" field is used as an index in the FHT to locate a FHT set which may contain a FHT entry having a FHT hit. The "IFAR sector number" field is used with a novel Sector Distribution Table (SDT) for quickly locating an AIC sector address in an AIC directory entry for determining an AIC hit or miss during an FHT cycle for an FHT miss. The "address tag" field is used to verify that the SDT entry found by using the "IFAR sector number" field is the SDT entry associated with the "next IFAR address".
The "IFAR set number" field is defined as K number of consecutive bits in the "next IFAR address" located at the low-order end of its "memory line address". (The "memory line address" is a well known part of each memory address used to locate a corresponding memory line in the computer memory containing a byte being addressed by the entire address.) The "address tag" is defined as the remaining high-order part of the "memory line address". The "IFAR sector number" field is comprised of the "IFAR set number" field extended at its low-order end by Q number of bits in its "next IFAR address", and 2**Q is the number of sectors in each AIC row.
Hence, these special fields in the "next IFAR address" are related to the size of the FHT, to the size of the AIC rows, and the number of SDT entries in the SDT is related to the total number of sectors in the AIC. Nevertheless, each FHT set in the FHT may contain an arbitrary number of FHT entries (even though the number of FHT sets in the FHT is determined by K number of consecutive bits in the "IFAR set number" field). Thus, the number of FHT entries per FHT set may be a single FHT entry or may be a plurality of FHT entries. It is convenient to have the same number of FHT entries in each FHT set in the FHT; for example, the preferred embodiment has four FHT entries per FHT set.
An example of these special fields may be given for a system using 64 bit memory addresses (each address comprised of bits 0 to 63). In this 64 bit address, its bits 0 to 56 comprise its "memory line address" (for locating and fetching a line of instructions located on a line boundary in the computer memory). Then, address bits 57 to 63 may be used by the processor to locate a byte in the fetched memory line, which allows the 64 bit address to locate a byte anywhere in the computer memory. In this memory line address (e.g. bits 0 to 56), the "FHT set number" field is then comprised of the nine bits provided by the low-order bits 48 to 56 in the memory line address, and the "address tag" is comprised of the remaining high-order bits 0 to 47 (or a part thereof which is later explained herein) of the memory line address. Finally, the "IFAR sector number" field is comprised of the "FHT set number" field bits 48 to 56 extended on its low order end by Q bits, so that if Q is 2 (then 2**Q=4 sectors per AIC row) the "IFAR sector number" field is comprised of the address bits 48 to 58 in the 64 bit address.
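By way of illustration only, the field extraction in the 64 bit example above may be sketched as follows. The helper name `field` and the sample address are assumptions made for this sketch. Bit 0 is treated as the most significant bit, which is consistent with the example describing bits 0 to 47 as high-order and bits 48 to 56 as low-order.

```python
def field(address, first_bit, last_bit, width=64):
    """Extract bits first_bit..last_bit, where bit 0 is the most significant."""
    shift = (width - 1) - last_bit
    mask = (1 << (last_bit - first_bit + 1)) - 1
    return (address >> shift) & mask


address = 0x0000_1234_5678_9ABC  # arbitrary sample address

address_tag = field(address, 0, 47)     # high-order part of the line address
set_number = field(address, 48, 56)     # K = 9 bits, so 512 FHT sets
sector_number = field(address, 48, 58)  # set number extended by Q = 2 bits
byte_in_line = field(address, 57, 63)   # byte within the memory line
```

Because the sector number is the set number extended by Q low-order bits, shifting the sector number right by Q bits recovers the set number.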
Each FHT entry contains a valid bit, an LRU field, a "sectors outcomes" field, a "sector arrangement" field, an "initial sector address" field, a "next IFAR address" field, an "AIC index" field and an "AIC sector position" field. The valid bit indicates if the content of the FHT entry is valid; the LRU field indicates when a valid FHT entry was last used in the FHT set; the "initial sector address" field contains the memory address of the first AIC sector to be outgated in the sector sequence represented in the FHT entry (which may be any sector in the selected AIC row); the "next IFAR address" field contains the predicted next memory address which is provided to the branch prediction unit; the "AIC index" field locates an AIC row and its corresponding AIC directory entry and associates them with this FHT entry; and the "AIC sector position" field locates the sector position of the sector address in the associated AIC directory entry for verifying if the corresponding segment of instructions in the AIC row should be outgated for execution for the associated FHT entry. (There may be duplication in the information contained in some of these FHT fields.)
During each iteration by a FHT cycle, a FHT set (containing a plurality of FHT entries) is located by the "IFAR set number" field (in the current "next IFAR address"), and a search is made in the FHT set of its valid FHT entries. A FHT hit requires a match on each of two fields in a valid FHT entry in the FHT set, including a match between the FHT entry's "initial sector address" field and the current "next IFAR address", and another match between the FHT entry's "sectors outcomes" field and bits in the prediction vector.
When a FHT hit is indicated for a FHT entry in the FHT set by this matching process, the FHT cycle quickly determines if an AIC hit exists. To quickly determine an AIC hit, the processor obtains the "AIC index" and "AIC sector position" fields from the hit FHT entry, and uses them to access the sector address at the indicated AIC sector N in the corresponding AIC directory entry at the indicated AIC index. If the Nth sector address (contained in the indicated Nth sector position in the AIC directory entry) matches the content in the "initial sector address" field of the FHT entry and the AIC directory entry is valid, an AIC hit is obtained. Then the LRU field of the hit FHT entry is adjusted to reflect that this FHT entry is the most recently used entry in the FHT set. After the AIC hit is obtained, the "sector arrangement" field in the hit FHT entry controls the outputting of instructions in its specified sequence of sector(s) in the selected AIC row, and this sequence of instructions is sent to the processor execution pipeline for execution. The "next IFAR address" field in the hit FHT entry is sent to the branch prediction unit for making the vector prediction used by the next FHT cycle.
The matching process used to determine a FHT hit in the selected FHT set may be performed sequentially, in parallel, or by a combination of parallel and sequential operations on all FHT entries in the selected FHT set. Parallel matching operations may be done simultaneously on all fields in all FHT entries in the set to provide the fastest FHT hit/miss determination or in parallel on each valid FHT entry in the FHT set. Completely sequential operations are the slowest.
The valid bit states in all FHT entries in the set may be examined first, with the matching process continued on only the valid FHT entries. If no valid FHT entry is found in the set, an FHT miss is indicated. Next, the matching process further examines only the valid FHT entries in the set by matching the current IFAR address with the "initial sector address" field in each of the valid FHT entries. A mismatch eliminates the respective FHT entry. Then the bits in the current prediction vector are compared to sub-fields in the "sector branches outcomes" field in each non-eliminated FHT entry. An FHT entry provides a FHT hit if both fields match in any FHT entry in the set.
Thus the overall FHT matching process operates on one or more of three different fields in each FHT entry of the set, which are: the valid bit field, the "initial sector address" field, and the "sector branches outcomes" field. All three of these fields must have a match for a FHT hit to occur in a FHT entry.
In any branches-outcomes-prediction vector, each vector bit may be set to either a 0 or 1, representing either a branch-not-taken, or a branch-taken prediction in a sequence of branch instructions. Each sub-field in any "sectors branches outcomes" field may contain one of the following indications: 0 represents a "branch not-taken" indication, 1 represents a "branch-taken" indication and 2 represents a "no-branch instruction" indication. Therefore, a match occurs for any "sectors branches outcomes" field in which all sub-fields contain the "no-branch instruction" indication (e.g. 2). Then if this match enables a FHT hit, all corresponding sectors in the associated AIC row are outgated for execution. Then the FHT process continues with the next FHT cycle using a prediction vector based on the "next IFAR address" field in the FHT entry. (The "no-branch instruction" indication is shown as an asterisk in some of the figures herein.)
The FHT matching rules are complex, not straight-forward, and not obvious. Matching by the prediction vector includes complex alignment rules caused by the bits in the prediction vector only representing branch instructions, and the prediction vector bits being matched against sub-fields in a "sectors branches outcomes" field which may contain sub-fields that do not represent a branch instruction. This causes the prediction vector matching process to use unique dynamic alignment between the prediction vector bits and the sub-fields in the "sectors branches outcomes" field in order to correctly determine an FHT hit. This alignment process requires the leftmost bit in the prediction vector to be aligned with the leftmost outcomes sub-field having a branch instruction indication, and this requires each next vector bit to skip over any "no-branch" sub-field to any next "branch" sub-field in the "sectors branches outcomes" field, so as to prevent any attempted matching of any vector bit with any "no-branch" sub-field. In more detail, each vector bit has a taken or not-taken branch indication and does not have any "no branch" indication.
The vector bit matching process ends in any “sectors branches outcomes” field when any sub-field is detected to contain a “sequence-end” indication. A match is indicated for a “sectors branches outcomes” field when matches are found between all of its branch-indicating sub-fields up to any “sequence-end” indication and corresponding sequential vector bits starting with the left-most vector bit. The matching process ignores any vector bit(s) not matched with any sub-field(s) located before (to the left of) any “sequence-end” indicating sub-field. Any “outcomes” sub-field(s) after (to the right of) any “sequence-end” indicating sub-field are ignored in the matching process. Hence, a prediction vector may match and obtain an FHT hit, even if all bits in the vector have not been matched with all outcomes sub-fields.
If the initial (left-most) outcomes sub-field(s) consecutively contain “no-branch-instruction” indications (e.g. asterisk), the first vector bit is aligned with the first “branch” sub-field to the right of these “no-branch” sub-fields. The rules stated above then determine if a match occurs between the prediction vector and the “sectors branches outcomes” field. A special case FHT hit is determined if all outcomes sub-field(s) in the “sectors branches outcomes” field contain “no-branch” indications; then none of the vector bits are aligned or matched with any of the sub-fields in the FHT entry.
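The alignment and matching rules above may be illustrated by the following minimal Python sketch. It is not the disclosed circuit. The encodings 0, 1 and 2 are those given in the text; the value 3 for the “sequence-end” indication, the function name `vector_matches`, and the treatment of an exhausted prediction vector as a miss are assumptions made only for illustration.

```python
NOT_TAKEN, TAKEN, NO_BRANCH = 0, 1, 2  # encodings stated in the text
SEQ_END = 3                            # assumed encoding of "sequence-end"

def vector_matches(vector, outcomes):
    """Return True if the prediction vector matches the outcomes field."""
    bit = 0  # index of the next unmatched vector bit, leftmost first
    for sub in outcomes:
        if sub == SEQ_END:
            break            # sub-fields to the right are ignored
        if sub == NO_BRANCH:
            continue         # vector bits skip over no-branch sub-fields
        if bit >= len(vector):
            return False     # assumption: no vector bit left means a miss
        if vector[bit] != sub:
            return False     # predicted outcome differs from recorded one
        bit += 1
    return True              # any leftover vector bits are ignored
```

With these assumptions, a vector of `[1, 0, 1]` matches the outcomes field `[2, 1, 2, 0, 3, 1]`: the first bit aligns with the second sub-field, the second bit with the fourth, and the third bit is left unmatched because the sequence ends. An outcomes field holding only no-branch indications matches any vector, which is the special case described above.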
When a FHT hit is determined for a FHT entry, an AIC hit or miss is next determined using fields in the hit FHT entry. This is done by using the content of the “AIC index” and the “AIC sector position” fields in the hit FHT entry to locate a sector in an AIC row and to locate a corresponding sector address in a located AIC directory entry. It is possible that the located AIC row had its sector contents changed and this AIC row no longer contains the initial sector indicated in the hit FHT entry, in which case an AIC miss occurs. Therefore, verification is required that the AIC sector located by the hit FHT entry is still the AIC sector indicated in the hit FHT entry. This verification process uses the “AIC index” and “AIC sector position” fields in the hit FHT entry as follows: The “AIC index” field is used to locate an AIC directory entry, and the “AIC sector position” field is used to locate an “N-sector address” field in the located AIC directory entry (this “N-sector address” field locates in the computer memory the first instruction of the corresponding AIC sector). Then, this “N-sector address” is compared to the current IFAR address. An AIC hit is determined if these addresses match and the AIC directory entry is valid, because the located AIC row is verified to contain the instruction at the next IFAR address. If these addresses do not match, an AIC miss is determined.
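The verification step above may be sketched as follows. The class names, field names and list-based directory are hypothetical and chosen only to mirror the fields named in the text; the actual directory organization is not specified here.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class AicDirectoryEntry:
    valid: bool
    # One "N-sector address" per sector position; None marks an empty position.
    sector_addresses: List[Optional[int]]

@dataclass
class FhtEntry:
    aic_index: int            # "AIC index" field
    aic_sector_position: int  # "AIC sector position" field

def is_aic_hit(fht_entry, directory, ifar_address):
    """AIC hit: the directory entry is valid and the sector address equals IFAR."""
    entry = directory[fht_entry.aic_index]
    sector_address = entry.sector_addresses[fht_entry.aic_sector_position]
    return entry.valid and sector_address == ifar_address
```

In this sketch a row whose sector contents were replaced after the FHT entry was generated holds a different address at the recorded sector position, so the comparison fails and an AIC miss results.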
When an AIC hit is determined for a hit FHT entry, the “sector arrangement” field in the current FHT entry is used to control the outgating sequence of sectors in the associated AIC row in the order specified in the “sector arrangement” field of the hit FHT entry. The first sub-field in the “sector arrangements” field indicates the first sector to be outgated, and each following sub-field in that “sector arrangements” field may select the same or any other sector in the associated AIC row to provide any order of sector outgating from the associated AIC row to the processor's instruction execution pipeline. The instructions in the outgated sectors may be put into an instruction sequence buffer (ISB) in the order of their outgating from the AIC row, and instructions in the ISB are provided to the execution pipeline of the processor for their execution. The outputted sequence may include from one sector to all sectors in the associated AIC row in whatever order is indicated in the “sector arrangement” field of the FHT entry.
The outgating of a defined sequence of sectors from a hit AIC row requires synchronization between the sub-fields in both the “arrangement” field and the “sectors branches outcomes” field of the hit FHT entry. The outgated sequence of segments is defined by the left-to-right order of sub-fields in the associated AIC row. Outgating controls synchronize the selection of corresponding sub-fields in the “arrangements” field and “sector branches outcomes” fields in the hit FHT entry, and sector outgating stops for the FHT entry when any end indicator is reached in the “arrangements” sub-field during the synchronized scanning of the sub-fields in both the “arrangement” field and the “sectors branches outcomes” field of the hit FHT entry. If the “sectors branches outcomes” field does not contain any end-indicator, the sector arrangement field controls the outgating of the sectors.
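The synchronized scan of the two fields may be pictured with the following sketch. The end-indicator encodings (`ARR_END` and `SEQ_END`), the representation of an AIC row as a list of instruction lists, and the choice to stop on an end indicator in either field are assumptions for illustration only.

```python
ARR_END = -1  # assumed end indicator in an arrangement sub-field
SEQ_END = 3   # assumed "sequence-end" indication in an outcomes sub-field

def outgate_sectors(aic_row, arrangement, outcomes):
    """Copy sectors into an instruction sequence buffer (ISB) in FHT order."""
    isb = []
    # zip() advances both fields together, one sub-field pair per step.
    for position, outcome in zip(arrangement, outcomes):
        if position == ARR_END or outcome == SEQ_END:
            break                       # outgating stops for this FHT entry
        isb.extend(aic_row[position])   # the same sector may be chosen again
    return isb
```

For example, with a four-sector row, an arrangement of `[2, 0, 2, ARR_END]` outgates sector 2, then sector 0, then sector 2 again, which reflects that each arrangement sub-field may select the same or any other sector in the row.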
It is to be noted that the AIC index (for selecting an AIC row and corresponding AIC directory entry) may be selected as any available index in the AIC. However, it is convenient in the preferred embodiment to select an AIC index by applying a hashing algorithm to selected bits in the “initial IFAR address” field of the FHT entry containing the AIC index. This hashing algorithm may select any set of bits from the “initial IFAR address” field and apply a mathematical operation to these selected bits that computes a number within the range of the indices in the AIC, and this number may be used as the AIC index of that FHT entry. A preferred algorithm evenly distributes the selection of the index numbers within the range of AIC indices for an expected range of IFAR addresses.
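One possible hashing operation of the kind described is sketched below. The bit positions, the XOR fold and the modulo reduction are hypothetical choices, not the preferred algorithm of this specification; the text requires only that the result fall within the range of AIC indices.

```python
def aic_index_from_ifar(initial_ifar, num_rows, low_bit=6, num_bits=16):
    """Hash selected bits of the initial IFAR address into an AIC index."""
    # Select num_bits bits starting at bit position low_bit (assumed choice).
    selected = (initial_ifar >> low_bit) & ((1 << num_bits) - 1)
    # Fold the upper selected bits onto the lower ones to mix them.
    folded = selected ^ (selected >> 8)
    # Reduce to the range 0 .. num_rows - 1.
    return folded % num_rows
```

Whether such a function distributes indices evenly depends on the address pattern of the executing program, so the parameters here would need to be tuned for an expected range of IFAR addresses.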
An AIC miss generates the first FHT entry associated with the selected AIC row. A FHT miss with an AIC hit generates the second or later FHT entry associated with the located AIC row. A FHT hit with an AIC hit does not generate a new FHT entry.
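The three cases above amount to a small decision table, shown here as a sketch. The enumeration and function names are hypothetical.

```python
from enum import Enum

class FhtUpdate(Enum):
    FIRST_ENTRY = "first FHT entry for the selected AIC row"
    ADDITIONAL_ENTRY = "second or later FHT entry for the located AIC row"
    NONE = "no new FHT entry"

def fht_update_for(fht_hit: bool, aic_hit: bool) -> FhtUpdate:
    """Map the FHT and AIC hit/miss results to the FHT entry generation case."""
    if not aic_hit:
        return FhtUpdate.FIRST_ENTRY       # an AIC miss, whatever the FHT result
    if not fht_hit:
        return FhtUpdate.ADDITIONAL_ENTRY  # FHT miss with an AIC hit
    return FhtUpdate.NONE                  # FHT hit with an AIC hit
```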
When FHT cycles are operating with both FHT hits and AIC hits (which is expected over 90 percent of the time), it is essential to obtain instructions from the AIC at a speed faster than can be obtained by conventional branch instruction execution. Then, the sectors are accessed and outputted from the hit AIC row in whatever order is specified in the hit FHT entry.
A unique fast way to access a sector located anywhere in an AIC row is disclosed by this specification of a novel Sector Distribution Table (SDT), which is used to locate a valid AIC sector needed for a sequence specified by a hit FHT entry. The “IFAR sector number” field in the current IFAR address is used as an index into the SDT to locate an associated SDT entry, and this SDT entry is tested for associativity with the IFAR address by comparing the “address tag” field in the IFAR address with an “address tag” field in the located SDT entry. If they match, their associativity is confirmed, and a sector and its sector address are immediately accessed using an “AIC index” field in the SDT entry to locate the AIC row and the “sector position” field in the SDT entry to locate the specified sector position in that AIC row. No time is lost for searching the AIC row or directory entry for the required sector or sector address.
An SDT entry is generated for each sector written into an AIC row in response to an AIC miss. The SDT entry is located in the SDT by the “IFAR sector number” field in the current IFAR address. The “address tag” field in the IFAR address is written into the SDT “address tag” field, the AIC index (determined by hashing the current IFAR address) is written into the “AIC index” field, and the “AIC sector position” field in the SDT entry receives the AIC sector position being written into the AIC row. The SDT entry is then validated. Thus on an AIC miss, a new SDT entry is generated for each sector in the new AIC row, for which a valid sector address is written in the corresponding sector position in the AIC directory entry at the same AIC index.
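SDT entry generation and SDT lookup, as described in the two preceding paragraphs, may be sketched together as follows. The division of the IFAR address into an address tag, an IFAR sector number and an offset within the sector, and the bit widths used, are assumptions; the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SdtEntry:
    valid: bool = False
    address_tag: int = 0
    aic_index: int = 0
    sector_position: int = 0

def split_ifar(ifar, sector_bits=6, index_bits=8):
    """Assumed layout: [address tag | IFAR sector number | offset in sector]."""
    sector_number = (ifar >> sector_bits) & ((1 << index_bits) - 1)
    address_tag = ifar >> (sector_bits + index_bits)
    return address_tag, sector_number

def sdt_install(sdt, ifar, aic_index, sector_position):
    """Generate the SDT entry for a sector written into an AIC row."""
    tag, sector_number = split_ifar(ifar)
    sdt[sector_number] = SdtEntry(True, tag, aic_index, sector_position)

def sdt_lookup(sdt, ifar):
    """Return (AIC index, sector position), or None if not associated."""
    tag, sector_number = split_ifar(ifar)
    entry = sdt[sector_number]
    if entry.valid and entry.address_tag == tag:
        return entry.aic_index, entry.sector_position
    return None
```

The lookup is a single indexed access followed by one tag comparison, which is the property relied on above: no search of the AIC row or directory entry is needed to find the sector.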
A replacement control field is provided in each FHT entry, such as a “LRU (least recently used) bits” field for indicating the relative recency of use of the FHT entries in the same FHT set. Each time any FHT entry is accessed, its “LRU bits” field is set to indicate the most recently used state, and the “LRU bits” field in each of the other FHT entries in the same FHT set is set to indicate a less recently used state. Replacement of a LRU entry is necessary when all of the FHT entries in the set are valid, and an FHT entry in the set must be selected for replacement. Then the states of the “LRU bits” field in the FHT set are examined to find a least recently used entry in the FHT set as the replacement entry.
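A simple rank-based form of this replacement control is sketched below. The rank encoding (0 for most recently used, larger values for less recently used), the aging of only those entries that were more recent than the accessed one, and the preference for an invalid entry when one exists are assumptions; the text does not fix the encoding of the “LRU bits” field.

```python
def touch(lru_ranks, used):
    """Mark entry `used` as most recently used and age the more recent ones."""
    old = lru_ranks[used]
    for i, rank in enumerate(lru_ranks):
        if rank < old:
            lru_ranks[i] = rank + 1  # entry becomes one step less recent
    lru_ranks[used] = 0              # rank 0 is the most recently used state

def choose_replacement(valid, lru_ranks):
    """Pick an invalid entry if any; otherwise the least recently used one."""
    for i, is_valid in enumerate(valid):
        if not is_valid:
            return i
    return max(range(len(lru_ranks)), key=lambda i: lru_ranks[i])
```

For a four-entry FHT set starting with ranks `[0, 1, 2, 3]`, accessing entry 2 and then entry 3 leaves ranks `[2, 3, 1, 0]`, so entry 1 would be selected for replacement when all four entries are valid.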
The address of each sequential instruction in a sector is determined by the processor adding the length of each next instruction to the address of the current instruction. When a branch instruction is reached at the end of a sector, the last effective outcomes sub-field for a sector indicates if the instruction is predicted taken or not taken. The target address of each branch instruction begins a new sector.
The Execution Mismatch Controls include a branch information queue (BIQ) which stores: an image of each branch instruction executed in the program, the address of the branch instruction, the address of its target instruction, and the last outcome of the branch instruction (taken or not taken, which is used as the prediction for the branch). When a branch executes, it is determined whether its prediction stored in the BIQ is correct. If correct, nothing needs to be done. If incorrect and the actual outcome is taken, then the BIQ is corrected when the target address is computed or otherwise obtained from the BIQ, depending on the type of branch instruction. All the information about the last execution of each branch instruction is available in the BIQ, together with an indication of where to go next to fetch more instructions. If the prediction is incorrect and the actual outcome is not-taken, then the address is determined for the next instruction, which is stored in IFAR.