1. Field of the Invention
The present invention relates to improving manufacturing yield for microprocessor chips by providing redundant, or spare, instruction buffer entries and accurately identifying those entries which are available for use. More particularly, the present invention includes a technique that tests the instruction buffer circuitry and stores the results of these tests in order to ensure that a sufficient number of buffer entries are available to meet the baseline specification of the microprocessor.
2. Description of Related Art
With the continual advance of computer technology, more and more circuitry is being provided on each integrated circuit (IC), which makes these chips correspondingly more complex. They are likely to include millions of transistors and to be quite large. It is not surprising that the cost to fabricate these ICs is relatively high and, as the cost increases, the manufacturing yield becomes critical in order for producers of these chips to remain competitive.
Manufacturing yield is essentially the percentage of ICs that meet the design specification relative to the total number of chips produced. Of course, as chip complexity and size increase, the manufacturing yield usually decreases. Further, after a new IC design has been manufactured for a significant period of time, per chip costs often decrease as the fabrication process is tuned and optimized. Thus, in order to stay competitive it is often necessary, if not critical, to increase manufacturing yields, especially during the early stages of chip production when the manufacturing costs are highest.
It can be seen that yield on large chips is an important issue, and techniques for tolerating small numbers of random defects in the manufacturing process are increasingly important. While redundancy in caches has been used for some time, it has not been used for other structures in microprocessors. In particular, redundancy has not been used in register file circuitry, which is common in microprocessors and whose area contribution in terms of chip real estate is growing. More particularly, the contribution of the register file and instruction buffer to the overall core area of modern microprocessors is increasing to the point where each of these structures can make up approximately ten percent (10%) of the microprocessor core, for a total of about 20% of the core area and core functionality. This illustrates the importance, in terms of complexity and size, of just two microprocessor core structures where redundancy can be used in accordance with the present invention to improve performance.
Modern microprocessors capable of out-of-order execution make use of large numbers of dataflow-oriented instruction buffers for holding instructions while their operands develop. The instruction buffers accept new instructions from the dispatch logic, and coordinate the issue of instructions to various execution units based on operand availability, instruction age and other mechanisms. Typically, these instruction buffers contain content addressable memory (CAM) oriented circuitry that detects when required operand values are available, and either captures a copy of the operand data or sets a flag to indicate that the data is now available in subsequent stages of the execution pipeline (i.e. in a register file that will be accessed as the instruction is allowed to progress).
In general, as instructions are provided to the instruction buffer, they are allocated into unused entries. That is, each entry is the same as every other entry such that a particular instruction can be allocated to any entry in the instruction buffer. However, sometimes, due to limitations in the instruction issue policy, it may be advantageous to allocate new instructions into unused entries that are as close to the "physical bottom" of the instruction buffer stack as possible. Typically, the instruction buffers are "self allocating", i.e. the instruction is presented to the group of instruction buffer entries and the dataflow-oriented logic surrounding the entries automatically coordinates which of the available entries will receive the instruction. Further, instructions leave the instruction buffer as their dependencies are resolved (independent of their position in the buffer or age) and are "issued" to an execution unit (actually, it is common practice to allow them to linger in the instruction buffer entry for a fixed number of cycles beyond the issue point so that if the instruction is "rejected", it can easily be reissued without having to be re-fetched and re-dispatched).
Dataflow instruction processing as used herein refers to a microprocessor technology wherein resolution of the dependencies associated with the instruction being executed occurs in a continuous manner with a reduced need for the pipeline stages commonly used in microprocessor technologies. Generally, logic is provided which corresponds to an individual instruction being processed, rather than to a functional pipeline stage. In other words, instruction processing focuses more on the individual instruction than on a particular pipeline stage. Of course, it should be noted that the scope of the present invention contemplates all types of microprocessors, microcontrollers, embedded controllers, digital signal processors and the like, including those having distinct pipeline stages.
In a microprocessor with renamed registers, the machine automatically maps the architecturally defined set of "logical registers" into a larger set of "physical registers" to avoid various types of false dependencies and to allow easy purging of speculative results when necessary. As instructions are processed and registers are needed, register allocation/deallocation logic examines the state of the physical register pool, selects a register that is currently not active, and then marks it as "in use". Later, when the instruction is either completed or purged, the register deallocation logic frees the register again for future use.
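By way of illustration only, the allocate and free behavior of the physical register pool described above might be sketched in Python as follows. The class name `RenameMap`, its methods, and the register counts are hypothetical and chosen for the example; actual hardware would implement this as logic, not software.

```python
class RenameMap:
    """Toy model of logical-to-physical register renaming (hypothetical names)."""

    def __init__(self, num_logical, num_physical):
        if num_physical < num_logical:
            raise ValueError("need at least one physical register per logical register")
        # Assumption: logical register i initially maps to physical register i.
        self.mapping = list(range(num_logical))
        # "In use" flag for every physical register; the extras start out free.
        self.in_use = [i < num_logical for i in range(num_physical)]

    def allocate(self, logical):
        """Select an inactive physical register, mark it in use, and remap.

        Returns (new_physical, previous_physical), or None when the pool is
        exhausted and dispatch would have to wait.
        """
        for phys, busy in enumerate(self.in_use):
            if not busy:
                self.in_use[phys] = True
                previous = self.mapping[logical]
                self.mapping[logical] = phys
                return phys, previous
        return None

    def free(self, phys):
        """Return a physical register to the pool (on completion or purge)."""
        self.in_use[phys] = False


renamer = RenameMap(num_logical=4, num_physical=6)
new_reg, old_reg = renamer.allocate(2)   # an instruction writes logical register 2
renamer.free(old_reg)                    # instruction completed: old copy is dead
```

In this sketch a completed instruction frees the previous mapping, whereas a purged instruction would free the newly allocated register and restore the old mapping; the restore step is omitted for brevity.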
More particularly, most modern microprocessors use rename buffers, or registers. It should be noted that the terms "rename registers" and "rename buffers" will be used interchangeably herein. These rename buffers act as temporary storage for instructions that have not completed and as write-back buffers for those that have. To avoid contention for a given register location, rename registers are provided for storing instruction results before they are completed and committed to the architected registers. For example, a certain microprocessor may include thirty-two 32-bit general purpose registers (GPRs), which are considered architected registers, and twelve 32-bit rename registers for holding results prior to their commitment to the architected registers. Further, rename registers may also be provided for other architected registers, such as two rename buffers for the floating point registers (FPR) and eight rename registers for the condition register (CR).
Generally, when the dispatch unit provides an instruction to the appropriate execution unit (e.g. the integer unit (IU), floating point unit (FPU), load/store unit (L/S), or the like), it allocates a rename register for the results of that instruction. The dispatch unit also provides a tag to the execution unit identifying the result that should be used as the operand. When the proper result is returned to the rename buffer, it is provided to the execution unit, which begins execution of the instruction. Instruction results are not transferred from the rename registers to the architected registers until any speculative branch conditions are resolved and the instruction itself is retired without exceptions. If a speculatively executed branch is found to have been incorrectly predicted, the speculatively executed instructions following the branch are flushed and the results of those instructions are flushed from the rename registers.
As an example, conventional microprocessors avoid contention for a given register file location in the course of out-of-order execution by providing rename registers for the storage of instruction results prior to their commitment (in program order) by the completion unit to the architecturally defined registers. Register renaming minimizes architectural resource dependencies, namely the output dependencies and anti-dependencies, that would otherwise limit opportunities for out-of-order execution.
A GPR rename buffer entry is allocated when an instruction that modifies a GPR is dispatched. This entry is marked as allocated but not valid. When the instruction executes, it writes its results to the entry and sets the valid bit. When the instruction completes, its result is copied from the rename buffer entry to the GPR and the entry is freed for reallocation. For load with update instructions that modify two GPRs, one for load data and another for address, two rename buffer entries are allocated.
An instruction that modifies a GPR is assigned one of the twelve positions in the GPR rename buffer. Load with update instructions get two positions since they update two registers. When the GPR rename buffer is full, the dispatch unit stalls when it encounters the first instruction that needs an entry. A rename buffer entry becomes available one cycle after the result is written to the GPR.
Rename buffers associated with other register files, such as the floating point register file, condition register file and the like, operate in a similar manner.
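The life cycle of a GPR rename buffer entry described above (allocated at dispatch, marked valid at execution, copied to the GPR and freed at completion) may be sketched as follows. This is a simplified Python model under stated assumptions: the names `RenameEntry` and `GprRenameBuffer` are hypothetical, and the one-cycle delay before a freed entry becomes available is not modeled.

```python
from dataclasses import dataclass

NUM_GPR_RENAME = 12  # twelve-entry GPR rename buffer, as in the example in the text


@dataclass
class RenameEntry:
    allocated: bool = False   # set at dispatch
    valid: bool = False       # set when the instruction writes its result
    target_gpr: int = -1
    value: int = 0


class GprRenameBuffer:
    def __init__(self, size=NUM_GPR_RENAME):
        self.entries = [RenameEntry() for _ in range(size)]

    def dispatch(self, target_gprs):
        """Allocate one entry per GPR the instruction modifies.

        Returns the allocated slot numbers, or None to signal a dispatch stall.
        """
        free = [i for i, e in enumerate(self.entries) if not e.allocated]
        if len(free) < len(target_gprs):
            return None
        slots = free[:len(target_gprs)]
        for slot, gpr in zip(slots, target_gprs):
            self.entries[slot] = RenameEntry(allocated=True, target_gpr=gpr)
        return slots

    def execute(self, slot, value):
        """Write the result into the entry and set its valid bit."""
        entry = self.entries[slot]
        entry.value, entry.valid = value, True

    def complete(self, slot, gprs):
        """Copy the result to the architected GPR and free the entry."""
        entry = self.entries[slot]
        if not (entry.allocated and entry.valid):
            raise ValueError("entry has no valid result to commit")
        gprs[entry.target_gpr] = entry.value
        self.entries[slot] = RenameEntry()


gprs = [0] * 32
buf = GprRenameBuffer()
slots = buf.dispatch([3, 4])   # load with update: data register and address register
```

A load with update instruction is modeled here simply by passing two target registers to `dispatch`, which then reserves two entries or stalls.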
Redundancy and sparing are methods that are known in the art. These techniques supply additional circuit elements, beyond those required for the baseline specification of the IC, to act as spares in the event that certain ones of the original elements prove to be defective.
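The benefit of sparing can be illustrated with a short calculation. The following Python sketch assumes, purely for illustration, that each element is defective independently with the same probability; real defect distributions may be clustered, so the figures it produces are not predictions for any actual process. The function name `prob_spec_met` and its parameters are hypothetical.

```python
from math import comb


def prob_spec_met(required, spares, p_good):
    """Probability that at least `required` of (required + spares) elements work.

    Assumes independent, identically distributed defects (binomial model).
    """
    total = required + spares
    return sum(
        comb(total, k) * p_good**k * (1 - p_good) ** (total - k)
        for k in range(required, total + 1)
    )


# Compare a structure with no spares against one with two spares.
without_spares = prob_spec_met(required=16, spares=0, p_good=0.99)
with_spares = prob_spec_met(required=16, spares=2, p_good=0.99)
```

Under this model, adding spares can only raise the probability that the baseline element count is met, which is the motivation for sparing.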
The use of redundancy in caches has been common for some time but, due to complexity and cycle time considerations, redundancy has not been used for other structures in microprocessors. The present invention relates to providing redundancy in the instruction buffer circuitry which, due in part to the dataflow-oriented trend, is becoming a larger portion of the microprocessor core in terms of physical area and importance.
Typically, with cache redundancy, fuses are provided that are associated with each cache line. As the cache is tested, those fuses associated with lines that test bad can be blown, or opened, and the array access decoder circuitry is modified to note the state of these fuses. The decoder circuitry then "decodes around" any bad entries by recognizing an address to a bad cache line and substituting a functional cache line, while maintaining a record of this substitution. The problem with this traditional scheme is that a significant amount of complexity is required in the cache array circuitry to provide the address substitution and tracking mechanism. This is undesirable not only in the amount of additional logic circuitry that is required to be implemented in the chip, but also in the amount of cycle time that is required. More particularly, each time the processor tries to access the portion of the cache that tested bad, decode logic must identify the request as being to the bad address and provide a substitute address to a spare cache location where the data can be stored. This decoding and address substitution occurs continuously during the operation of the data processing system. Thus, it can be seen that a significant amount of cycle time can be consumed over and over during system operations as access attempts to bad cache locations are continually processed.
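The conventional "decode around" scheme can be sketched in Python as follows, to show where the recurring cost arises. The class `DecodeAroundCache`, its fields, and the line counts are hypothetical and serve only to illustrate the description above; a real decoder is combinational logic in the array access path.

```python
class DecodeAroundCache:
    """Toy model of fuse-based cache line substitution (hypothetical names)."""

    def __init__(self, num_lines, num_spares, bad_lines):
        if len(bad_lines) > num_spares:
            raise ValueError("more bad lines than spares")
        # Fuse state per line: True means the line tested bad and its fuse was blown.
        self.fuse_blown = [i in bad_lines for i in range(num_lines)]
        # Record of which spare line stands in for each bad line.
        spare_lines = range(num_lines, num_lines + num_spares)
        self.substitute = dict(zip(sorted(bad_lines), spare_lines))
        self.storage = [None] * (num_lines + num_spares)
        self.remap_count = 0  # counts every access that needed substitution

    def _decode(self, line):
        # This check sits on every access, which is the cycle time cost noted above.
        if self.fuse_blown[line]:
            self.remap_count += 1
            return self.substitute[line]
        return line

    def write(self, line, data):
        self.storage[self._decode(line)] = data

    def read(self, line):
        return self.storage[self._decode(line)]


cache = DecodeAroundCache(num_lines=8, num_spares=2, bad_lines={3})
cache.write(3, "payload")      # redirected to the first spare line
value = cache.read(3)          # redirected again on the read
```

The `remap_count` field makes the point of the paragraph concrete: the substitution is repeated on every access to a bad line for the life of the system, not performed once.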
In a microprocessor core, the instruction buffer circuitry is usually considered a critical path, which is very sensitive to cycle time pressure. Thus, the conventional cache redundancy decode scheme cannot be applied to instruction buffer circuitry to solve the problem addressed by the present invention, which provides redundant instruction buffer entries without adding complexity or negatively impacting cycle time.
Therefore, it can be seen that a need exists for a mechanism that provides redundant microprocessor instruction buffer entries so that the baseline specification can be met, even when some of the entries may not be functional, and that allows the control of these entries without adding complexity or cycle time pressure to the system.
In contrast to the prior art, the present invention is a mechanism for providing redundancy in the instruction buffer of a microprocessor such that entries which test bad during manufacturing can be tolerated and the baseline specification of the microprocessor can be met.
Broadly, the present invention utilizes the ability of a content addressable memory (CAM) to individually access the entries in a microprocessor instruction buffer to allow additional entries, beyond those called for in the specification, to be provided. The entries are tested and those found "bad" are identified and avoided by updating instruction allocation logic with a list of the test status of each entry, or by setting a "manufactured good" bit in the entry itself.
The present invention involves a modification to the instruction buffer allocation function, which can be either discrete logic outside the buffer or dataflow-oriented logic built directly into the buffer (the "manufactured good" bit). The modification allows the allocation function to note that one or more entries are "physically bad" and cannot be used for any instruction. One way of providing this information to the allocation function is by a list corresponding to the state of a set of fuses that can be set during the manufacturing test process. This process essentially involves running a set of test patterns that identify which instruction buffer entries are bad and blowing the fuses associated with those entries. A corresponding bit in a scannable latch is then set, such that a bit vector having an indicator of the tested state of each instruction buffer entry can be generated and used to update the instruction allocation logic to prevent any instruction from being provided to a defective instruction buffer entry. Other testing schemes may also be used to build the list of buffer entries, such as power on self test (POST), built in self test (BIST), or the like. In accordance with the present invention, the instruction buffer would be built with more physical entries than required by the design specification of the microprocessor so that, if needed, several buffer entries can be used as spares without affecting the end product performance.
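A minimal sketch of the fuse-based flow described above is given below in Python. It assumes a hypothetical buffer with eight entries required by the specification and ten physical entries; the function names and the representation of the bit vector as an integer are illustrative choices, not details taken from the invention.

```python
SPEC_ENTRIES = 8        # hypothetical baseline specification
PHYSICAL_ENTRIES = 10   # hypothetical: two spare entries


def blow_fuses(test_passed):
    """Manufacturing test step: a fuse is blown (True) for each entry that tested bad."""
    return [not ok for ok in test_passed]


def good_vector(fuses):
    """Build the bit vector loaded into the scannable latches: bit i set = entry i good."""
    vec = 0
    for i, blown in enumerate(fuses):
        if not blown:
            vec |= 1 << i
    return vec


def meets_spec(vec, spec=SPEC_ENTRIES):
    """The chip is usable if enough good entries remain to meet the specification."""
    return bin(vec).count("1") >= spec


def allocatable(vec, free_vec):
    """Entries the allocation logic may offer: free AND tested good."""
    return vec & free_vec


results = [True] * PHYSICAL_ENTRIES
results[2] = results[7] = False            # two entries fail the test patterns
vec = good_vector(blow_fuses(results))
candidates = allocatable(vec, free_vec=(1 << PHYSICAL_ENTRIES) - 1)
```

Because the good-entry vector is simply ANDed into the existing "free" indication, no address substitution is needed at run time; a bad entry just never appears as a candidate.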
Furthermore, the present invention contemplates a "manufactured good" bit (M) in each entry in the instruction buffer. Each of the entries also includes a "valid" bit (V) which informs the microprocessor when the entry is valid, i.e. available for use because the instruction previously in that entry has been processed (executed, flushed or the like). When an instruction is fetched from the cache and is ready for placement in the instruction buffer, control logic will check the "valid" bit and the "manufactured good" bit of the entries in the buffer and, when both bits indicate that the entry has tested as being operational (M bit) and is ready to accept a new instruction (V bit), the instruction is placed by allocation logic in an available buffer entry.
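The per-entry M and V bit check may be illustrated by the following Python sketch. It follows the convention stated above, in which a set V bit means the entry is ready to accept a new instruction. The names `BufferEntry`, `allocate` and `release`, and the choice of the lowest-numbered qualifying entry, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BufferEntry:
    m: bool                     # "manufactured good": set once from test results
    v: bool = True              # "valid": entry is ready to accept a new instruction
    instruction: Optional[str] = None


def allocate(buffer, instruction):
    """Place the instruction in the first entry with both M and V set.

    Returns the entry index, or None if no entry qualifies.
    """
    for index, entry in enumerate(buffer):
        if entry.m and entry.v:
            entry.instruction = instruction
            entry.v = False     # entry is now occupied
            return index
    return None


def release(buffer, index):
    """Instruction executed or flushed: the entry may accept a new one."""
    buffer[index].instruction = None
    buffer[index].v = True


# Entry 0 tested bad during manufacturing, so it is never selected.
buffer = [BufferEntry(m=False), BufferEntry(m=True), BufferEntry(m=True)]
first = allocate(buffer, "add r1, r2, r3")
```

Since the M bit of a bad entry is never set, the entry fails the same availability test that already gates every allocation, so no separate exclusion logic is required in this model.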
The present invention uses existing instruction allocation functionality to prevent an instruction buffer entry that was manufactured bad from ever being included in a list of entries that are available to receive microprocessor instructions. In this manner redundant or spare entries, above the baseline specification of the microprocessor, can be provided to account for any faulty buffer entries that are present due to a less than 100% instruction buffer manufacturing yield. By using existing logic to prevent those entries that test bad from ever being used, an instruction buffer having sufficient entries to meet the microprocessor specification is ensured, without adding costly and complex control logic, or requiring sorting of the chips to find those which happen to have been manufactured with instruction buffer entries that comply with the specification.
Therefore, in accordance with the previous summary, objects, features and advantages of the present invention will become apparent to one skilled in the art from the subsequent description and the appended claims taken in conjunction with the accompanying drawings.