The technical field is a computer system implementing an instruction set architecture using microcoded instructions.
Microcoded computer systems may implement instruction set architectures (ISAs) for both a first and a second computer architecture. For example, the second ISA may be implemented by emulating it with instructions native to the first ISA. Instructions from the second architecture are often called macroinstructions, and instructions from the first architecture are often called microinstructions.
In other cases, a computer system may implement a single ISA even though there is a separate, often hidden, ISA that is used to implement the visible ISA. For example, many current x86 computer processors are microcoded. Users only see the x86 ISA. The processors implement the ISA using an "invisible" ISA that is not known to the users.
Conversion from a macroinstruction to a microinstruction may be accomplished by using one or more large read-only memory (ROM) structures containing the microinstructions needed to emulate the original macroinstruction. Conversion may also be implemented using a random access memory (RAM), a programmable logic array (PLA) and other devices. Expansion of a macroinstruction into one or more microinstructions can be controlled by an instruction sequencer. The set of microinstructions needed to emulate a macroinstruction is called a flow. An entry point into the ROM is typically determined by a large PLA that maps instruction opcodes and operand fields to a specific location in the ROM. Once the entry point determination logic has provided an entry point, or initial address, the instruction sequencer takes over and controls the microcode flow, fetching additional entries from the ROM sequentially or, if a microbranch is encountered, branching to the microbranch target. Providing microbranches in the ROM to redirect the flow of microinstructions helps to improve code reuse within the ROM.
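The expansion described above can be illustrated with a minimal Python sketch. It is not taken from any particular processor: the ROM contents, the addresses, the micro-operation names, the `MicroLine` record, and the `ENTRY_POINTS` table (a dictionary standing in for the entry point PLA) are all hypothetical. The sketch shows a sequencer starting at an entry point, fetching sequentially, and following a microbranch into a shared tail so that two flows reuse the same ROM lines.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MicroLine:
    uop: str                          # microinstruction held on this ROM line
    branch_to: Optional[int] = None   # microbranch target address, if any
    end_of_flow: bool = False         # marks the last line of a flow


# Hypothetical ROM image: addresses and micro-ops are invented for illustration.
ROM: Dict[int, MicroLine] = {
    0: MicroLine("load_tmp"),
    1: MicroLine("add_tmp", branch_to=10),    # branch into the shared tail
    4: MicroLine("load_tmp2"),
    5: MicroLine("sub_tmp", branch_to=10),    # same shared tail, reused
    10: MicroLine("write_result"),
    11: MicroLine("update_flags", end_of_flow=True),
}

# Stand-in for the entry point PLA: opcode -> initial ROM address (assumed values).
ENTRY_POINTS: Dict[str, int] = {"ADD": 0, "SUB": 4}


def expand(opcode: str) -> List[str]:
    """Return the flow of microinstructions that emulates one macroinstruction."""
    addr = ENTRY_POINTS[opcode]
    flow: List[str] = []
    while True:
        line = ROM[addr]
        flow.append(line.uop)
        if line.end_of_flow:
            return flow
        # Follow a microbranch if present, otherwise fetch the next sequential line.
        addr = line.branch_to if line.branch_to is not None else addr + 1
```

In this sketch, `expand("ADD")` and `expand("SUB")` differ in their first two microinstructions and share the two lines at addresses 10 and 11, which is the kind of code reuse that microbranches make possible.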
Despite efforts to minimize the size of the ROM, multiple processing pipeline stages are often required to read the ROM and obtain the microinstructions. These stages may include decoding of the address into row, column and block selects, driving the selects to the ROM array, driving the selected data out of the ROM array, decoding the microinstruction, and determining if the end of the flow has been reached.
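As an illustration of the first of these stages, the following Python sketch splits a ROM address into block, row, and column selects. The field widths and the bit ordering are assumptions chosen only for the example; an actual ROM organization may differ.

```python
# Assumed field widths; a real ROM array may be organized differently.
COL_BITS = 3
ROW_BITS = 6
BLOCK_BITS = 2


def decode_address(addr: int) -> tuple:
    """Split a ROM address into (block, row, column) select values."""
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    block = (addr >> (COL_BITS + ROW_BITS)) & ((1 << BLOCK_BITS) - 1)
    return block, row, col
```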
If a redirection or branch is needed, several cycles' worth of instructions that have already been read into the pipeline may need to be invalidated, slowing the computing process. One common technique to mitigate this performance degradation is to allow for delayed branches, so that the cycles after the branch can still be used for productive work. However, this technique does not work when the boundary of a macroinstruction is reached. Finishing one flow and starting another appears to the processor very much like a branch, except that the branch target is not available as part of the earlier flow.
The common method to overcome this problem involves hinting flows in the same structure that calculates flow entry points, referred to herein as an entry point PLA, although the method can be implemented with other structures that are well known in the art. This hint is used to predict when the next flow should enter the sequencer, avoiding the pipeline delays that would otherwise be required. For instance, a flow that requires only one line of microcode could have a hint of one. When the instruction sequencer sees that the current flow is only one line long, the instruction sequencer advances the next flow into the instruction sequencer in the next cycle without having to decode the instruction.
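The effect of an entry point hint can be sketched in Python as follows. The `PlaEntry` record, the `PLA` table, the opcodes, and the flow lengths are hypothetical, and the sketch assumes that a flow without a matching hint costs `DECODE_LATENCY` wasted fetch slots at its boundary, because its end is only discovered once its last line has been decoded.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

DECODE_LATENCY = 2  # assumed: fetch slots that pass before a fetched line is decoded


@dataclass(frozen=True)
class PlaEntry:
    entry_addr: int              # initial ROM address of the flow
    flow_length: int             # lines of microcode in the flow
    length_hint: Optional[int]   # hinted length, or None if the flow is not hinted


# Hypothetical entry point table; opcodes, addresses, and lengths are invented.
PLA: Dict[str, PlaEntry] = {
    "MOV": PlaEntry(entry_addr=0, flow_length=1, length_hint=1),
    "ADD": PlaEntry(entry_addr=4, flow_length=2, length_hint=2),
    "DIV": PlaEntry(entry_addr=16, flow_length=7, length_hint=None),
}


def fetch_slots(program: List[str]) -> int:
    """Count the fetch slots used by a sequence of macroinstructions."""
    slots = 0
    for opcode in program:
        entry = PLA[opcode]
        slots += entry.flow_length
        if entry.length_hint != entry.flow_length:
            # No usable hint: the end of the flow is seen only after decode,
            # so the slots issued in the meantime are wasted (bubbles).
            slots += DECODE_LATENCY
    return slots
```

Under these assumptions, a program made only of hinted flows uses exactly as many fetch slots as it has lines of microcode, while each unhinted flow adds `DECODE_LATENCY` bubbles.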
A difficulty with the above method is that the structure for calculating entry points is often already heavily overloaded. Furthermore, an additional bit is required for every additional length of flow that is hinted. For this reason, this technique often limits the number of unique flows that can be hinted. Other flows may use a marker at the end of the flow, incurring a full pipeline penalty.
A method and an apparatus improve the rate at which macroinstructions are transformed into corresponding microinstructions. An additional encoding is added to a microcode storage device. The encoding indicates that the flow will end in a determined number of cycles. That number is set by the number of canceled instructions, or bubbles, that would be introduced if no prediction were used. Flows too short to carry the encoding that many cycles before their end may instead use, for example, a hint in an entry point programmable logic array (PLA).
Each access of microcode in the microcode storage device produces one or more microinstructions. All microinstructions obtained in a single access of the microcode storage device may be referred to as a line of microcode. A computer system may fetch N additional lines of microcode before it can decode a fetched line and determine that the end of the flow has been reached. Consequently, for flows of N lines or fewer, the computer system may need to start the next flow before the first line of microcode in the current flow has been decoded, making it impossible to use information in the microcode itself to redirect the next flow in time to prevent unneeded microcode fetches from being initiated. For flows that are N+1 lines or longer, however, a method for hinting the flow length may involve adding an additional encoding N lines before the end of the flow. The additional encoding indicates that the flow will end after N additional fetches. Since the delay from initiating a fetch to decoding the microcode is N cycles, a hint in the line that is (N+1)th from the end, that is, N lines before the last line, can redirect the flow in time to prevent unneeded fetches from the current flow from being issued. The encoding allows the microcode instruction sequencer to perfectly predict the end of the flow and to eliminate the bubbles that would otherwise have occurred in the pipeline.
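The timing argument above can be checked with a small Python sketch. It assumes a simplified pipeline model in which a line fetched in slot t is decoded in time to steer the fetch in slot t + N + 1, so that N further fetches are issued in between. The tag names `END_IN_N` and `END`, and the functions `mark_flow` and `wasted_fetches`, are hypothetical names introduced only for this illustration.

```python
from typing import List, Optional, Tuple

END_IN_N = "END_IN_N"   # hypothetical encoding: the flow ends after N more fetches
END = "END"             # hypothetical plain end-of-flow marker on the last line

Line = Tuple[str, Optional[str]]  # (microinstruction, tag)


def mark_flow(uops: List[str], n: int, use_hint: bool = True) -> List[Line]:
    """Tag one line of a flow with either the in-microcode hint or an end marker."""
    last = len(uops) - 1
    if use_hint:
        if len(uops) < n + 1:
            # Too short to hold the hint; an entry point hint would be needed instead.
            raise ValueError("flow must be at least N+1 lines long")
        tagged, tag = last - n, END_IN_N   # N lines before the last line
    else:
        tagged, tag = last, END
    return [(u, tag if i == tagged else None) for i, u in enumerate(uops)]


def wasted_fetches(flow: List[Line], n: int) -> int:
    """Fetch slots issued past the end of the flow before the sequencer redirects."""
    tagged = next(i for i, (_, tag) in enumerate(flow) if tag is not None)
    first_steerable_slot = tagged + n + 1  # earliest slot the decoded tag can steer
    return first_steerable_slot - len(flow)
```

For a five-line flow with N = 2, the hint lands on the third line and the next flow can enter in the slot immediately after the fifth line, so no fetches are wasted. With only an end marker on the last line, the same flow wastes N = 2 fetch slots under this model.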
This method and apparatus have the advantage that flows of any length can be hinted. In addition, flows that do not originate from the entry point structure can also be hinted. Finally, fewer hint bits are needed in the entry point structure, yet better prediction is obtained. In particular, the number of hint bits is reduced to no more than two for systems with a two-cycle microcode storage device lookup delay, a substantial improvement over existing systems.