1. Technical Field
The invention relates to computer processing units and, more particularly, to controlling the fetching of instructions in computer processing units.
2. Description of the Prior Art
Many of today's computer processing units (CPUs) and microprocessors have a Complex Instruction Set Computer (CISC) Architecture. For example, the architectures of the IBM System 390 and the Intel X86 and Pentium microprocessors are classified as CISC Architectures. The instruction set of a CISC architecture system includes both simple instructions (e.g., Load or Add) and complex instructions (e.g., Edit and Load, Start Interpretive Execution, or Diagnose).
Conventionally, the complex functions of the CISC architecture are implemented in microcode because building hardware execution units to execute the complex functions is expensive and error-prone. Additionally, implementing complex instructions in microcode provides flexibility to fix problems and expandability in that additional functions can be included later. However, as the number of complex instructions implemented in microcode grows, the cycle time of the hardware execution units in executing the microcode instructions and the real estate required for the microcode itself become limiting factors in the performance of the processor/CPU.
To solve this problem, designers have implemented millicode routines that execute such complex functions. As shown in U.S. Pat. No. 5,226,164 to Nadas et al., the millicode routines may be stored in a separate millicode array. The millicode routines may include standard system architecture instructions and/or additional hardware instructions that are added to improve performance. The set of additional hardware instructions stored in the millicode array may be considered to be an alternate architecture that the computer processing unit can operate.
Additionally, the millicode routines may be used to emulate a second instruction set architecture. In this case, the additional hardware instructions stored in the millicode array emulate instructions from a second instruction set architecture.
While the above-described system provides increased flexibility and processing speed, it leaves a number of problems unsolved. One problem relates to the manipulation of millicode branch instructions. In many instances, it may be desirable to predict the outcome of the millicode branch instructions such that the proper instruction (next sequential or target) can be fetched before or at least as soon as it is needed, so that no delay occurs in executing the millicode instruction stream.
An effective strategy for predicting the outcome of branch instructions at the system level is embodied in U.S. Pat. No. 3,559,183 to Sussenguth, which is assigned to the assignee of the present invention. It is based on the observation that most branches, considered individually, are consistently taken or not taken and if taken will have a consistent target address. In this strategy, a table of taken branches is constructed. Each entry in the table consists of the address of the taken branch followed by the target address of the branch. This table is a hardware construct and, thus, has a predetermined size, typically from 1024 to 4096 entries. Entries are made only for taken branches as they are encountered. When the table is full, adding a new entry requires displacing an older entry. This may be accomplished by a Least Recently Used (LRU) algorithm as in caches.
In principle, each branch in the stream of instructions being executed is looked up in the table, by its address, and if it is found, its target is fetched and becomes the next instruction in the stream. If the branch is not in the table, it is presumed not taken. As the execution of the branch instructions proceeds, the table is updated accordingly. If a branch predicted to be taken is not taken, the associated table entry is deleted. If a branch predicted not to be taken is taken, a new entry is made for it. If the predicted target address is wrong, the corrected address is entered.
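The table behavior described above can be illustrated with a minimal software sketch. It is a toy model only, not the hardware construct of the cited patent: the class name BranchHistoryTable, the method names predict and update, and the use of an ordered dictionary to approximate LRU replacement are all assumptions made for illustration.

```python
from collections import OrderedDict


class BranchHistoryTable:
    """Toy model of a table of taken branches with LRU replacement.

    Hypothetical illustration; names and structure are assumptions.
    """

    def __init__(self, capacity=1024):
        self.capacity = capacity
        # Maps branch address -> predicted target address.
        # Insertion order doubles as recency order (oldest first).
        self.entries = OrderedDict()

    def predict(self, branch_addr):
        """Return the predicted target, or None if predicted not taken."""
        target = self.entries.get(branch_addr)
        if target is not None:
            # A hit counts as a use, so mark the entry most recently used.
            self.entries.move_to_end(branch_addr)
        return target

    def update(self, branch_addr, taken, actual_target=None):
        """Update the table once the branch outcome is known."""
        if not taken:
            # Branch was not taken: delete any entry for it.
            self.entries.pop(branch_addr, None)
            return
        if branch_addr not in self.entries and len(self.entries) >= self.capacity:
            # Table is full: displace the least recently used entry.
            self.entries.popitem(last=False)
        # New entry, or corrected target for an existing entry.
        self.entries[branch_addr] = actual_target
        self.entries.move_to_end(branch_addr)


# Example use with a deliberately tiny table of two entries.
bht = BranchHistoryTable(capacity=2)
bht.update(0x100, taken=True, actual_target=0x200)
bht.update(0x104, taken=True, actual_target=0x300)
bht.predict(0x100)                                   # refreshes 0x100
bht.update(0x108, taken=True, actual_target=0x400)   # displaces 0x104
```

In this sketch a lookup miss is treated as a not-taken prediction, and the three update cases (delete on a not-taken outcome, insert on an unpredicted taken branch, overwrite on a wrong target) follow the rules stated in the preceding paragraph.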
However, using the system level branch history table for both the system level instructions and the millicode instructions is inefficient and may not improve performance due to a number of factors. One of these factors, for example, is based on the observation that millicode instructions will not be executed very often. Thus, by the time a given millicode routine is encountered again, most of the branch entries previously created for it in the system level branch history table will have been overwritten. An additional factor is that utilizing the branch history table for millicode branches also displaces entries for system level branch instructions. Thus, in some cases, performance may actually be lost by using the conventional system level branch history table to predict millicode branch outcomes.