In earlier years, the design of the control portion of computer processors went through a transition from hard-wired control units to the more recent microcode-driven control units. The microcode is generally referred to as "firmware" and resides at a level below the machine instruction level. The microcode is generally fixed and supplied by the manufacturer, and is inaccessible to the user, who may not even be aware of its existence.
Microcode instructions must be stored in some type of memory structure which is available to the control hardware of the processor. In many processors, this is a Read Only Memory (ROM) unit, which is generally inexpensive and fast but has the limitation of being fixed and unalterable. Thus, when inadequacies are found, or when it is desired to change the definition of the implemented instruction set, the change is very costly to make.
In other types of processors, the microcode instructions are stored in Random Access Memory (RAM). This makes it relatively easy to change microcode instructions that would otherwise be fixed, but RAM is much more costly and slower in operation. Additionally, in many VLSI implementations, RAM requires more silicon area per bit, thus reducing the amount of microcode available for use in a given silicon area.
In practice, both RAM and ROM units are also limited in size by considerations such as power consumption, cost, required area, and performance.
Given these types of problems with RAM and ROM memories, computer systems have been developed that use "caching", or cache memory assists, to serve a processor's need for instruction codes as rapidly as possible.
The present disclosure obtains the benefits of a writable control store without the size constraints of Random Access Memory (RAM) or the unalterability of Read Only Memory (ROM).
Thus the improved concept is that, instead of attempting to store the entire microcode instruction set in either a RAM unit or a ROM unit, a specialized "microcode cache unit" can be implemented. When a "miss" occurs in an ordinary cache memory unit, the required item is fetched from the main memory. Most processors are connected to memory systems that are very large compared to the memory space required for microcode storage, so the main memory can readily hold the complete microcode.
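The fetch-on-miss behavior described above can be sketched as follows. This is a minimal illustration only, assuming a direct-mapped organization with one microcode word per line; the class name `MicrocodeCache`, the `fetch` method, the cache size, and the address trace are all hypothetical and are not taken from the disclosure.

```python
class MicrocodeCache:
    """Hypothetical direct-mapped cache of microcode words backed by main memory."""

    def __init__(self, backing_store, num_lines):
        self.backing_store = backing_store   # complete microcode image (assumed to be in main memory)
        self.num_lines = num_lines
        self.tags = [None] * num_lines       # microcode address currently held by each line
        self.words = [None] * num_lines      # cached microcode word for each line
        self.hits = 0
        self.misses = 0

    def fetch(self, address):
        """Return the microcode word at `address`, loading it from main memory on a miss."""
        line = address % self.num_lines      # direct-mapped placement
        if self.tags[line] == address:
            self.hits += 1
        else:
            self.misses += 1
            self.words[line] = self.backing_store[address]
            self.tags[line] = address
        return self.words[line]


# Illustrative use: 64 microcode words in main memory, an 8-line cache.
main_memory = [f"uop_{i}" for i in range(64)]
cache = MicrocodeCache(main_memory, num_lines=8)
trace = [0, 1, 2, 0, 1, 2, 40, 0]            # address 40 shares a line with address 0
fetched = [cache.fetch(a) for a in trace]
```

In this sketch the repeated accesses to addresses 0, 1 and 2 hit in the cache, while address 40 evicts address 0 because both map to the same line, so the final access to address 0 misses again.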
A special problem for microcode cache units is that a cache "miss" is very expensive in terms of average performance. Thus much higher "hit" rates are desirable than in most general cache applications. In many applications it is desirable that hits occur at least 99% of the time. There are several concepts that make this possible.
(i) First, the amount of microcode actually used in the "normal operation" of a processor is relatively very small. Many OP codes are seldom used, and many esoteric variants of common OP codes are used even less. One obvious example is the action taken under error conditions;
(ii) Second, a microcode post-processor can be used to rearrange the microcode location accessibility to maximize the cache hit rate if the parameters of a caching algorithm and microcode use are known.
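One way a post-processor of the kind mentioned in item (ii) might work is sketched below. It assumes a direct-mapped cache and a known count of how often each microcode word is used, and it simply renumbers the words so that the most frequently used ones occupy consecutive addresses and therefore distinct cache lines. The function names `relocate_by_frequency` and `direct_mapped_hit_rate`, the trace, and the cache size are hypothetical; the disclosure does not specify a particular rearrangement algorithm.

```python
from collections import Counter


def relocate_by_frequency(use_counts):
    """Map each old address to a new one, assigning the lowest addresses to the hottest words."""
    ranked = sorted(use_counts, key=lambda addr: (-use_counts[addr], addr))
    return {old: new for new, old in enumerate(ranked)}


def direct_mapped_hit_rate(trace, num_lines):
    """Simulate a direct-mapped cache over an address trace and return the hit fraction."""
    tags = [None] * num_lines
    hits = 0
    for address in trace:
        line = address % num_lines
        if tags[line] == address:
            hits += 1
        else:
            tags[line] = address             # miss: replace whatever the line held
    return hits / len(trace)


# Assumed workload: three hot words whose original addresses all map to line 0
# of a 4-line cache, so they keep evicting one another.
original_trace = [0, 4, 8] * 10
mapping = relocate_by_frequency(Counter(original_trace))
relocated_trace = [mapping[a] for a in original_trace]

rate_before = direct_mapped_hit_rate(original_trace, num_lines=4)
rate_after = direct_mapped_hit_rate(relocated_trace, num_lines=4)
```

With this assumed trace, every access misses before relocation because the three words share one line; after relocation only the first access to each word misses.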
Microcode cache operations allow a large, complex, evolving instruction set to be implemented in a single-die package, with the location of the complete microcode in the memory subsystem chosen according to the cost/performance requirements of the system.
Putting the control store off-chip would tend to require deeper pipelining because of the delay incurred. The requirements for computing the address of the next microcode word to be executed would make deeper pipelining of its prefetch very costly, and performance would suffer considerably. Thus locating the cache on-chip eliminates much of the pipelining delay that would be incurred if the on-chip caches were not available.
Putting both a general cache and a microcode cache on-chip allows the processor to run for lengthy periods without having to go off-chip. Because of the performance cost of going off-chip (which grows as the processor becomes faster relative to the memory subsystem), it is desirable to do this as infrequently as possible. Thus, it is useful to implement larger caches as technology allows, to further reduce the off-chip traffic.
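The sensitivity of performance to the miss rate can be illustrated with the standard average-access-time relation: average cycles per fetch equals the hit time plus the miss rate multiplied by the miss penalty. The sketch below is illustrative only; the function name `average_fetch_cycles`, the one-cycle hit time, and the 50-cycle off-chip penalty are assumed values, not figures from the disclosure.

```python
def average_fetch_cycles(hit_rate, hit_cycles, miss_penalty_cycles):
    """Average cycles per microcode fetch for a given hit rate (standard average-access-time model)."""
    miss_rate = 1.0 - hit_rate
    return hit_cycles + miss_rate * miss_penalty_cycles


# Assumed values: an on-chip hit costs 1 cycle, an off-chip miss adds 50 cycles.
for hit_rate in (0.90, 0.99, 0.999):
    cycles = average_fetch_cycles(hit_rate, hit_cycles=1, miss_penalty_cycles=50)
    print(f"hit rate {hit_rate:.1%}: about {cycles:.2f} cycles per fetch")
```

Under these assumed numbers, a 90% hit rate gives roughly 6 cycles per fetch on average, while a 99% hit rate gives roughly 1.5. A larger miss penalty, as with a faster processor relative to the memory subsystem, widens that gap further.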