1. Technical Field of the Invention
This invention relates generally to microprocessors, and more specifically to an improved microprocessor which includes storage into which customer-defined code routines or code segments can be explicitly loaded and held for future execution.
2. Background Art
FIG. 1 depicts an exemplary, conventional microprocessor 10. The microprocessor has an Instruction Set Architecture (ISA) such as X86, MIPS, ARM, Alpha, PowerPC, or the like. Software is written in a source code language such as C++, Pascal, Lisp, or the like, or in the ISA's assembly language, and is then compiled or assembled into native, executable ISA code. The ISA includes the complete set of features which are visible to or expressly usable by the ISA code, including instructions, registers, flags, and the like. The microprocessor typically also has a microarchitecture which is not directly visible to the ISA code, and which is used at a lower level to implement the ISA. Many microprocessors' microarchitectures are microcoded, in that they have their own “native” instruction format and control constructs. Typically, such microprocessors fetch ISA code, decode it, and generate a corresponding microcode flow to accomplish the functionality specified by the ISA code.
In the example shown, the microprocessor retrieves and executes this ISA code from a memory 12 under control of an instruction fetcher 14. To improve performance, the ISA code is typically stored in an instruction cache 16, and may be speculatively brought in from memory by a prefetcher 18 in coordination with a branch predictor 20. There may also be a separate data cache 22 in some instances. In the context of this invention, “memory” may be DRAM, SRAM, ROM, flash memory, hard disk, CD-ROM, DVD-RAM, or any other form of storage, and may be coupled directly to the processor or it may be coupled indirectly via one or more intervening systems or transmission means.
Regardless of how or when the ISA code is brought into the microprocessor, before it can be executed, an instruction decoder 24 parses the incoming ISA code to ascertain which instructions are contained in the code. In many machines, the instruction decoder generates microcode including a series of one or more microinstructions which correspond to a given ISA instruction. While the ISA code may be thought of as being the “native” instructions of the architecture, the microcode (μcode) is the “native” instructions of the microarchitecture or the execution units 26 in the microprocessor. Two microprocessors may share the same ISA but have wildly different microcode instruction sets.
Some ISA instructions, such as trigonometric math functions, require complex operations, and result in lengthy microcode flows. In many instances, it is beneficial to permanently store these microcode flows in a microcode read-only memory (ROM) 28. When the instruction decoder detects such an ISA instruction, the instruction decoder triggers the microcode ROM to output the corresponding microcode flow.
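By way of illustration only, the decode path described above may be modeled as a lookup that either expands an ISA instruction directly or triggers a stored flow from the microcode ROM. The following is a minimal sketch under stated assumptions: the opcode names, the microinstruction names, and the table contents are all hypothetical, and no actual ISA or microcode format is represented.

```python
# Toy model of an instruction decoder with a microcode ROM.
# All opcode and microinstruction names are hypothetical placeholders.

# Short flows the decoder is assumed to generate directly.
SIMPLE_DECODE = {
    "ADD": ["uop_add"],
    "LOAD_ADD": ["uop_load", "uop_add"],
}

# Lengthy flows assumed to be stored permanently in the microcode ROM.
MICROCODE_ROM = {
    "FSIN": ["uop_range_reduce", "uop_poly_eval", "uop_round", "uop_writeback"],
}


def decode(isa_instruction):
    """Return the microcode flow corresponding to one ISA instruction."""
    if isa_instruction in MICROCODE_ROM:
        # Complex instruction: the decoder triggers the ROM to output the flow.
        return list(MICROCODE_ROM[isa_instruction])
    # Simple instruction: the decoder generates the microinstructions itself.
    return list(SIMPLE_DECODE[isa_instruction])


def decode_stream(isa_code):
    """Decode a sequence of ISA instructions into one flat microcode flow."""
    flow = []
    for instruction in isa_code:
        flow.extend(decode(instruction))
    return flow
```

In this sketch, `decode_stream(["LOAD_ADD", "FSIN"])` yields two microinstructions from the decoder followed by four from the ROM. Note that every execution of the same ISA code repeats the lookup, which corresponds to the decode cost incurred at every execution instance.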
The microcode from the instruction decoder and/or from the microcode ROM is sent to a microinstruction scheduler 30 which controls the delivery of the microcode instructions to the various execution units of the microprocessor, in accordance with the availability of the execution units, the availability of the required input data operands for the microinstructions (μops), and so forth. Ultimately, the microinstructions are executed and their results are written to the memory (typically through the data cache).
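The scheduling behavior described above may be sketched, purely as an illustration, as a loop that issues each microinstruction once an execution unit of the required kind is free and all of its input operands are available. The `MicroOp` structure, the unit names, and the single-cycle result latency are assumptions made for this example and are not taken from any particular microprocessor.

```python
from dataclasses import dataclass


@dataclass
class MicroOp:
    name: str
    unit: str        # kind of execution unit required (hypothetical names)
    sources: tuple   # operands that must be available before issue
    dest: str        # operand produced by this microinstruction


def schedule(uops, units, ready):
    """Return a list of cycles, each a list of microinstruction names issued.

    units maps a unit kind to the number of such units; ready is the set of
    operands available before the first cycle.
    """
    pending = list(uops)
    ready = set(ready)
    cycles = []
    while pending:
        free = dict(units)  # every unit is free at the start of a cycle
        issued = []
        for uop in pending:
            operands_ready = all(s in ready for s in uop.sources)
            if operands_ready and free.get(uop.unit, 0) > 0:
                free[uop.unit] -= 1
                issued.append(uop)
        if not issued:
            raise RuntimeError("no microinstruction can issue")
        for uop in issued:
            pending.remove(uop)
            ready.add(uop.dest)  # assumed: result is available next cycle
        cycles.append([u.name for u in issued])
    return cycles
```

With one memory unit and one arithmetic unit, two loads followed by a dependent add and an independent increment issue over three cycles: the first load and the increment together, then the second load, then the add once both of its operands are available.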
The contents of the microcode ROM are determined by the microprocessor manufacturer at manufacturing time, and cannot be changed by the customer, the ISA-level programmer, or the end user. So, although the microcode ROM holds code flows for later execution, it cannot be modified or customized by the customer. A few microprocessors have included the ability for the manufacturer to “patch” the microcode at any time after manufacturing, by loading a sequence of microcode into a microcode patch RAM (not shown). The patch facility is typically used by the manufacturer to work around errata in the microprocessor. The ability to load a patch is among the most tightly protected trade secrets of the manufacturer, with strong encryption protection and verification mechanisms, and is not exposed to customers or users for their use.
The customer does have some limited control over the contents of the instruction cache. If the customer has a good measure of control over, and knowledge of, all code that may potentially be running on the microprocessor, the customer can, to a limited extent, control the contents of the instruction cache, e.g. by making sure that the customer's code (“customer code”) is small enough to fit within the instruction cache without causing evictions and overwriting. However, if other code, such as the operating system, an interrupt handler, or another software application, suddenly becomes active, it may cause the eviction of the code which the customer wanted in the cache. When the customer code must then be re-fetched into the cache, the result is degraded performance and, significantly, non-deterministic execution time (both in terms of throughput and latency) of the customer code. Some processors allow the cache to be locked, preventing eviction of its contents. In some instances, it may be advantageous for the customer to load the instruction cache with the customer code, and then execute the instruction which locks the cache (typically by setting a bit in a control register). This requires that the customer have a great deal of control over exactly what software is running on the processor. Otherwise, the customer cannot guarantee that the customer code will, in fact, be present in the cache when the cache is locked. Also, the code which is to be locked in the cache must be executed in order to be loaded into the instruction cache; merely reading the code would cause it to be loaded into the data cache. The customer cannot load the instruction cache without executing the customer code once. And, once the instruction cache is locked, it cannot be used to improve performance of other code, and overall system performance suffers.
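The eviction and locking behavior described above may be illustrated with a small software model. This is a sketch only: it assumes a fully associative cache with least-recently-used replacement and a whole-cache lock flag, and the class name, method names, and addresses are hypothetical rather than drawn from any actual processor.

```python
from collections import OrderedDict


class InstructionCache:
    """Toy fully associative instruction cache with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> True, least recently used first
        self.locked = False         # models a lock bit in a control register

    def fetch(self, address):
        """Fetch one line for execution; return True on a hit, False on a miss."""
        if address in self.lines:
            self.lines.move_to_end(address)
            return True
        if not self.locked:
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict least recently used line
            self.lines[address] = True
        return False


def run(cache, addresses):
    """Execute a code sequence and return the number of cache misses."""
    return sum(0 if cache.fetch(a) else 1 for a in addresses)
```

In this model, customer code occupying all four lines of a four-line cache runs with no misses once loaded, but a two-line interrupt handler that runs in between causes every subsequent customer fetch to miss. If `locked` is set after the customer code has been executed once, the customer code always hits, while the other code misses on every fetch, which corresponds to the loss of overall system performance noted above.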
Furthermore, the instruction cache holds ISA code, not microcode. The contents of the instruction cache must be decoded at every execution instance, such as when looping. In a few existing microprocessors, such as the Intel Pentium 4 processor, a “trace cache” holds decoded and loop-unrolled microcode. However, the customer has essentially zero control over the contents of the trace cache.
Microprocessor manufacturers typically do not disclose the format of their microcode to customers or anyone else, and often take extreme measures to prevent others from gaining access to the microcode or writing code in its format.
What is desirable, then, is an improved microprocessor which includes a customer code store which is not subject to the vagaries of cache eviction, which stores pre-decoded microcode which can be fed directly to the execution units without using or being limited by instruction decoder bandwidth, which can be loaded without executing the code, and which permits the customer to control the contents of the customer code store in terms of what algorithms are stored therein, for how long they are stored, and when that code is executed.