1. Field of the Invention
The present invention relates to circuits, processes, and design structures for microprocessor cache control.
2. Background of the Invention
Whereas the determination of a publication, technology, or product as prior art relative to the present invention requires analysis of certain dates and events not disclosed herein, no statements made within this Background of the Invention shall constitute an admission by the Applicants of prior art unless the term “Prior Art” is specifically stated. Otherwise, all statements provided within this Background section are “other information” related to or useful for understanding the invention.
Modern microprocessors make extensive use of cache memories. In general, cache memories are memories which require less time to access data, either storing or retrieving, than the time required to access data from a larger pool of memory.
Microprocessor cache design is a well-developed art, so the purpose of the following background paragraphs is to establish some terminology. For more details on cache design as it is understood in the art at the time of our invention, it is recommended to refer to a broadly-used text such as “Cache and Memory Hierarchy Design, A Performance-Directed Approach”, by Steven A. Przybylski (Morgan Kaufmann Publishers, Inc., San Mateo, Calif., copyright 1990).
FIG. 1 provides a general reference model (100) of various microprocessor-related memory structures and the core of a microprocessor (101). This figure is not a schematic, but instead is a functional depiction of access times tacc. It is important to note that cache memories are not software-defined structures, such as software-defined queues or pages, but are generally banks of hardware memory. For this reason, the hardware design committed to silicon during the design phase of a new microprocessor impacts the microprocessor's ability to carry out certain tasks, either positively or negatively. But, as a hardware design, it is unchangeable and becomes a performance feature (or shortcoming) of a particular microprocessor. This fact, in part, explains the wide variety of microprocessors which are available on the market even today, including reduced instruction set (RISC), Advanced RISC (ARM), and digital signal processors (DSP), to mention a few. Some microprocessors find their optimal use in personal computers, while others find their optimal use in mobile devices (cell phones, PDAs, etc.), and yet others find their optimal application in specialty devices (instrumentation, medical devices, military equipment, etc.).
As such, the central processing unit (CPU), arithmetic logic unit (ALU), or multiplier-accumulator (MAC) represented in FIG. 1 as (101) functionally stands for the calculating and decision-making portion of a microprocessor. In some microprocessor designs, this functional portion of a microprocessor may be given a different name, especially to emphasize any special operation or optimized functionality of the portion of the microprocessor.
A microprocessor-based circuit, such as a computer “motherboard”, a “blade server” board, or a circuit board of a mobile device, will usually include a considerable amount of general purpose memory, which we will refer to as “main memory” (105). Main memory is usually not included in the same integrated circuit (IC) with the microprocessor, but instead is usually provided in one or more separate IC devices.
However, main memory is typically relatively slow to access tacc(MM) because very fast access memory is expensive. So, in order to balance cost versus the need for a large amount of main memory, an affordable but slower main memory device is employed.
To improve performance of the microprocessor, a Level 1 cache memory (102) (“L1 cache”) is often included on the same IC as the calculating and decision-making portion (101) of the processor. The L1 cache therefore operates at the same fast internal speed as the processor core itself, because there is no additional delay to convert the internal voltages and signals to chip-external voltages and signals such as those of the microprocessor's external address, control, and data buses. As such, the access time of the L1 cache tacc(L1) is much less than that of the main memory tacc(MM).
Because the extra “gates” employed in the L1 memory are very expensive “real estate” on an IC die, the determination of how many bytes, words, kilobytes, etc., of L1 memory to design into the microprocessor is driven by the types of applications intended for the microprocessor, which includes cost targets, heat and power requirements, size requirements, etc. For these reasons, the amount n(L1) of L1 cache is usually much, much less than the amount n(MM) of the main memory.
Many microprocessors also have a secondary or Level 2 cache memory (“L2 cache”), which is faster to access tacc(L2) than main memory tacc(MM), but slower to access than L1 cache tacc(L1). Similarly, it is usually provided in greater amount n(L2) than L1 cache n(L1), but in lesser amount than main memory n(MM). Some L2 caches are “on chip” with the L1 cache and the calculating and decision-making portion of the processor, and some are off-chip (e.g., in a separate IC). Off-chip L2 cache is often interconnected to the microprocessor using a special external bus which is faster than the bus to the main memory.
Similarly, an even greater amount (than L1 or L2) of memory may be provided in a Level 3 cache memory (“L3 cache”) (104), but less than the amount of main memory. And, similarly, the access time tacc(L3) to the L3 cache is greater than that of the L1 or L2 cache, but still considerably less than the access time to the main memory.
And, additional memory, such as removable memory cards, hard drives, embedded memory on expansion cards (video and graphics cards, network interface cards, etc.) may be provided which we will refer to collectively as “extended memory” (106), which is slower to access tacc(XM) than main memory, but is usually provided in much greater amount n(XM) than main memory.
Thus, two sets of relationships of access time and amount are generally true for these types of memories, where the operator “<<” represents “is much less than”:
tacc(L1) << tacc(L2) << tacc(L3) << tacc(MM) << tacc(XM)  (Eq. 1)
and:
n(L1) << n(L2) << n(L3) << n(MM) << n(XM)  (Eq. 2)
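The two orderings of Eq. 1 and Eq. 2 can be illustrated with a minimal Python sketch. All access times and capacities below are hypothetical values chosen only to show the relationships; they are not taken from FIG. 1 or from any particular microprocessor. The `min_ratio` threshold used to approximate “is much less than”, and the model in which each level is probed in turn until the data is found, are likewise simplifying assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryLevel:
    name: str
    access_time_ns: float  # illustrative tacc, not a measured value
    capacity_bytes: int    # illustrative n, not a measured value


# Hypothetical hierarchy, ordered from fastest/smallest to slowest/largest.
HIERARCHY = [
    MemoryLevel("L1", 1.0, 32 * 1024),
    MemoryLevel("L2", 4.0, 512 * 1024),
    MemoryLevel("L3", 20.0, 8 * 1024**2),
    MemoryLevel("MM", 100.0, 8 * 1024**3),
    MemoryLevel("XM", 100_000.0, 512 * 1024**3),
]


def much_less_chain(values, min_ratio=2.0):
    """Return True if each value is at least min_ratio times its predecessor.

    min_ratio is an assumed stand-in for the informal "<<" operator.
    """
    return all(b >= min_ratio * a for a, b in zip(values, values[1:]))


def satisfies_eq1(levels):
    """Check the access-time ordering of Eq. 1."""
    return much_less_chain([lv.access_time_ns for lv in levels])


def satisfies_eq2(levels):
    """Check the capacity ordering of Eq. 2."""
    return much_less_chain([lv.capacity_bytes for lv in levels])


def access_time(levels, found_in):
    """Sum the access times of every level probed up to and including found_in.

    Assumes a simplified model in which levels are searched strictly in order.
    """
    total = 0.0
    for lv in levels:
        total += lv.access_time_ns
        if lv.name == found_in:
            return total
    raise ValueError(f"unknown memory level: {found_in}")
```

Under these assumed numbers, data found in the L3 cache costs the sum of the L1, L2, and L3 access times, while data that must come from main memory additionally pays the much larger main memory access time.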
“Multiprocessing”, “multicore” processing, and “multithreading” are terms which are used commonly within the art of computing. However, their context often dictates their exact meaning. For our purposes of this disclosure, we will use the following definitions:
“process”: a single software program or function being performed by a computer;
“software thread”: a special type of process or part of a process which can be replicated so that multiple, independent copies of the process can be executed, often apparently simultaneously through time sharing or time division multiplexing of a single microprocessor (or multiple microprocessors);
“hardware thread”: a division of a processor or core which allows multiple threads of execution;
“multithreading”: the act of executing multiple threads on a single microprocessor or among multiple microprocessors;
“multiprocessing”: using two or more CPUs, ALUs, or MACs within a single computer system to accomplish one or more processes or threads;
“multi-core”: a type of multiprocessor in which the plurality of CPUs, ALUs, and/or MACs are contained within a single IC or on separate ICs which are packaged together in a single package;
“hypervisor”: also referred to as a virtual machine monitor, allows “virtualization” of a computing platform, often a multi-processor computing platform, such that multiple operating systems may execute applications concurrently on the same computing platform; and
“processing partition”: a portion of computing platform execution time and resources assigned to one of multiple operating systems by a hypervisor.
As is known in the art, multithreading is often accomplished with operating system functionality which time shares the processor(s) among the multiple threads. And, multiprocessors or multi-core processors can be employed to execute a single process divided amongst the multiple CPUs, or employed to execute multiple threads or processes divided amongst the multiple CPUs.
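Time sharing of a single processor among multiple threads can be sketched as a simple round-robin scheduler. The following Python example is a toy model only: the names `worker` and `round_robin` are hypothetical, each Python generator stands in for a software thread, and each call to `next` stands in for one time slice. It does not represent how any particular operating system schedules threads.

```python
from collections import deque


def worker(name, steps):
    """A stand-in for a software thread that needs `steps` time slices."""
    for i in range(steps):
        # Each yield marks the end of one time slice for this thread.
        yield f"{name}:{i}"


def round_robin(threads):
    """Give each thread one time slice in turn until all have finished.

    Returns the order in which slices were executed on the single processor.
    """
    queue = deque(threads)
    trace = []
    while queue:
        thread = queue.popleft()
        try:
            trace.append(next(thread))
        except StopIteration:
            continue  # thread finished; do not re-queue it
        queue.append(thread)  # thread still has work; back of the line
    return trace


if __name__ == "__main__":
    print(round_robin([worker("A", 2), worker("B", 3)]))
    # prints ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

The interleaved trace shows why the threads appear to run simultaneously even though only one of them occupies the processor during any given slice.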