Personal computer systems are well known in the art. Personal computer systems in general, and IBM Personal Computers in particular, have attained widespread use for providing computer power to many segments of today's modern society. Personal computers can typically be defined as desktop, floor-standing, or portable microcomputers that comprise a system unit having a single central processing unit (CPU) and associated volatile and non-volatile memory, including all RAM and BIOS ROM, a system monitor, a keyboard, one or more flexible diskette drives, a fixed disk storage drive (also known as a "hard drive"), a so-called "mouse" pointing device, and an optional printer. One of the distinguishing characteristics of these systems is the use of a motherboard or system planar to electrically connect these components together. These systems are designed primarily to give independent computing power to a single user and are inexpensively priced for purchase by individuals or small businesses. Examples of such personal computer systems are IBM's PERSONAL COMPUTER AT (IBM PC/AT), IBM's PERSONAL SYSTEM/1 (IBM PS/1), and IBM's PERSONAL SYSTEM/2 (IBM PS/2).
Personal computer systems are typically used to run software to perform such diverse activities as word processing, manipulation of data via spread-sheets, collection and relation of data in databases, displays of graphics, design of electrical or mechanical systems using system-design software, etc.
The heart of such systems is the microprocessor or central processing unit (CPU) (referred to collectively as the "processor"). The processor performs most of the actions required for application programs to function. The execution capabilities of the system are closely tied to the CPU: the faster the CPU can execute program instructions, the faster the system as a whole will execute.
Early processors executed instructions from relatively slow system memory, taking several clock cycles to execute a single instruction. They would read an instruction from memory, decode the instruction, perform the required activity, and write the result back to memory, each of which steps would take one or more clock cycles to accomplish.
As applications demanded more power from processors, internal and external cache memories were added to processors. A cache memory (hereinafter cache) is a section of very fast memory located within the processor or located external to the processor and closely coupled to the processor. Blocks of instructions are copied from the relatively slower system DRAM to the faster caches where they are executed by the processor.
As applications demanded even more power from processors, superscalar processors were developed. A superscalar processor is a processor capable of executing more than one instruction per clock cycle. A well-known example of a superscalar processor is manufactured by Intel Corp. under the trademark PENTIUM. The PENTIUM processor uses prefetch buffers, an instruction cache, and a branch target cache to reduce fetches to the memory, which tend to slow the processor down to less than one instruction per clock cycle.
The instruction cache is a section of very fast memory located within the processor. Instructions execute out of the instruction cache very quickly. Instruction blocks are moved from the slower system memory to the instruction cache, where the processor's buffers, decoders and execution units can quickly access them.
An instruction cache speeds up processing if the next instruction to be executed is within the cache. However, if the instruction currently being executed might cause a branch in the code, then the probability that the subsequent instruction is in the instruction cache decreases dramatically. The processor then slows down significantly, because a branch to an area outside the cache causes the processor to execute code from the relatively slower system DRAM until the cache is loaded with a block of code from that DRAM.
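The effect of branches on an instruction cache can be illustrated with a minimal sketch. The following is not a model of any particular processor: the cycle costs, the block size, the direct-mapped placement, and the names `InstructionCache`, `fetch`, and `run` are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field

HIT_CYCLES = 1     # assumed cost of a fetch served by the instruction cache
MISS_CYCLES = 10   # assumed cost of a fetch served by slower system DRAM
BLOCK_SIZE = 8     # assumed number of instructions per cache block


@dataclass
class InstructionCache:
    """Toy direct-mapped cache holding a fixed number of instruction blocks."""
    num_blocks: int = 4
    blocks: dict = field(default_factory=dict)  # slot index -> block number held

    def fetch(self, address: int) -> int:
        """Return the cycles spent fetching the instruction at `address`."""
        block = address // BLOCK_SIZE
        slot = block % self.num_blocks
        if self.blocks.get(slot) == block:
            return HIT_CYCLES            # next instruction is already cached
        self.blocks[slot] = block        # load the block from DRAM
        return MISS_CYCLES


def run(trace):
    """Total fetch cycles for a sequence of instruction addresses."""
    cache = InstructionCache()
    return sum(cache.fetch(addr) for addr in trace)


# Sixteen instructions of straight-line code.
sequential = list(range(16))
# Sixteen instructions in which every fourth one branches to a distant block.
branching = [base * 1000 + i for base in range(4) for i in range(4)]

print(run(sequential), run(branching))
```

Under these assumed costs the branching trace spends more cycles on fetches than the straight-line trace of the same length, because each branch lands in a block that has not yet been loaded into the cache.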
The branch target caches, also known as branch target buffers, are a common solution to the problem of branches causing a read from slower system DRAM. Branch target circuits are well known in the art. E.g., Pentium Processor User's Manual, vol. 1, Intel Corp., 1993; Brian Case, "Intel Reveals Pentium Implementation Details," Microprocessor Reports, Mar. 29, 1993, at 9; J. Lee & A. J. Smith, "Branch Prediction Strategies and Branch Target Buffer Design," IEEE Computer, Jan. 1984, at 6.
Branch target caches hold information helpful in avoiding undesirable executions from the slower system DRAM. They hold data such as the predicted target address and history bits that give some indication of whether a particular predicted target address was taken or not taken in the past (in the case of the PENTIUM processor) and the first several lines of code at the predicted address (in the case of the Am29000 processor, manufactured by Advanced Micro Devices, Inc.). Using a branch target cache allows the processor to fetch a portion of the code at the predicted target address, giving the fetcher time to load a larger portion of that area of memory into the instruction cache, thereby allowing continued execution from the cache and preventing unneeded execution from slower system DRAM.
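The combination of a predicted target address and history bits can be sketched as follows. This is a generic illustration assuming a two-bit saturating history counter per branch; it is not asserted to be the organization used in the PENTIUM or Am29000 processors, and the names `BranchTargetBuffer`, `predict`, `update`, and `count_correct` are hypothetical.

```python
class BranchTargetBuffer:
    """Toy branch target buffer: branch address -> (predicted target, history)."""

    def __init__(self):
        self.entries = {}

    def predict(self, branch_addr):
        """Return the predicted target, or None to predict 'not taken'."""
        entry = self.entries.get(branch_addr)
        if entry is None:
            return None                        # unknown branch: predict not taken
        target, history = entry
        return target if history >= 2 else None  # 2 or 3 means 'likely taken'

    def update(self, branch_addr, taken, target):
        """Record the actual outcome of the branch in the history bits."""
        old_target, history = self.entries.get(branch_addr, (target, 1))
        if taken:
            history = min(history + 1, 3)      # saturate at 'strongly taken'
            old_target = target
        else:
            history = max(history - 1, 0)      # saturate at 'strongly not taken'
        self.entries[branch_addr] = (old_target, history)


def count_correct(btb, branch_addr, outcomes, target):
    """Count how many outcomes the buffer predicts correctly."""
    correct = 0
    for taken in outcomes:
        actual = target if taken else None
        if btb.predict(branch_addr) == actual:
            correct += 1
        btb.update(branch_addr, taken, target)
    return correct


# A loop-closing branch at 0x40 that jumps back to 0x10 nine times, then exits.
btb = BranchTargetBuffer()
outcomes = [True] * 9 + [False]
print(count_correct(btb, 0x40, outcomes, 0x10))
```

In this sketch the first encounter and the loop exit are mispredicted, while the intervening iterations are predicted correctly, which is the situation in which a fetcher could begin loading code at the predicted target ahead of time.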
The prefetch buffers are buffers that store a line of code containing one or more instructions. During cache access cycles, if the line of code that is to be executed next is present in the cache, then it is moved into the prefetch buffers for decoding by the instruction decoder. If the line of code that is to be executed next is not present in the cache, then the instruction is fetched from the slower DRAM and loaded into the prefetch buffer for decoding by the instruction decoder.
Using these components, processors achieve very fast execution times. As desirable as the processing capability of these processors is, they tend to consume a large amount of electrical power. The processors access the instruction cache and branch target cache very often to prevent unnecessary accesses to the slower system DRAM; however, the power consumed by a cache is directly proportional to the bandwidth of the cache (the bandwidth is defined as the number of accesses per unit time, for example, accesses per second). Thus, the more often the cache is accessed, the more power the processor consumes. Therefore, the increase in performance caused by the use of the caches is accompanied by an increase in the power consumed by the processor. This consumption of power has numerous detrimental effects including the possibility of causing the entire processor to overheat and fail, thereby leading to irreparable data loss in some cases.
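The stated proportionality between cache power and cache bandwidth can be written as a one-line model. The sketch below assumes a fixed energy cost per access; the figure of one nanojoule per access and the function name `cache_power_watts` are hypothetical values chosen only to make the arithmetic concrete.

```python
def cache_power_watts(energy_per_access_joules, accesses_per_second):
    """Power drawn by a cache, assuming a fixed energy cost per access."""
    return energy_per_access_joules * accesses_per_second


ENERGY_PER_ACCESS = 1e-9   # hypothetical: one nanojoule per cache access

low = cache_power_watts(ENERGY_PER_ACCESS, 50e6)    # 50 million accesses/second
high = cache_power_watts(ENERGY_PER_ACCESS, 100e6)  # 100 million accesses/second

print(high / low)
```

Under this assumption, doubling the number of accesses per second doubles the power consumed by the cache, so any reduction in the access rate yields a proportional reduction in cache power.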
Prior attempts to reduce the power consumed by such processors tend to focus on removing power to certain subcircuits within the processor or stretching the CPU clock (i.e., causing fewer instructions to be executed per unit time than normal). By employing such techniques, manufacturers have labeled their personal computer systems as "green" machines, indicating that their computer systems are environmentally desirable because they consume less electrical energy than other similar systems. These attempts succeed in reducing the electrical power consumed, but they do so at the expense of processor performance.
Therefore, it is a principal object of the present invention to reduce the amount of electrical power consumed by superscalar processors without causing a similar reduction in processor performance.