1. Field of the Invention
This invention relates to computer systems and, more particularly, to power management within a processor of a computer system.
2. Art Background
As modern computer programs have become increasingly sophisticated, modern personal computer systems have also had to become more sophisticated in order to accommodate them. Computer programs are larger than they once were and therefore comprise a larger number of code instructions than earlier programs did. Furthermore, on average, modern computer programs require access to larger files of data that are read from, and written to, when executing the programs.
Data and instructions are typically stored within the computer system and provided to the microprocessor over one or more relatively fast bus systems. Because most types of relatively fast random access memory are both volatile and relatively expensive, a computer system usually stores code and data on relatively inexpensive, nonvolatile memory such as a floppy disk or a hard disk. The nonvolatile memory has a relatively slow access speed, however, so the typical computer system also has a main memory comprising volatile memory that has a relatively faster access speed.
When a program is to be executed, the computer system uses a technique known as shadowing to copy the code and data required to execute the program from the slow nonvolatile memory to the faster volatile memory. The shadow copy in the main memory is then used to execute the program. If any changes are made to the shadow copy during the course of the program execution, the modified portion of the shadow copy is copied back to the slower nonvolatile memory. Typically, it is only the data (and not the program itself) that changes and is copied back to the nonvolatile memory.
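The shadowing technique described above can be illustrated with a minimal sketch. The class name ShadowedStore, its methods, and the block-numbered layout are hypothetical names chosen for illustration; they are not taken from any particular system. The sketch assumes that memory is divided into numbered blocks and that only modified blocks are copied back.

```python
class ShadowedStore:
    """Toy model of shadowing: a slow nonvolatile store plus a fast shadow copy.

    All names here are illustrative assumptions, not a real system's API.
    """

    def __init__(self, nonvolatile_blocks):
        # The slow nonvolatile memory: block number -> contents.
        self.nonvolatile = dict(nonvolatile_blocks)
        # The shadow copy held in faster volatile main memory.
        self.shadow = {}
        # Block numbers modified in the shadow copy since they were loaded.
        self.dirty = set()

    def load(self, block_numbers):
        """Copy the requested blocks from nonvolatile memory into the shadow copy."""
        for n in block_numbers:
            self.shadow[n] = self.nonvolatile[n]

    def read(self, n):
        """Read a block from the shadow copy."""
        return self.shadow[n]

    def write(self, n, value):
        """Modify a block in the shadow copy and remember that it changed."""
        self.shadow[n] = value
        self.dirty.add(n)

    def write_back(self):
        """Copy only the modified blocks back; return the block numbers written."""
        written = sorted(self.dirty)
        for n in written:
            self.nonvolatile[n] = self.shadow[n]
        self.dirty.clear()
        return written


# Block 0 stands in for program code, block 1 for data.
store = ShadowedStore({0: b"code", 1: b"data-v1"})
store.load([0, 1])
store.write(1, b"data-v2")   # only the data changes during execution
written = store.write_back()
```

In this sketch only block 1 is written back, which mirrors the observation that typically the data, and not the program itself, changes.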
The heart of a personal computer system is usually a central processing unit (CPU) that resides on a microprocessor chip. New microprocessor chips that operate at increasingly high operating speeds are constantly being developed in order to permit personal computers to execute the increasingly larger programs in a timely manner. Usually, these microprocessor chips are developed using CMOS (complementary metal-oxide semiconductor) technology. CMOS chips are characterized by their relatively low power consumption. The greatest amount of power consumption within a CMOS chip occurs on the leading and trailing edges of clock pulses (i.e., when a clock signal transitions from a low voltage state to a high voltage state, or vice versa). When the operating speed of the microprocessor is increased, the number of clock pulses in a particular time period also increases thereby increasing the power consumption of the microprocessor during this time period. Furthermore, as the power consumption of the microprocessor increases, additional heat is generated by the microprocessor. This additional heat must be dissipated in order to prevent heat related damage to components within the computer system.
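The relationship between clock rate and power can be approximated with the standard first-order model for CMOS switching power, in which power is proportional to switched capacitance, the square of the supply voltage, and clock frequency. The sketch below applies that model; the function name and the capacitance, voltage, and frequency values are hypothetical and chosen only for illustration. The model ignores leakage and short-circuit current.

```python
def dynamic_power_watts(switched_capacitance_farads, supply_volts, clock_hz, activity=1.0):
    """First-order CMOS switching power: P = activity * C * V^2 * f.

    This is a simplified textbook model; real chips also draw leakage power.
    """
    return activity * switched_capacitance_farads * supply_volts ** 2 * clock_hz


# Hypothetical values: the same chip clocked at 33 MHz and at 66 MHz.
base = dynamic_power_watts(1e-9, 3.3, 33e6)
fast = dynamic_power_watts(1e-9, 3.3, 66e6)
ratio = fast / base  # doubling the clock doubles the switching power in this model
```

Under this model, with capacitance and voltage held fixed, power scales linearly with the number of clock transitions per unit time.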
Both power consumption and heat dissipation pose serious problems when designing a personal computer system. This is especially true in the case of mobile computers that typically are powered by batteries. The more power that the computer consumes, the less time that the computer can operate using a given sized battery. Therefore, as the operating speed of the computer is increased, a designer of a battery powered computer system is faced with several unattractive alternatives. If the same sized batteries are used, then the effective operating time for the computer system must decrease when the operating speed is increased. On the other hand, it is desirable for the effective operating time to remain constant (or, better yet, to be increased). In such a case, one must either add additional batteries, thereby increasing the bulk and weight of the computer, or use an exotic, and therefore expensive, battery technology (or both).
The trend in mobile computers is towards smaller, faster, less expensive and lighter units. Thus, the need to add additional batteries, or to add more expensive batteries, is a significant disadvantage. This disadvantage is exacerbated by the need to add cooling fans, or to implement other cooling techniques, in order to dissipate the additional heat that is generated by a high speed microprocessor.
Additionally, when a microprocessor operates at a higher speed, it can execute more instructions in a given amount of time. Therefore, the microprocessor can also process a greater amount of data during that period. This means that computer instructions and data must be supplied to the microprocessor chip at increasingly greater speeds for the higher speed of the microprocessor to be utilized effectively. Thus, a bottleneck has developed in computer systems having fast microprocessors. This bottleneck is the bus that provides instructions for the microprocessor to execute and that also provides the data that the microprocessor will use when executing these instructions.
If the next instruction to be executed is not available when the microprocessor needs it, then the microprocessor must wait idly while the required instruction is retrieved and provided to the microprocessor. Typically, the microprocessor clock continues to toggle during this idle time, thereby needlessly consuming power and generating heat that must be dissipated. This idling can also occur, even when the microprocessor has the next instruction to be executed available, if the next instruction to be executed requires data that are not immediately available to the microprocessor. Once again, the microprocessor must wait one or more clock cycles (i.e., insert wait cycles) until the data are retrieved before the next instruction can be executed.
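The cost of wait cycles can be made concrete by counting clock cycles. The following sketch assumes, purely for illustration, that each instruction needs one cycle to execute plus some number of wait cycles while its instruction or data is fetched; the function name and the stall values are hypothetical.

```python
def count_cycles(stall_cycles_per_instruction):
    """Return (busy, idle) clock cycles for a sequence of instructions.

    Assumption: every instruction takes exactly one busy cycle, plus the
    given number of wait cycles during which the clock still toggles.
    """
    busy = len(stall_cycles_per_instruction)
    idle = sum(stall_cycles_per_instruction)
    return busy, idle


# Five instructions; the third and fifth must wait for the bus.
busy, idle = count_cycles([0, 0, 3, 0, 5])
wasted_fraction = idle / (busy + idle)
```

Here 8 of the 13 clock cycles are wait cycles, during which the clock continues to consume power without any useful work being done.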
In order to decrease the frequency with which the microprocessor encounters these wait cycles, many modern high performance microprocessors have a small internal cache. The internal cache is also sometimes called a primary cache. Instructions that are likely to be executed, and data that are likely to be needed by the executing instructions, are stored in the internal cache so that they may be accessed immediately by the CPU of the microprocessor. Frequently, high speed microprocessors have two internal caches: an instruction cache for storing instructions and a data cache for storing data.
An instruction cache works according to the principle of localization. The sequential nature of computer programs is such that when a particular instruction within the program is executed, it is highly probable that the next instruction to be executed will be the instruction that follows the currently executing instruction. Therefore, when an instruction is to be executed, the instruction cache is checked to determine whether a copy of the required instruction is immediately available within the cache. A cache hit occurs if a copy of the required instruction is stored within the instruction cache. If there is a cache hit, then there is no need for the CPU to wait while the instruction is retrieved from wherever it is stored in the computer system. The copy of the instruction can be supplied to the CPU immediately from the instruction cache.
On the other hand, a cache miss occurs if a copy of the required instruction is not stored within the instruction cache. In the case of a cache miss, the CPU must wait while the instruction is retrieved from wherever it is stored within the computer system. Actually, rather than only retrieving the next instruction to be executed, a cache line is formed by retrieving the next instruction to be executed and a certain number of instructions following the next instruction to be executed. This is done because there is a high probability that the subsequent instructions will be executed. Then, if the subsequent instructions are in fact required to be executed, they will be immediately available to the CPU from within the cache line of the instruction cache. Of course, if every line of the cache is full when a new line is retrieved, the new line will replace one of the lines currently stored within the instruction cache. Several cache line replacement schemes exist; a typical one is the least recently used (LRU) cache line replacement method.
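The hit, miss, line fill, and LRU replacement behavior described above can be sketched as a small simulation. The class name LRULineCache and its parameters are hypothetical. As a simplifying assumption, lines are aligned to multiples of the line size rather than starting at the missed address, and only addresses are tracked, not instruction contents.

```python
from collections import OrderedDict


class LRULineCache:
    """Toy fully associative cache with least recently used line replacement."""

    def __init__(self, num_lines, line_size):
        self.num_lines = num_lines
        self.line_size = line_size
        # Line base address -> True, ordered from least to most recently used.
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a cache hit, False on a miss (after filling the line)."""
        base = address - (address % self.line_size)
        if base in self.lines:
            self.lines.move_to_end(base)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)    # evict the least recently used line
        self.lines[base] = True               # fetch the whole line
        return False


# Two lines of four instructions each; addresses are instruction numbers.
cache = LRULineCache(num_lines=2, line_size=4)
for addr in [0, 1, 2, 3, 4, 5, 0, 8, 4]:
    cache.access(addr)
```

In this trace the sequential accesses 1, 2, 3 and 5 hit because their lines were filled by the preceding misses, and the access to address 8 evicts the line holding addresses 4 through 7 because it was the least recently used.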
A data cache works similarly to the way that an instruction cache operates. Because of the sequential nature of programs, the concept of localization also applies to data used by the programs. If a piece of data is required by the CPU, there is a high probability that the next piece of data required by the CPU will be the piece of data stored immediately following the currently required piece of data. Therefore, if a cache miss occurs in the data cache, a cache line (that contains the currently required piece of data and a certain number of pieces of data that follow it) is retrieved and stored in the data cache. Thus, there is a high probability that the next piece of data required will be stored in the new cache line and a cache hit will occur.
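The benefit of fetching a whole cache line on a data cache miss can be estimated for purely sequential access. The sketch below is an illustration under stated assumptions: the function name is hypothetical, data items are accessed strictly in order, and the cache is assumed large enough that no line is ever replaced.

```python
def sequential_hit_rate(num_items, line_size):
    """Fraction of accesses that hit when items 0..num_items-1 are read in order.

    Assumption: a miss fetches the entire line containing the item, and
    the cache never needs to evict a line.
    """
    cached_lines = set()
    hits = 0
    for index in range(num_items):
        line = index // line_size
        if line in cached_lines:
            hits += 1
        else:
            cached_lines.add(line)  # miss: fetch the whole line
    return hits / num_items


rate_single = sequential_hit_rate(64, 1)  # one item per line: every access misses
rate_lines = sequential_hit_rate(64, 8)   # eight items per line: 7 of every 8 hit
```

With eight items per line, only the first access to each line misses, so the hit rate for sequential access is 56 out of 64.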
Another way to increase performance of a processor is to provide a floating-point unit (FPU) to supplement the CPU. The FPU is specialized circuitry that performs calculations using floating point numbers, as opposed to integers (whole numbers). Adding an FPU to a microprocessor can dramatically speed up math and graphics functions (graphics work is generally math intensive). The performance is only enhanced, however, in the case of programs that are designed to recognize that the microprocessor has an FPU and then issue floating point instructions to utilize the FPU. Many microprocessors, however, do not have an FPU. Therefore, many programs do not attempt to exploit the benefits of an FPU by including floating point instructions. When these programs are executed by a microprocessor that has an FPU, the FPU is idle because it does not have any floating point instructions to process. Even in the case of a program that uses floating point instructions, the FPU can be idle if there are sections within the program where no floating point instructions are issued because none are required. Even though the FPU is idle, it continues to be clocked and therefore continues to consume power and generate excess heat.
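How often an FPU is clocked without doing useful work depends on the instruction mix. The following sketch counts idle FPU clocks for a short instruction trace. It assumes, for illustration only, that every instruction occupies one clock cycle and that the FPU is clocked on every cycle; the function name and the "int"/"fp" labels are hypothetical.

```python
def fpu_idle_clocks(trace):
    """Count cycles in which the FPU is clocked but has no floating point work.

    Assumption: one clock cycle per instruction, and the FPU receives
    every clock regardless of the instruction type.
    """
    total = len(trace)
    used = sum(1 for op in trace if op == "fp")
    return total - used


# Hypothetical trace: mostly integer instructions with two floating point ones.
trace = ["int", "int", "fp", "int", "fp", "int", "int", "int"]
idle = fpu_idle_clocks(trace)
integer_only_idle = fpu_idle_clocks(["int"] * 8)
```

For a program with no floating point instructions at all, every FPU clock in this model is an idle clock.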
Similarly, there can be times when a processor is active, but a cache is not required. At such times, power is still provided to the cache.