1. Field of the Invention
The present invention generally relates to microprocessors. More particularly, the present invention relates to a computer capable of running multiple threads that can concurrently process instructions in at least two separate threads. More particularly still, the invention relates to pre-executing at least a portion of a program as separate threads to avoid cache misses.
2. Background of the Invention
Most modern computer systems include at least one central processing unit (“CPU”) and a main memory. Multiprocessor systems include more than one processor and each processor typically has its own memory which may or may not be shared by other processors. The speed at which the CPU can decode and execute instructions and operands depends upon the rate at which the instructions and operands can be transferred from main memory to the CPU. In an attempt to reduce the time required for the CPU to obtain instructions and operands from main memory, many computer systems include a cache memory coupled between the CPU and main memory.
A cache memory is a relatively small, high-speed memory (compared to main memory) buffer that is used to temporarily hold those portions of the contents of main memory which it is believed will be used in the near future by the CPU. The main purpose of a cache is to shorten the time necessary to perform memory accesses, both for data and instructions. Cache memory typically has access times that are several or many times faster than a system's main memory. The use of cache memory can significantly improve system performance by reducing data access time, therefore permitting the CPU to spend far less time waiting for instructions and operands to be fetched and/or stored.
A cache memory, typically comprising some form of random access memory (“RAM”) includes many blocks (also called lines) of one or more words of data. Associated with each cache block in the cache is a tag. The tag provides information for mapping the cache line data to its main memory address. Each time the processor makes a memory reference (i.e., load or store), a tag value from the memory address is compared to the tags in the cache to see if a copy of the requested data resides in the cache. If the desired memory block resides in the cache, then the cache's copy of the data is used in the memory transaction, instead of the main memory's copy of the same data block. However, if the desired data block is not in the cache, the block must be retrieved from the main memory and supplied to the processor. A copy of the data also is stored in the cache.
Because the time required to retrieve data from main memory is substantially longer than the time required to retrieve data from cache memory, it is highly desirable have a high cache “hit” rate. Although cache subsystems advantageously increase the performance of a processor, not all memory references result in a cache hit. A cache “miss” occurs when the targeted memory data has not been cached and must be retrieved from main memory. Thus, cache misses detrimentally impact the performance of the processor, while cache hits increase the performance. Anything that can be done to avoid cache misses is highly desirable, and there still remains room for improvement in this regard.