1. Field of the Invention
The present invention generally relates to executing instructions in a processor. Specifically, this application is related to minimizing stalls in a processor when executing load instructions.
2. Description of the Related Art
Modern computer systems typically contain several integrated circuits (ICs), including a processor which may be used to process information in the computer system. The data processed by a processor may include computer instructions which are executed by the processor as well as data which is manipulated by the processor using the computer instructions. The computer instructions and data are typically stored in a main memory in the computer system.
Processors typically process instructions by executing the instruction in a series of small steps. In some cases, to increase the number of instructions being processed by the processor (and therefore increase the speed of the processor), the processor may be pipelined. Pipelining refers to providing separate stages in a processor where each stage performs one or more of the small steps necessary to execute an instruction. In some cases, the pipeline (in addition to other circuitry) may be placed in a portion of the processor referred to as the processor core.
To provide for faster access to data and instructions as well as better utilization of the processor, the processor may have several caches. A cache is a memory which is typically smaller than the main memory and is typically manufactured on the same die (i.e., chip) as the processor. Modern processors typically have several levels of caches. The fastest cache which is located closest to the core of the processor is referred to as the Level 1 cache (L1 cache). In addition to the L1 cache, the processor typically has a second, larger cache, referred to as the Level 2 Cache (L2 cache). In some cases, the processor may have other, additional cache levels (e.g., an L3 cache and an L4 cache).
Processors typically provide load and store instructions to access information located in the caches and/or main memory. A load instruction may include a memory address (provided directly in the instruction or using an address register) and identify a target register (Rt). When the load instruction is executed, data stored at the memory address may be retrieved (e.g., from a cache, from main memory, or from other storage means) and placed in the target register identified by Rt. Similarly, a store instruction may include a memory address and a source register (Rs). When the store instruction is executed, data from Rs may be written to the memory address. Typically, load instructions and store instructions utilize data cached in the L1 cache.
In some cases, a processor may be capable of executing multiple load instructions simultaneously. However, the cache used by the processor core may only be configured to perform a limited number of load accesses at a given time. To prevent the cache from being overloaded by too many load accesses, the processor core may limit the number of load instructions which can be simultaneously executed, for example, by issuing only a single load instruction for execution at a time. Limiting the number of load instructions issued by the processor may reduce the overall speed with which instructions are executed by the processor. Thus, executing multiple load instructions in a processor may decrease processor efficiency.
Accordingly, there is a need for improved methods of executing load instructions.