1. Background Field
The present invention relates to processing units and in particular to load store units.
2. Relevant Background
Processors, such as microprocessors, digital signal processors, and microcontrollers, are generally divided into many sub-systems, such as a memory system, a processing unit, and load store units. The load store unit transfers data between the processing units and the memory system. Specifically, the load store unit reads (i.e. loads) data from the memory system and writes (i.e. stores) data to the memory system.
FIG. 1 shows a simplified block diagram of a load store unit 110 coupled to a memory system 140. Load store unit 110 includes an instruction decoder 111, a load scheduler 113, a load pipeline 115, an outstanding load miss buffer 116, a store scheduler 117, a format block 118, and a store pipeline 119. Memory system 140 includes a level one cache 142 and a level two memory sub-system 144. In various embodiments of memory system 140, level two memory sub-system 144 may include additional cache levels in addition to the main memory. In some processors, instruction decoder 111 may be part of another subsystem. Instruction decoder 111 decodes the program instructions and sends load instructions to load scheduler 113 and store instructions to store scheduler 117. Other types of instructions are sent to appropriate execution units, such as a floating point execution unit or an integer execution unit. In most systems with multiple processing units, each processing unit includes a separate load store unit. Store scheduler 117 schedules the store instructions and issues store instructions to store pipeline 119. Store pipeline 119 executes the store instructions and stores the data from the store instructions into memory system 140.
Load scheduler 113 schedules the load instructions and issues load instructions to load pipeline 115 for execution. Load pipeline 115 executes the load instructions and reads the requested data from memory system 140. In many load store units, load pipeline 115 includes a load execution pipeline and a load result pipeline. The load execution pipeline decodes the address and accesses the level one cache to determine if the data is in level one cache 142. If the requested data is in level one cache 142, the load result pipeline retrieves the data from level one cache 142. If the requested data is not in level one cache 142 (i.e. a cache miss), the load pipeline is stalled until the data becomes available. However, load store units that include an outstanding load miss buffer, as in FIG. 1, allow other load operations to proceed while one or more missed load instructions are outstanding. The missed load instructions are stored in outstanding load miss buffer 116. Load instructions stored in outstanding load miss buffer 116 are re-issued when the data becomes available in level one cache 142. Because the address has already been decoded, the missed load instructions can be re-issued directly to the load result pipeline.
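By way of illustration only, the hit and miss paths described above can be sketched in Python. This is a simplified software model, not a description of any actual hardware. The class and method names (Load, LoadPipeline, execute, result, fill) and the 64-byte line size are assumptions introduced for this example and do not appear in FIG. 1.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64  # assumed cache line size in bytes; not specified in the text


@dataclass
class Load:
    name: str
    address: int


@dataclass
class LoadPipeline:
    l1_lines: set = field(default_factory=set)       # line numbers present in the level one cache
    miss_buffer: list = field(default_factory=list)  # outstanding missed loads

    def execute(self, load):
        """Load execution pipeline: decode the address and probe the level one cache."""
        line = load.address // LINE_SIZE
        if line in self.l1_lines:
            return self.result(load)
        # Cache miss: hold the load in the miss buffer instead of stalling.
        self.miss_buffer.append(load)
        return None

    def result(self, load):
        """Load result pipeline: retrieve data for a load whose line is cached."""
        return f"{load.name}: data from line {load.address // LINE_SIZE}"

    def fill(self, line):
        """A line arrives from the level two memory sub-system; reissue waiting loads."""
        self.l1_lines.add(line)
        ready = [ld for ld in self.miss_buffer if ld.address // LINE_SIZE == line]
        self.miss_buffer = [ld for ld in self.miss_buffer if ld.address // LINE_SIZE != line]
        # Reissued loads skip execute() and go directly to the result pipeline.
        return [self.result(ld) for ld in ready]
```

In this sketch, a load that misses returns None and later loads continue to call execute(); when fill() is called for the missing line, the held loads are passed directly to result() without repeating the address decode.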
When data is retrieved from level one cache 142, format block 118 formats the data to conform to the data format requested by the load instruction. For example, format block 118 would reduce the cache line of data from level one cache 142 to the subset of data requested by the load instruction. Format block 118 could also perform other formatting operations such as data alignment and endian conversion.
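As a rough illustration of the formatting operations mentioned above, the following Python sketch reduces a cache line to the requested bytes and applies an endian conversion. The function name format_load and its parameters (offset, size, big_endian, signed) are hypothetical and chosen only for this example; an actual format block may support different or additional operations.

```python
def format_load(cache_line: bytes, offset: int, size: int,
                big_endian: bool = False, signed: bool = False) -> int:
    """Reduce a cache line to the bytes a load requested and return them as an integer."""
    if offset < 0 or size <= 0 or offset + size > len(cache_line):
        raise ValueError("requested bytes fall outside the cache line")
    # Keep only the subset of the cache line requested by the load.
    subset = cache_line[offset:offset + size]
    # Interpret the bytes in the requested byte order (endian conversion).
    byteorder = "big" if big_endian else "little"
    return int.from_bytes(subset, byteorder=byteorder, signed=signed)
```

For example, with a 64-byte line whose bytes are 0 through 63, a two-byte load at offset 4 selects bytes 0x04 and 0x05, which read as 0x0504 in little-endian order and 0x0405 in big-endian order.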
While the simplest way to issue load instructions is to issue the load instructions in order, greater performance may be achieved by issuing load instructions out of order. For example, if load scheduler 113 receives load instruction L_1, followed by load instruction L_2, followed by load instruction L_3, and load instruction L_1 has unresolved dependencies, load scheduler 113 may issue load instruction L_2 prior to load instruction L_1 rather than stalling and waiting for the dependencies of load instruction L_1 to resolve. Furthermore, load scheduler 113 may also issue load instruction L_3 while waiting for the dependencies of load instruction L_1 to resolve.
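The out of order issue described above can be illustrated with a short Python sketch. It assumes a scheduler that issues at most one load per cycle and always picks the oldest load whose dependencies have resolved; both the policy and the function name issue_order are assumptions made for this example, not details taken from load scheduler 113.

```python
def issue_order(loads, unresolved):
    """Return the order in which loads issue.

    loads      -- load names in program order
    unresolved -- dict mapping a load name to the number of cycles it
                  must wait before its dependencies resolve
    """
    pending = list(loads)
    waiting = dict(unresolved)  # copy so the caller's dict is not modified
    issued = []
    while pending:
        # Pick the oldest pending load whose dependencies have resolved.
        ready = next((ld for ld in pending if waiting.get(ld, 0) <= 0), None)
        if ready is not None:
            pending.remove(ready)
            issued.append(ready)
        # One cycle passes; outstanding dependencies move closer to resolving.
        for ld in waiting:
            if waiting[ld] > 0:
                waiting[ld] -= 1
    return issued
```

Calling issue_order(["L_1", "L_2", "L_3"], {"L_1": 2}) yields ["L_2", "L_3", "L_1"], matching the example in which L_2 and L_3 issue while L_1 waits for its dependencies.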
Furthermore, load instructions that miss in level one cache 142 are held in outstanding load miss buffer 116, so that later load instructions can issue. For example, if load instruction L_1 misses in level one cache 142, i.e. load instruction L_1 requests data that is not already in level one cache 142, then load instruction L_1 is stored in outstanding load miss buffer 116. This allows load scheduler 113 and load pipeline 115 to issue later load instructions, such as load instruction L_2. For clarity, load instructions that miss in level one cache 142 are referred to as missed load instructions. Typically, outstanding load miss buffer 116 can hold several missed load instructions, including missed load instructions for the same memory location. Once the cache line that is requested by a missed load instruction becomes available in level one cache 142, the missed load instruction can be reissued to load pipeline 115.
When the missed load instructions are reissued, various hazards may occur if the missed load instructions are reissued out of order. For example, suppose a missed load instruction L_1 and a missed load instruction L_3 (which should come after missed load instruction L_1) are to the same memory location. If missed load instruction L_3 is reissued before missed load instruction L_1, and a store instruction modifies the memory location after execution of missed load instruction L_3 and before the execution of missed load instruction L_1, then the data retrieved by missed load instruction L_1 and missed load instruction L_3 may be inaccurate. Because load scheduler 113 may have issued the missed load instructions out of order, outstanding load miss buffer 116 may not be aware of the actual order of the missed load instructions. Therefore, data hazards from out of order reissuance of the missed load instructions can occur. Accordingly, missed load instructions further complicate the tracking system that monitors all load and store instructions to detect hazards caused by instructions that were issued out of order.
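The hazard described above can be made concrete with a small Python sketch that replays the same two loads and one store in two different orders. The helper run, the address 0x100, and the values "old" and "new" are hypothetical and serve only to illustrate the ordering problem; they do not model any particular load store unit.

```python
def run(events, memory):
    """Apply events in the given order and record what each load observes.

    events -- list of ("load", name, address) or ("store", address, value)
    memory -- dict mapping address to value; copied so the caller's dict is untouched
    """
    mem = dict(memory)
    seen = {}
    for event in events:
        if event[0] == "load":
            _, name, address = event
            seen[name] = mem[address]
        else:
            _, address, value = event
            mem[address] = value
    return seen


ADDRESS = 0x100  # hypothetical shared memory location
initial = {ADDRESS: "old"}
store = ("store", ADDRESS, "new")

# Reissue in program order: L_1, then the store, then L_3.
in_order = run([("load", "L_1", ADDRESS), store, ("load", "L_3", ADDRESS)], initial)

# Reissue out of order: L_3 first, then the store, then L_1.
out_of_order = run([("load", "L_3", ADDRESS), store, ("load", "L_1", ADDRESS)], initial)
```

In the in-order replay, L_1 observes "old" and L_3 observes "new". In the out of order replay, the earlier load L_1 observes "new" while the later load L_3 observes "old", so the two loads see the memory location in an order that contradicts the program order.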
Hence, there is a need for a method and system that eliminates potential data hazards when missed load instructions are reissued, without using extensive resources.