The present invention relates to computer systems. In particular, the invention relates to a resource management scheme for caches and buffers.
In general, early microprocessors processed program instructions one at a time. In these early microprocessors, the architectural programming model exposed the atomic nature of instruction execution.
To increase performance, newer microprocessors began overlapping the processing of instructions and executing some parts of the instructions in an order different from the order in which they arrived at the processor. The process of overlapping the processing of instructions is called “pipelining” and microprocessors in which pipelining is implemented are called “pipelined microprocessors.” The process of executing instructions in an order different from program order is called “out of order execution.” “Program order” is the order in which a hypothetical non-pipelined processor would execute the instructions. However, the newer processors still maintain the illusion of sequential and atomic instructions in order to maintain the existing programming model.
FIG. 1 illustrates a simplified block diagram of a prior art microprocessor 101 designed to execute the Intel Architecture (IA-32) instructions as defined in the Intel Corporation manual, Intel Architecture Software Developer's Manual, Vols. I, II and III, published 1997. A next instruction process 110, which is also referred to as an instruction sequencer, is a state machine and branch prediction unit that builds the flow of execution of the microprocessor 101. To support page table virtual memory accesses, the microprocessor 101 includes an instruction translation look aside buffer (ITLB) 112. The ITLB includes page table entries of linear to physical address translations. Usually the page table entries represent the most recently used page translations. Instructions are fetched over a memory bus 124 by a memory controller 115 from a memory 104 for storage into an instruction cache (ICACHE) 114. The ICACHE 114 is physically addressed. Copies of instructions within memory 104 are stored within the instruction cache 114. Instructions are taken from instruction cache 114, decoded by the instruction decoder 116 and input into an instruction pipeline within an out of order core execution unit 118. Upon completion by the out of order core execution unit 118, an instruction is retired by the retirement unit 120. The retirement unit 120 processes instructions in program order after they have completed execution. “Program order” means the order in which the instructions were received in the out of order core execution unit 118. Retirement processing includes checking for excepting conditions and committing changes to architectural state. That is, the out of order core execution unit 118 executes instructions which can be completely undone before being output by the microprocessor if some excepting condition has occurred which the retirement unit has recognized.
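The in-order retirement behavior described above can be illustrated with a minimal sketch. It is a toy model written for this explanation, not the circuitry of the retirement unit 120; the class name `ReorderBuffer`, its fields, and the policy of discarding everything on a fault are all assumptions.

```python
from collections import deque


class ReorderBuffer:
    """Toy model (hypothetical): instructions may complete in any order,
    but they leave the buffer strictly in program order."""

    def __init__(self):
        self.entries = deque()  # oldest instruction at the left

    def allocate(self, name):
        # Instructions enter in program order.
        entry = {"name": name, "done": False, "fault": False}
        self.entries.append(entry)
        return entry

    def retire(self):
        """Retire completed instructions from the head of the buffer.

        Stops at the first instruction that has not completed. If the head
        has an excepting condition, it and all younger instructions are
        discarded (assumed policy), so none of their results are committed.
        """
        retired = []
        while self.entries and self.entries[0]["done"]:
            head = self.entries[0]
            if head["fault"]:
                self.entries.clear()  # undo: nothing younger is committed
                break
            self.entries.popleft()
            retired.append(head["name"])  # commit to architectural state
        return retired


rob = ReorderBuffer()
a, b, c = rob.allocate("A"), rob.allocate("B"), rob.allocate("C")
c["done"] = True           # C finishes first (out of order)
early = rob.retire()       # nothing retires: A is still outstanding
a["done"] = b["done"] = True
late = rob.retire()        # A, B, C retire together, in program order
```

In this sketch a completed instruction waits behind older incomplete ones, which is what allows its effects to be undone if an older instruction faults.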
Unfortunately, the illusion of sequential atomic instructions is difficult to maintain in the presence of dynamic code modifications, i.e., self-modifying code (SMC), and operating system maintained TLB consistency. The Intel Corporation Pentium® Pro solved the problems associated with SMC and software maintained TLB consistency with a property known as “inclusion”. In general, “inclusion” means that any instruction between the output of a component and the retirement unit in the processor will be in the component either as an instruction or a reference to the instruction.
ICACHE inclusion in this context means that the instruction bytes for any instruction between the output of the ICACHE and retirement will be in the ICACHE. ICACHE inclusion is used in the Pentium Pro to perform SMC detection for the Pentium Pro pipeline. The physical addresses of all modifications to memory are provided to the ICACHE 114 by the out of order core unit 118 on the snoop bus 128. If an address is found in the ICACHE, a hit response is returned to the out of order core unit 118 on the hit/miss bus 126. On a hit, the out of order core execution unit 118 and retirement unit 120 are responsible for flushing the modified instructions. The Pentium Pro maintains ICACHE inclusion using a victim cache. The victim cache is expensive because of the extra hardware and area required to implement the victim cache and the associated control logic.
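The snoop-based SMC check described above can be sketched as follows. This is an illustrative model only: the 32-byte line size, the class `ICache`, and the function `on_store` are assumptions made for the example, and the victim cache is not modeled.

```python
LINE_BYTES = 32  # assumed line size, chosen only for the example


class ICache:
    """Toy physically addressed instruction cache that tracks line addresses."""

    def __init__(self):
        self.lines = set()

    def fill(self, paddr):
        # Record that the line containing paddr holds instruction bytes.
        self.lines.add(paddr // LINE_BYTES)

    def snoop(self, paddr):
        # Hit if the written physical address falls in a cached line.
        return (paddr // LINE_BYTES) in self.lines


def on_store(icache, paddr, flush_pipeline):
    """Snoop the ICACHE with the physical address of a memory write.

    With inclusion, every in-flight instruction still has its bytes in the
    ICACHE, so a snoop miss means no in-flight instruction was modified.
    """
    hit = icache.snoop(paddr)
    if hit:
        flush_pipeline()  # possibly modified instructions are discarded
    return hit


icache = ICache()
icache.fill(0x1000)
flushes = []
on_store(icache, 0x1004, lambda: flushes.append("flush"))  # same line: hit
on_store(icache, 0x2000, lambda: flushes.append("flush"))  # other line: miss
```

The sketch shows why inclusion matters: the snoop is only a sufficient check if no line can leave the ICACHE while instructions fetched from it are still in the pipeline.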
The Pentium Pro also maintained instruction TLB (ITLB) inclusion by using a serialize on replacement scheme to ensure that any address translation for any instruction between the output of the ITLB 112 and the retirement unit 120 will be in the ITLB 112. The “serialize on replacement scheme” involves stopping the ICACHE 114 from providing instructions to the out of order core unit 118 and waiting for the retirement unit 120 to finish retiring all the instructions that remain in the out of order core unit 118. While inexpensive to implement and effective at maintaining ITLB inclusion, the serialize on replacement scheme has detrimental impacts on processor performance.
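A rough sketch of the serialize on replacement scheme, under simplifying assumptions, makes the performance cost visible. The `Pipeline` class, the one-retirement-per-cycle model, and the dictionary used as the ITLB are all hypothetical.

```python
class Pipeline:
    """Toy pipeline state (hypothetical): counts instructions in flight."""

    def __init__(self, in_flight=0):
        self.in_flight = in_flight
        self.fetch_enabled = True
        self.stall_cycles = 0

    def retire_one(self):
        self.in_flight -= 1


def replace_itlb_entry(itlb, victim_page, new_page, new_frame, pipeline):
    """Replace an ITLB entry only after the pipeline has drained."""
    pipeline.fetch_enabled = False      # stop supplying new instructions
    while pipeline.in_flight > 0:       # wait for retirement to drain
        pipeline.retire_one()
        pipeline.stall_cycles += 1      # assumed: one retirement per cycle
    del itlb[victim_page]               # safe: nothing in flight uses it
    itlb[new_page] = new_frame
    pipeline.fetch_enabled = True       # resume fetch


itlb = {0x10: 0x80}
pipe = Pipeline(in_flight=5)
replace_itlb_entry(itlb, 0x10, 0x11, 0x90, pipe)
```

In this model every replacement stalls fetch for as many cycles as there are instructions in flight, which illustrates the detrimental performance impact noted above.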
Therefore, what is needed is an improved method and system for maintaining a macro instruction in a pipelined processor that provides higher performance, uses less hardware and is less complex than existing methods and systems.
Embodiments of the present invention provide a method for maintaining an instruction in a pipelined processor using inuse fields. The method involves receiving a read request for an instruction, sending the instruction in response to the read request and setting an inuse field associated with the instruction to inuse.
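The three steps of the method (receive a read request, send the instruction, set the inuse field) might be sketched as below. This is a minimal illustration, not the claimed implementation; the class `InuseCache`, the per-entry dictionary layout, and the helper `is_inuse` are assumptions.

```python
class InuseCache:
    """Toy instruction store (hypothetical) with one inuse field per entry."""

    def __init__(self):
        self.entries = {}  # address -> {"instruction": ..., "inuse": bool}

    def fill(self, address, instruction):
        # A newly stored instruction has not yet been sent to the pipeline.
        self.entries[address] = {"instruction": instruction, "inuse": False}

    def read(self, address):
        """Handle a read request for the instruction at `address`."""
        entry = self.entries[address]   # step 1: receive the read request
        entry["inuse"] = True           # step 3: mark the entry as inuse
        return entry["instruction"]     # step 2: send the instruction

    def is_inuse(self, address):
        return self.entries[address]["inuse"]


cache = InuseCache()
cache.fill(0x40, "ADD")
before = cache.is_inuse(0x40)   # False: not yet read
sent = cache.read(0x40)         # returns "ADD" and sets the inuse field
after = cache.is_inuse(0x40)    # True: the instruction may be in the pipeline
```

How the inuse field is later consulted or cleared is not covered by the summary sentence above, so the sketch leaves that out.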