Microelectronic manufacturers are continually striving to improve the speed and performance of microprocessors and other processing devices, the performance of such devices being dependent upon many factors. One factor affecting the performance of a processor is the scheduling and execution of instructions associated with a piece of code executing on the processor. Typically, a processor includes an instruction decoder that decodes an instruction to create one or more micro-instructions, or micro-operations, that can be understood and executed by the processor. A micro-operation will also be referred to herein as a “μOP.” Micro-operations ready for execution are provided to a scheduler, which schedules the order of execution of a series of μOPs. Scheduled μOPs are then inserted into an execution stream and subsequently passed to execution circuitry for execution. A processor may also include a checker that determines whether a μOP has been properly executed. If a μOP has been properly executed, the μOP is retired. If the μOP did not properly execute, the μOP is sent into a replay loop, wherein the μOP is returned to the scheduler and rescheduled for execution.
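By way of illustration only, the schedule, execute, check, and retire-or-replay flow described above may be sketched in software as follows. This is a minimal model and not a description of any actual processor: the function name `run_with_replay`, the callback `executes_properly` (a hypothetical stand-in for the checker), and the use of a simple first-in, first-out queue as the scheduler are all assumptions made for the example.

```python
from collections import deque


def run_with_replay(uops, executes_properly):
    """Model the scheduler -> execution -> checker -> retire/replay flow.

    `executes_properly(uop, attempt)` is a hypothetical stand-in for the
    checker; it returns True when the uop executed properly on that attempt.
    Returns the uops in retirement order and the number of replays.
    """
    # Assumption: the scheduler is modeled as a plain FIFO queue.
    scheduler = deque((uop, 1) for uop in uops)
    retired = []
    replays = 0
    while scheduler:
        uop, attempt = scheduler.popleft()      # uop enters the execution stream
        if executes_properly(uop, attempt):
            retired.append(uop)                 # properly executed: retire
        else:
            replays += 1
            scheduler.append((uop, attempt + 1))  # replay loop: back to scheduler
    return retired, replays


# Hypothetical example: uop "b" fails on its first attempt only.
retired, replays = run_with_replay(
    ["a", "b", "c"],
    lambda uop, attempt: not (uop == "b" and attempt == 1),
)
```

In this sketch the replayed μOP retires after the μOPs that were scheduled behind it, which is one simple way of showing that a trip through the replay loop delays the affected μOP.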
Access to the execution stream may be provided via a multiplexer or “MUX.” The scheduler output is passed to the execution stream via an input at the MUX, this input often referred to as the “front-door entry” to the execution stream. The flow of μOPs from the scheduler and into the front-door entry of the execution stream (the output of the scheduler including μOPs received from the instruction decoder, as well as μOPs received from the replay loop) may be referred to as the “front-door stream.” A typical processor can execute multiple threads of execution (e.g., two) and, further, is capable of executing instructions out of order. Accordingly, the front-door stream may include μOPs for two or more threads, the μOPs for each thread being out-of-order and interleaved with μOPs of other threads.
A processor may also include a page miss handler (PMH). One task of the PMH is to process specific events—such as, for example, page table misses and page splits—that occur during execution of an instruction or piece of code (i.e., a series of front-door μOPs). When such an event occurs, the PMH will generate a series of μOPs to handle the event. These PMH μOPs are provided to the execution stream via a “side-door entry” into the execution stream, the side-door entry comprising a second input to the MUX. The flow of μOPs from the PMH and into the side-door entry of the execution stream may be referred to as the “side-door stream.”
On each clock cycle of a processor, only one μOP may be passed to the execution stream via the MUX. In other words, during a clock cycle, the execution stream has only one opportunity to receive (or only one “entry slot” for receiving) a μOP, and that entry slot may receive a μOP from only one of the front-door entry and the side-door entry. Therefore, contention for the entry slot of the execution stream will occur whenever a μOP in the front-door stream (i.e., a “front-door μOP”) is “waiting” for entrance into the execution stream and a μOP in the side-door stream from the PMH (i.e., a “side-door μOP”) is also seeking entrance to the execution stream. In conventional processors, when a side-door μOP was pending, the entry slot was automatically “awarded” to the side-door μOP and the front-door μOP was discarded, or “whacked.” The whacked front-door μOP was sent into the replay loop and returned to the scheduler for rescheduling. The process of whacking a front-door μOP in favor of a side-door μOP is commonly referred to as “side-door whacking.”
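The conventional arbitration just described may be sketched, for illustration only, as a cycle-by-cycle software model. The function name `arbitrate_conventional`, the representation of the side-door stream as a mapping from a non-negative cycle number to a PMH μOP, and the choice to return a whacked μOP to the back of a first-in, first-out scheduler queue are all assumptions of the example and are not taken from any particular processor.

```python
from collections import deque


def arbitrate_conventional(front_door, side_door_by_cycle):
    """Model one entry slot per cycle with unconditional side-door priority.

    front_door: front-door uops in the order the scheduler presents them.
    side_door_by_cycle: hypothetical mapping {cycle: uop} of PMH uops, keyed
        by the (non-negative) cycle in which each one seeks entry.
    Returns (entered, whacked): entered is a list of (cycle, source, uop)
    for uops that won the slot; whacked lists the discarded front-door uops.
    """
    scheduler = deque(front_door)
    pending_side = dict(side_door_by_cycle)
    entered = []
    whacked = []
    cycle = 0
    while scheduler or pending_side:
        front_uop = scheduler.popleft() if scheduler else None
        side_uop = pending_side.pop(cycle, None)
        if side_uop is not None:
            # Side-door uop pending: it is awarded the slot automatically.
            entered.append((cycle, "side", side_uop))
            if front_uop is not None:
                # Contention: the front-door uop is whacked and replayed,
                # irrespective of its characteristics.
                whacked.append(front_uop)
                scheduler.append(front_uop)
        elif front_uop is not None:
            entered.append((cycle, "front", front_uop))
        cycle += 1
    return entered, whacked


# Hypothetical example: a PMH uop "S0" seeks entry in cycle 1.
entered, whacked = arbitrate_conventional(["F0", "F1", "F2"], {1: "S0"})
```

In this example the front-door μOP presented in cycle 1 loses the entry slot and does not enter the execution stream until cycle 3, which shows in miniature the delay that side-door whacking imposes on whichever front-door μOP happens to be contending.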
Whacking the front-door μOP irrespective of that μOP's characteristics can add significant latency to the execution of a piece of code. Certain μOPs in the front-door stream will have a greater impact on the execution of other front-door μOPs and, therefore, are much more critical to the successful execution of an instruction or piece of code. Thus, the process of automatically whacking a front-door μOP in favor of a side-door μOP whenever contention for the entry slot into the execution stream exists, and irrespective of the criticality of the front-door μOP, may increase latency and inhibit performance. Future generations of processors will be expected to perform multiple processes (e.g., handling a page-table miss, handling a cache miss, handling a page split, etc.) in parallel, and a failure to efficiently share the entrance into an execution stream amongst all processes will result in even greater latencies.