In the field of computer architecture, a single chip may process instructions from multiple instruction sets. In such mixed architectures, the processor hardware is designed and optimized to execute instructions from one instruction set, generally referred to as the native instruction set, and emulates other instruction sets by translating the emulated instructions into operations understood by the native hardware. For example, the IA-64 architecture supports two instruction sets: the variable-length IA-32 (or x86) instruction set and the fixed-length enhanced mode (EM) instruction set. When executing IA-32 instructions, the central processing unit (CPU) is said to be in IA-32 mode (also referred to as x86 mode). When executing EM instructions, the CPU is said to be in EM mode. Native EM instructions are executed by the main execution hardware of the CPU in EM mode. The variable-length IA-32 instructions, however, are processed by the IA-32 (or x86) engine and broken down into native EM instructions for execution in the core pipeline of the machine. In x86 mode, it is desirable to retrieve instructions from the IA-64 memory subsystem into the x86 engine. To accomplish this, the x86 engine must interface with the EM pipeline, because the memory subsystem is tightly coupled to the EM pipeline. The x86 hardware exists primarily to support legacy software. For this reason, it is desirable that the x86 engine not slow the processing of native instructions in the EM pipeline.
Existing methods of fetching instructions, such as those previously implemented in the IA-64 architecture, use dual pipelines (the EM pipeline and the x86 pipeline) to process instructions. In these methods, the x86 engine simply sends a fetch address to the EM fetch engine, which accesses the memory subsystem and returns a line of instructions to be deposited in a macroinstruction queue (MIQ) in the x86 engine. While both pipelines are synchronized to process the same set of addresses, they operate independently, such that the x86 engine sends a new fetch address in each clock cycle and the EM fetch engine retrieves a new line of instructions in each clock cycle.
In the presence of pipeline stalls (for example, due to a cache miss), the pipelines can lose synchronization. This is because, given the physical separation of the x86 engine and the EM fetch engine, it takes one complete clock cycle to transmit information between the two pipelines. When a stall occurs, it cannot be reported to the x86 engine in the same cycle in which the fetch engine sees it. That is, the x86 engine does not notice the stall in the EM pipeline until at least one clock cycle after it occurs. Meanwhile, the x86 pipeline continues to advance the fetch address as though no stall had occurred. The x86 pipeline and the EM pipeline thus become unsynchronized and process different instructions in corresponding pipeline stages. A complicated stall recovery means is then required to bring the pipelines back into synchronization.
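The one-cycle reporting delay described above can be illustrated with a small toy model. The following Python sketch is purely illustrative and is not taken from any actual implementation: the function name `desync_cycles`, its parameters, and the use of a single address counter per pipeline are all assumptions made for this example. It reports the cycles in which the two pipelines hold different fetch addresses.

```python
def desync_cycles(stall_cycles, total_cycles, report_delay=1):
    """Toy model (hypothetical): return the cycles in which the x86 and EM
    pipelines hold different fetch addresses.

    stall_cycles -- set of cycles in which the EM fetch engine is stalled
    report_delay -- cycles needed to report a stall to the x86 engine
    """
    x86_addr = 0  # fetch address held by the x86 engine
    em_addr = 0   # fetch address being processed by the EM fetch engine
    mismatched = []
    for cycle in range(total_cycles):
        if x86_addr != em_addr:
            mismatched.append(cycle)
        # The EM fetch engine sees the stall immediately.
        em_stalled = cycle in stall_cycles
        # The x86 engine only learns of it report_delay cycles later.
        x86_sees_stall = (cycle - report_delay) in stall_cycles
        if not em_stalled:
            em_addr += 1
        if not x86_sees_stall:
            x86_addr += 1
    return mismatched


# A single stall in cycle 2, reported one cycle late.
print(desync_cycles({2}, 6))                  # [3]
# The same stall, if it could be reported in the same cycle.
print(desync_cycles({2}, 6, report_delay=0))  # []
```

In this simplified model the address counters happen to realign once the late stall report arrives. It tracks neither the fetch address already in flight nor the contents of the individual pipeline stages, which is where the stall recovery described above becomes complicated.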
Another stall-related problem with existing methods of processing instructions is that there may not be enough room in the MIQ to write a returning line of instructions. That is, existing methods and apparatuses may try to write a new line of instructions to the MIQ even though the MIQ is already full of unprocessed entries. One prior art method introduces a new stall to recover from this oversubscription of the MIQ. The detection and signaling of this new stall is cumbersome and, combined with the earlier fetch-related stalls, requires complicated hardware to handle.
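The oversubscription problem can be sketched in the same spirit. The model below is a hypothetical illustration only: the queue capacity, the drain rate, and the name `overflow_cycles` are assumptions chosen for the example and do not reflect any actual MIQ design. The fetch side returns one line per cycle without checking occupancy, while the x86 engine removes one entry every `drain_every` cycles.

```python
from collections import deque


def overflow_cycles(capacity, drain_every, total_cycles):
    """Toy model (hypothetical): return the cycles in which a returning line
    of instructions finds the MIQ already full.

    capacity    -- number of lines the MIQ can hold (assumed value)
    drain_every -- the x86 engine consumes one entry every this many cycles
    """
    miq = deque()
    overflows = []
    for cycle in range(total_cycles):
        # The x86 engine consumes the oldest entry at its own pace.
        if miq and cycle % drain_every == 0:
            miq.popleft()
        # The fetch engine returns a line every cycle, regardless of occupancy.
        if len(miq) >= capacity:
            overflows.append(cycle)  # no room: an extra stall would be needed
        else:
            miq.append(cycle)
    return overflows


# A two-entry queue drained every other cycle is oversubscribed.
print(overflow_cycles(capacity=2, drain_every=2, total_cycles=6))  # [3, 5]
# Draining every cycle keeps up with the fetch engine.
print(overflow_cycles(capacity=2, drain_every=1, total_cycles=6))  # []
```

Each cycle listed in the result corresponds to a point at which the additional stall described above would have to be detected and signaled.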
What is needed is a means of interfacing the hardware of a CPU that processes both native instructions and emulated instructions. In particular, what is needed is a method for retrieving instructions of one instruction set architecture (ISA) from the memory of a different, native ISA, while avoiding the problems associated with pipeline stalls and the complexities inherent to the dual, synchronous pipeline system.