FIG. 1 illustrates a prior art data processing system 100 with a main core 102, which processes instructions in order. The main core 102 is limited to a fixed instruction set architecture, such as the MIPS® Instruction Set Architecture (ISA) from MIPS Technologies, Inc., Mountain View, Calif. The system 100 also includes a user execution block 104, which is configured to support user defined instructions. For example, the user defined instructions may be CorExtend™ instructions, a user defined instruction set supported by processors sold by MIPS Technologies, Inc., Mountain View, Calif.
FIG. 1A illustrates the main core 102 communicating with the user execution block 104 via an interface 106. The main core 102 includes a number of pipeline stages 110-124. The user execution block 104 includes a decoder 130 to decode user defined instructions and an execution block 132 to execute the user defined instructions. As shown in FIG. 1A, the execution of the user defined instructions is coordinated through the pipeline of the main core 102. In particular, an instruction cache 110 of the main core 102 passes instructions to a decoder 112 of the main core 102 and to the decoder 130 of the user execution block 104. Instructions of the instruction set supported by the main core 102 are decoded at the decoder 112. User defined instructions are decoded at the decoder 130 of the user execution block 104. The decoded instruction is then dispatched by dispatch unit 114. The operands of the instruction are then read at block 116. If an instruction specifies a bypass operation (e.g., a user defined instruction), the instruction is routed by bypass block 118 to the user execution block 104 for execution in the execution block 132. Otherwise, the instruction is executed in the execution block 120 of the main core 102. Results from execution blocks 120 and 132 are routed to a cache 122 and are then applied to a write state 124. Thus, the user defined instructions are incorporated into the main core processor flow, but are decoded and executed in the user execution block 104.
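By way of illustration only, the instruction flow described above may be modeled as a short software sketch. The sketch is a behavioral approximation under stated assumptions and is not a description of any actual hardware implementation. The names Instruction, user_defined, and run_pipeline are hypothetical and are introduced here only for illustration; the stage labels follow the reference numerals used above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instruction:
    # Hypothetical model of an instruction; not part of any real ISA tooling.
    opcode: str
    user_defined: bool  # assumption: True marks a user defined (bypass) instruction


def run_pipeline(instr: Instruction) -> List[str]:
    """Return the ordered list of blocks that handle the instruction."""
    trace = ["instruction cache 110"]

    # The instruction cache passes the instruction to both decoders;
    # only the decoder that supports the instruction decodes it.
    trace.append("decoder 130" if instr.user_defined else "decoder 112")

    # Dispatch and operand read occur in the main core for every instruction.
    trace.append("dispatch unit 114")
    trace.append("operand read 116")

    if instr.user_defined:
        # A bypass operation is routed to the user execution block.
        trace.append("bypass block 118")
        trace.append("execution block 132")
    else:
        # Otherwise the instruction executes in the main core.
        trace.append("execution block 120")

    # Results from either execution block rejoin the main core flow.
    trace.append("cache 122")
    trace.append("write state 124")
    return trace
```

In this sketch, calling run_pipeline on a user defined instruction yields a trace that leaves the main core only at the decoder 130 and the execution block 132, which reflects that the remaining stages are coordinated through the pipeline of the main core 102.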
This is an efficient system when the main core 102 is a simple in order machine or a machine with a short pipeline. If the main core 102 is an out of order execution machine, e.g., a deeply pipelined machine, then the user execution block 104 has to inform the main core 102 about the nature and properties of the user defined instruction. This increases latency because the main core 102 must wait for information from the user execution block 104. It also causes instructions of the standard instruction set to be blocked.
In view of the foregoing, it would be desirable to provide an efficient technique for supporting user defined instructions in an out of order processor.