The present invention relates to an instruction synchronization scheme in a processing agent.
Instruction decoding can involve many different processes. For the purposes of this discussion, two different processes shall be distinguished from one another. "Instruction synchronization" refers to the act of identifying the locations of instructions within a string of instruction data. As is known, many processors operate upon variable-length instructions. The length of instructions from the Intel x86 instruction set, for example, may be from one to fifteen bytes. The instructions are often byte-aligned within a memory. A processor typically determines the location at which a first instruction begins and determines the location of other instructions iteratively, by determining the length of a current instruction and identifying the start of a subsequent instruction at the next byte following the conclusion of the current instruction. Within the processor, a "pre-decoder" may perform instruction synchronization. All other decoding operations, such as decoding of instruction type, registers and immediate values from instruction data, shall be referred to as "decoding" herein, to be performed by a "decoder."
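By way of illustration only, the iterative process described above may be sketched as follows. The sketch assumes a hypothetical toy encoding in which the low four bits of an instruction's first byte give its length in bytes (one to fifteen); this is not the x86 encoding, and the names instruction_length and synchronize are invented for the example.

```python
def instruction_length(data: bytes, pos: int) -> int:
    """Return the length of the instruction starting at pos.

    Assumption (toy encoding, not x86): the low 4 bits of the first
    byte hold the instruction length, from 1 to 15 bytes.
    """
    length = data[pos] & 0x0F
    if length == 0:
        raise ValueError(f"invalid length byte at offset {pos}")
    return length


def synchronize(data: bytes, start: int) -> list[int]:
    """Identify instruction start offsets, beginning at a known start.

    Each instruction's length is determined in turn; the next
    instruction begins at the byte following the current one.
    """
    starts = []
    pos = start
    while pos < len(data):
        starts.append(pos)
        pos += instruction_length(data, pos)
    return starts


# Four instructions of lengths 2, 1, 3 and 1 bytes.
data = bytes([0x02, 0xAA, 0x01, 0x03, 0xBB, 0xCC, 0x01])
print(synchronize(data, 0))
```

Because the start of each instruction depends on the length of the one before it, the loop is inherently serial: no later instruction boundary is known until every earlier one has been resolved.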
FIG. 1 is a block diagram illustrating the process of program execution in a conventional processor. Program execution may include three stages: front end 110, execution 120 and memory 130. The front-end stage 110 performs instruction pre-processing. Front end processing is designed with the goal of supplying valid decoded instructions to an execution core with low latency and high bandwidth. Front-end processing can include instruction synchronization, decoding, branch prediction and renaming. As the name implies, the execution stage 120 performs instruction execution. The execution stage 120 typically communicates with a memory 130 to operate upon data stored therein.
Instruction synchronization is known per se. Typically, instruction synchronization is performed when instruction data is stored in a memory in the front-end stage. Given an instruction pointer ("IP"), the front-end stage 110 may retrieve a predetermined length of data (called a "chunk" herein) that contains the instruction referenced by the IP. The instruction itself may be located at any position within the chunk. Instruction synchronization examines all data from the location of the referenced instruction to the end of the chunk and identifies instructions therein. When the chunk is stored in a memory in the front-end stage, instruction markers also may be stored in the memory to identify the position of the instructions for later use.
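A minimal sketch of this chunk-based scheme is given below for illustration. It assumes a 16-byte chunk size, chunk-aligned retrieval, and a hypothetical toy encoding in which the low four bits of an instruction's first byte give its length; none of these details is taken from a particular processor, and the names CHUNK_SIZE, toy_length and fetch_and_mark are invented for the example.

```python
CHUNK_SIZE = 16  # assumed chunk length in bytes


def toy_length(chunk: bytes, offset: int) -> int:
    """Toy length rule (assumption): low 4 bits of the first byte."""
    length = chunk[offset] & 0x0F
    if length == 0:
        raise ValueError(f"invalid length byte at offset {offset}")
    return length


def fetch_and_mark(memory: bytes, ip: int):
    """Retrieve the chunk containing ip and mark instruction starts.

    Synchronization runs from the referenced instruction to the end
    of the chunk. Bytes before ip are left unmarked, because their
    instruction boundaries are not known from this request alone.
    """
    base = ip - (ip % CHUNK_SIZE)
    chunk = memory[base:base + CHUNK_SIZE]
    markers = [False] * len(chunk)
    offset = ip - base
    while offset < len(chunk):
        markers[offset] = True
        offset += toy_length(chunk, offset)
    return base, chunk, markers


# One chunk of instructions with lengths 3, 2, 1, 4, 2, 1, 3,
# followed by a second chunk of one-byte instructions.
memory = bytes([3, 0, 0, 2, 0, 1, 4, 0, 0, 0, 2, 0, 1, 3, 0, 0] + [1] * 16)
base, chunk, markers = fetch_and_mark(memory, ip=3)
marked_offsets = [i for i, m in enumerate(markers) if m]
print(base, marked_offsets)
```

In this sketch the instruction at offset 0 of the chunk receives no marker when the request is for the instruction at offset 3, so the chunk is stored only partially synchronized.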
Prior instruction synchronization schemes suffer from some performance drawbacks. First, instruction synchronization adds latency because the process must be performed on all data from the requested instruction to the end of the chunk before the requested instruction may be used otherwise. The requested instruction is available to the execution stage 120 only after the delay introduced by the synchronization process. Second, instructions in a partially synchronized chunk may not be available even though they may be present in the front-end memory. A front-end memory may not hit on a request for an instruction in a non-synchronized portion of such a chunk. In response, although the front-end memory may store the requested instruction, the front end 110 may cause the chunk to be re-retrieved from another source and may perform instruction synchronization upon it.
Accordingly, there is a need in the art for an instruction synchronization scheme that avoids unnecessary latency in the synchronization process.