(1) Field of the Invention
The invention relates to a pipelined processor core. More specifically, the invention relates to a versatile processor core which allows use of varied instruction fetch units without modification to the core.
(2) Related Art
Pipelined processing is generally well known in the art. FIG. 1 shows one such typical pipelined processor. The processor core 1 includes five pipe stages 2-6. Between each pair of stages is an implicit latch (not shown). Information is latched from one pipe stage to the next under the control of the pipe sequencer 7. The first pipe stage is an instruction pointer generator stage (IP) 2, which generates an address at which the desired instruction can be found. This address is then latched to the instruction fetch unit stage (IFU) 3, which physically fetches the instruction. Typically, the IFU has on-board memory of some kind, which may take the form of an instruction cache or of read only memory (ROM). The IFU is also coupled to the bus controller 8, which allows the IFU to access the memory 10 in the event that the desired instruction is not found in the on-board memory.
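To make the stage-to-stage latching of FIG. 1 concrete, the following Python sketch models the five stages as a list whose contents shift down by one entry per cycle. It is a toy model under stated assumptions: there are no stalls or branches, every stage takes exactly one cycle, and the function name `run_pipeline` and the instruction labels are hypothetical rather than taken from the source.

```python
STAGES = ["IP", "IFU", "ID", "EX", "WB"]  # the five stages of FIG. 1


def run_pipeline(instructions, cycles):
    """Return one snapshot per cycle mapping each stage name to the
    instruction it holds (None marks an empty stage, i.e. a bubble)."""
    stages = [None] * len(STAGES)
    pending = list(instructions)
    history = []
    for _ in range(cycles):
        # The implicit latches shift every stage's contents down by one;
        # the IP stage starts work on the next instruction, if any.
        next_in = pending.pop(0) if pending else None
        stages = [next_in] + stages[:-1]
        history.append(dict(zip(STAGES, stages)))
    return history


if __name__ == "__main__":
    for cycle, snapshot in enumerate(run_pipeline(["i0", "i1", "i2"], 5), start=1):
        print(cycle, snapshot)
```

With one-cycle stages, the first instruction occupies the ID stage in the third cycle and reaches write-back in the fifth.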
Once the instruction is retrieved, either from the on-board memory or externally, it is latched to a third pipe stage, the instruction decode unit/register file stage (ID) 4. The ID 4 decodes the instruction and acquires any required operands from a register file. Where each pipe stage takes a single cycle, the ID expects an instruction to arrive in the third cycle and in each cycle thereafter. If the IFU 3 cannot deliver an instruction to the ID 4, it notifies the pipe sequencer 7 that it is stalled while it obtains the expected instruction, and the pipe sequencer stalls all stages above it. Once the instruction is delivered and decoded, the instruction and operands are latched to the next stage, the execution unit 5, for execution of the instruction. The final pipe stage is the write-back stage 6, in which the results of the instruction executed in the execution unit 5 during the previous cycle are written back to the register file or the data cache 9. Branch target path 11 allows the execution unit 5 to inform the IP 2 that a branch has occurred and to provide an address at which execution should continue. In response, the information currently in the pipeline relating to instructions following the branch in the instruction stream is flushed. The pipe sequencer 7 handles common control amongst the pipe stages and ensures synchronization. It is critical that the pipe sequencer knows what is occurring in the stages it controls, so that synchronization is maintained and retries can be generated when expected data is not received at any particular stage.
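A minimal cycle-level sketch of this branch handling, in Python, is shown below. In the hypothetical `simulate` function, addresses stand in for instructions, a program entry of the form `("br", target)` is assumed to be a branch that is always taken and resolved in the execution stage, and resolving it squashes the younger entries in the IP, IFU, and ID stages before fetching resumes at the target. Branch prediction, stalls, and the data cache are deliberately left out.

```python
# Stage indices for the five-stage pipeline of FIG. 1.
IP, IFU, ID, EX, WB = range(5)


def simulate(program, start, cycles):
    """Return the addresses that reach write-back, in order.

    program maps an address to an operation tuple; ("br", target) is
    treated as a branch that is always taken and resolved in EX.
    """
    pipe = [None] * 5  # address held by each stage; None is a bubble
    pc = start
    retired = []
    for _ in range(cycles):
        # Advance one cycle: every latch shifts down and the IP stage
        # generates the next sequential address.
        pipe = [pc] + pipe[:-1]
        pc += 1
        if pipe[WB] is not None:
            retired.append(pipe[WB])
        addr = pipe[EX]
        op = program.get(addr, ("nop",)) if addr is not None else ("nop",)
        if op[0] == "br":
            # Branch target path: flush the younger instructions in IP,
            # IFU and ID, then continue fetching at the branch target.
            pipe[IP:EX] = [None] * (EX - IP)
            pc = op[1]
    return retired


program = {0: ("add",), 1: ("br", 10), 2: ("add",), 3: ("add",),
           10: ("add",), 11: ("add",)}
print(simulate(program, 0, 11))  # [0, 1, 10, 11]
```

The instructions at addresses 2, 3, and 4 enter the pipeline behind the branch but are flushed when it resolves, so they never reach write-back.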
In the event of a stall, the stage initiating the stall notifies the pipe sequencer 7, and the pipe sequencer 7 stalls all units above that stage in the same cycle. For example, if the ID 4 is unable to complete decode and register retrieval in the allotted cycle, it issues a stall signal to the pipe sequencer 7, which immediately stalls the IP 2 and the IFU 3. This stall must be completed within the cycle; otherwise, for example, a fetched instruction will be latched to the ID 4 with a new instruction taking its place in the latch. If the ID 4 is unable to accept the instruction latched to it, that instruction is lost. The stall becomes a critical speed path that reduces the maximum frequency at which the core 1 can operate, for two reasons: the ID may not identify that it lacks, e.g., a needed operand until late in the cycle, and the number of latches that must be stalled requires time to buffer the stall signal up to a level able to stall all stages above the stage initiating the stall.
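The consequence of a stall that is not completed within the cycle can be illustrated with a single-cycle step function. In this Python sketch (the name `step` and its parameters are hypothetical), the stage that cannot complete and every stage above it hold their contents while a bubble enters the stage below; if the stall signal instead reaches the upstream latches too late, they advance anyway and the instruction in the stalling stage is overwritten, i.e., lost. The model abstracts away the buffering delay itself and captures only its effect.

```python
def step(pipe, next_ip, stall_at=None, stall_reaches_upstream=True):
    """Advance the pipeline [IP, IFU, ID, EX, WB] by one cycle.

    stall_at is the index of a stage that cannot finish this cycle.
    stall_reaches_upstream models whether the stall signal arrives at
    every upstream latch before the clock edge.
    """
    if stall_at is None:
        # Normal cycle: everything moves down one stage.
        return [next_ip] + pipe[:-1]
    # Stages below the stalling one drain normally; a bubble (None)
    # enters the stage immediately below it.
    if stall_at < len(pipe) - 1:
        downstream = [None] + pipe[stall_at + 1:-1]
    else:
        downstream = []
    if stall_reaches_upstream:
        # The stalling stage and all stages above it hold their contents.
        upstream = pipe[:stall_at + 1]
    else:
        # The stall arrived too late: upstream latches advance anyway and
        # overwrite the instruction the stalling stage could not accept.
        upstream = [next_ip] + pipe[:stall_at]
    return upstream + downstream


pipe = ["i3", "i2", "i1", "i0", None]
print(step(pipe, "i4", stall_at=2))
print(step(pipe, "i4", stall_at=2, stall_reaches_upstream=False))
```

In the first call the ID stall holds "i1", "i2", and "i3" in place; in the second, "i1" disappears from the pipeline, which is the data loss the same-cycle stall requirement exists to prevent.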
FIG. 2 shows a block diagram of a system including an alternative pipelined processor from the prior art. Processor core 30 contains a plurality of pipe stages. Instruction pointer generation stage (IP) 32 generates an instruction pointer and control information, which it places in an instruction pointer latch (IPL) 33. From IPL 33, a plurality of signals, including control information and an instruction address, are latched into dummy stage 38, which must have the same depth as an instruction fetch unit stage (IFU) 39 residing outside the core boundary 31. Simultaneously with the latching of the control signals and the instruction address into the dummy stage 38, the instruction address is latched out on line 43 to IFU 39. IFU 39 fetches the instruction specified by the address and provides it to instruction fetch latch (IFL) 34 inside the core 30. At the same time, the control information and instruction address arrive from the dummy pipe stage 38. The control information, instruction address, and instruction are then latched into the instruction decode/register file stage (ID) 35. The instruction is decoded in the ID 35 and its operands are retrieved. The decoded instruction and operands, as well as control information, are latched to execution unit 36. The information necessary for write-back is latched to the write-back stage 37, as in the prior art. The latches between the ID 35 and the execution unit 36, and between the execution unit 36 and the write-back stage 37, are not shown in the drawing for clarity. Restart line 38 is used to inform the IP 32 of the new address and to invalidate the data currently in the pipe when a branch is mispredicted. The execution unit 36 ignores all data received until it receives the restart signal propagated through with an instruction corresponding to the branch result.
In the event of a stall, for example in the ID 35, signal line 40 is asserted to stall IFL 34 and is propagated up to also stall IPL 33. If dummy stage 38 and IFU 39 have depths greater than one, stall signal 40 must be propagated to dummy stage 38 and IFU 39 as well. An analogous situation arises for stalls originating lower in the pipeline. As mentioned before, buffering up and propagating the stall signal quickly enough to achieve the stall without loss of data creates a critical timing path. Additionally, while IFU 39 is flexible in terms of the size of the memory it may have on board, its depth is dictated by the dummy stage 38, because the two must be of equal depth to ensure proper synchronization of the control signals propagating down within the core 30.
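The depth constraint between the dummy stage 38 and the IFU 39 can be illustrated by treating each as a fixed-latency chain of latches feeding the ID 35. In the Python sketch below, `delay_line` and `pair_at_id` are hypothetical helpers, the same address is assumed to enter both paths in the same cycle, stalls are ignored, and each depth is assumed to be at least one. When the depths match, each instruction reaches the ID alongside its own control information; when they differ, the pairs drift out of alignment.

```python
from collections import deque


def delay_line(depth):
    """A chain of `depth` latches (depth >= 1), initially all bubbles."""
    return deque([None] * depth, maxlen=depth)


def pair_at_id(addresses, dummy_depth, ifu_depth):
    """Return the (control address, fetched address) pairs that reach the
    ID stage, one pair per cycle in which either path delivers something."""
    dummy, ifu = delay_line(dummy_depth), delay_line(ifu_depth)
    pairs = []
    # Feed the addresses, then enough bubbles to drain both paths.
    for addr in list(addresses) + [None] * max(dummy_depth, ifu_depth):
        ctrl_out, insn_out = dummy[0], ifu[0]  # oldest entries exit this cycle
        dummy.append(addr)  # control info and address enter the dummy stage
        ifu.append(addr)    # the same address is driven out to the IFU
        if ctrl_out is not None or insn_out is not None:
            pairs.append((ctrl_out, insn_out))
    return pairs


print(pair_at_id([0, 1, 2], 2, 2))  # [(0, 0), (1, 1), (2, 2)]
print(pair_at_id([0, 1, 2], 1, 2))  # [(0, None), (1, 0), (2, 1), (None, 2)]
```

The mismatched case shows why, in the FIG. 2 arrangement, changing the IFU depth also forces a change to the dummy stage inside the core.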
Changes in application or desired use commonly necessitate a redesign of the core, most often as a result of a change to the IFU. Among the things that may change because of cost or space concerns are the size of the on-board memory and the depth of the IFU pipeline. For example, in some applications it is desirable, for cost or design-time reasons, to use an IFU that requires more than a single cycle to retrieve an instruction. IFUs are circuit intensive, and the more rigorous the timing requirements for instruction retrieval, the greater the cost and design effort required to ensure compliance with that timing. Thus, by expanding the time allowed, the IFU can be designed by less senior designers and at a lower cost. While moving the IFU 39 outside the core as shown in FIG. 2 allows its size to be changed without core redesign, it does nothing to address changes in the depth of the IFU pipeline.
In view of the foregoing, it would be desirable to have a core suitable for use in a large variety of products and able to flexibly incorporate IFUs of varying sizes and depths without necessitating a redesign of the core.