1. Field of the Invention
The present invention relates to a technique for achieving synchronisation between pipelines in a data processing apparatus.
2. Description of the Prior Art
It is known to provide a data processing apparatus with a main processor in the form of a pipelined processor having a plurality of pipeline stages. This enables multiple instructions to be in the process of execution by the main processor at any point in time. Each instruction passes through the pipeline stages of the main processor in turn, and its execution typically completes when it is processed by the final pipeline stage, at which point the state of the data processing apparatus is updated to reflect the result of executing that instruction. For example, the contents of one or more registers of a register bank accessible by the main processor may be updated in dependence on that result.
It is also known to provide a data processing apparatus with one or more coprocessors for executing particular coprocessor instructions that appear in a sequence of instructions to be executed by the data processing apparatus. Where the main processor has a pipelined architecture, the coprocessor commonly has one as well, and hence has a plurality of pipeline stages through which each coprocessor instruction is processed in order to execute it. Typically, each coprocessor instruction is routed through both the pipeline of the main processor and the pipeline of the coprocessor. The coprocessor is intended to run more or less in step with the main processor, and accordingly measures have been taken to keep the coprocessor pipeline synchronised with the main processor pipeline.
Synchronisation is needed because the pipeline stages of the main processor must interact with those of the coprocessor during execution of a coprocessor instruction. For example, the main processor may cancel a coprocessor instruction if a condition code specified by that instruction is not met, or the entire coprocessor pipeline may need to be flushed if a mispredicted branch has caused the coprocessor instruction to be executed. Further, data may need to be passed between the main processor and the coprocessor if a coprocessor instruction defines a load or store operation.
Until now, coprocessor pipelines have been kept synchronised with the main processor pipeline by passing signals with fixed timing from one pipeline to the other. These signals primarily cause one pipeline to stall whenever the other stalls, so that the two remain in step. However, other events, for example the main pipeline cancelling a coprocessor instruction or the pipelines being flushed, significantly complicate the interaction between the main processor and the coprocessor, particularly when such events coincide with stalls. As pipelines have grown longer, it has become increasingly difficult to achieve synchronisation using this tightly coupled scheme of passing signals with fixed timing between the pipelines.
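The effect of mirroring stalls with fixed-timing signals can be illustrated with a small behavioural model. The following sketch is hypothetical and is not taken from any particular design: each pipeline is assumed to advance one slot per cycle unless stalled, the coprocessor is assumed to mirror each main-processor stall after a fixed `latency` (an assumed parameter standing in for the fixed signal timing), and the function `progress_offset` (a hypothetical name) reports how far the two pipelines are out of step at the end of each cycle.

```python
from typing import List


def progress_offset(main_stalls: List[bool], latency: int) -> List[int]:
    """Return main-minus-coprocessor progress at the end of each cycle.

    Hypothetical model: main_stalls[t] is True if the main pipeline stalls
    in cycle t; the coprocessor mirrors that stall `latency` cycles later
    and never stalls on its own. Each pipeline advances one slot per
    unstalled cycle. A negative offset means the coprocessor has run ahead.
    """
    main_pos = 0
    cop_pos = 0
    offsets: List[int] = []
    for t, main_stalled in enumerate(main_stalls):
        # The coprocessor acts on the stall signal driven `latency` cycles ago;
        # before any signal has arrived it does not stall.
        cop_stalled = t >= latency and main_stalls[t - latency]
        if not main_stalled:
            main_pos += 1
        if not cop_stalled:
            cop_pos += 1
        offsets.append(main_pos - cop_pos)
    return offsets


if __name__ == "__main__":
    pattern = [False, True, True, False, False]
    print(progress_offset(pattern, latency=0))  # [0, 0, 0, 0, 0]
    print(progress_offset(pattern, latency=1))  # [0, -1, -1, 0, 0]
```

With zero latency the pipelines stay in lockstep. With a latency of one cycle the coprocessor runs one slot ahead while the stall propagates; in this model, any cancellation, flush or data transfer that assumes a fixed stage alignment during that window would be applied to the wrong pipeline stage.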
A major constraint imposed upon the coprocessor interface is that it must operate over a two-cycle delay; that is, any signal passing from the main processor to the coprocessor, or vice versa, must be given a whole clock cycle to propagate from one to the other, and hence cannot be acted upon until the following clock cycle. This means that a signal crossing the interface must be clocked out of a register on one side of the interface and clocked directly into another register on the other side, with no combinatorial logic in between. This constraint arises because the main processor (also referred to herein as the processor core) and the coprocessor may be placed a considerable distance apart, so generous timing margins must be allowed to cover signal propagation times. This is particularly true where the coprocessor may be designed separately from the main processor, for example by a different party. This propagation delay makes it difficult to maintain pipeline synchronisation using the tightly coupled technique described above.
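The registered interface described above can be modelled as two registers in series with no logic between them. In the following sketch, the class `RegisteredLink` and the function `receiver_view` are hypothetical names used only for illustration. Under the stated assumptions, a value produced by the sending side's logic during cycle n is captured into the sender's output register at the end of cycle n, crosses the interface during cycle n+1, is captured into the receiver's input register at the end of that cycle, and so can first be acted upon in cycle n+2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegisteredLink:
    """One direction of a hypothetical registered interface: a sender-side
    register feeding a receiver-side register with no logic in between."""
    out_reg: Optional[str] = None  # register on the sending side
    in_reg: Optional[str] = None   # register on the receiving side

    def clock(self, sender_value: Optional[str]) -> None:
        # Both registers update on the same clock edge: the receiver captures
        # what the sender register held during the cycle just ended, and the
        # sender register captures the value its logic produced in that cycle.
        self.in_reg, self.out_reg = self.out_reg, sender_value


def receiver_view(sender_values: List[Optional[str]]) -> List[Optional[str]]:
    """For each cycle m, return the value the receiver can act on in cycle m,
    given the value produced by the sender's logic in each cycle."""
    link = RegisteredLink()
    view: List[Optional[str]] = []
    for value in sender_values:
        view.append(link.in_reg)  # receiver acts on its input register this cycle
        link.clock(value)         # clock edge at the end of this cycle
    return view


if __name__ == "__main__":
    print(receiver_view(["stall", "go", "go", "go"]))
```

Running this prints `[None, None, 'stall', 'go']`: in this model a stall decided in cycle 0 is visible to the other side only in cycle 2, by which time that side has already spent two cycles acting without knowledge of it.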
Accordingly, it would be desirable to provide an improved technique for obtaining synchronisation between pipelines in a data processing apparatus.