In a data processing apparatus, such as a pipelined data processing apparatus, a series of serially-connected processing stages is formed. Between adjacent stages of the pipeline, a signal-capture element such as a latch or a sense amplifier may be provided, in which one or more signal values are stored.
The logic of each processing stage is responsive to input signals received from preceding processing stages or from elsewhere and generates output signals to be stored in an associated output latch. In a typical pipelined data processing apparatus, the time taken for the processing logic to complete its processing operations determines the speed at which the data processing apparatus may operate. If the processing logic of the processing stages is able to complete its processing operations in a short period of time, then the signals may rapidly advance through the output latches, resulting in high speed processing. However, the system cannot advance signals between stages more rapidly than the speed at which the slowest processing logic in a stage is able to perform its processing operations on received input signals and generate the appropriate output signals. This limits the performance of the system.
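The limit imposed by the slowest stage can be illustrated with a short sketch. The stage names and delay values below are hypothetical figures chosen only for illustration, and any overhead of the latches themselves is ignored.

```python
# Hypothetical per-stage logic delays in nanoseconds (illustrative values only).
stage_delays_ns = {"fetch": 0.8, "decode": 0.6, "execute": 1.2, "writeback": 0.5}


def min_clock_period_ns(delays):
    """Shortest clock period at which every stage can finish its processing."""
    return max(delays.values())


def max_frequency_mhz(delays):
    """Highest clock frequency permitted by the slowest stage."""
    return 1000.0 / min_clock_period_ns(delays)


# The stage with the largest delay sets the pace for the whole pipeline.
slowest_stage = max(stage_delays_ns, key=stage_delays_ns.get)
```

With these assumed figures the pipeline is paced by the "execute" stage, however quickly the other stages complete.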
Some known techniques seek to overcome some of these processing speed limitations. For example, it is possible to increase the rate at which the processing stages are driven until the slowest processing stage is unable to keep pace. Also, where it is desired to reduce the power consumption of the data processing apparatus, the operating voltage may be reduced up to the point at which the slowest processing stage is no longer able to keep pace. It will be appreciated that in both of these situations processing errors may occur.
These processing errors typically occur because the output signal to be stored in the associated output latch does not achieve a predetermined stable voltage level for a period of time prior to a clock signal being provided to the latch (known as the set-up period), or because the output signal is not held for a predetermined period after the clock signal is provided to the output latch (known as the hold period).
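The set-up and hold conditions can be expressed as a simple window around the capturing clock edge. The following is a minimal sketch under stated assumptions: the function name `check_timing`, its parameters and the example times are hypothetical, and a real latch would be characterised by its own timing figures.

```python
def check_timing(transitions_ns, clock_edge_ns, setup_ns, hold_ns):
    """Classify data transitions relative to one capturing clock edge.

    A transition inside the set-up period (just before the edge) or inside
    the hold period (from the edge onwards) is reported as a violation.
    """
    window_start = clock_edge_ns - setup_ns
    window_end = clock_edge_ns + hold_ns
    violations = []
    for t in transitions_ns:
        if window_start < t < clock_edge_ns:
            violations.append(("setup", t))
        elif clock_edge_ns <= t < window_end:
            violations.append(("hold", t))
    return violations


# Assumed example: clock edge at 10.0 ns, set-up 0.5 ns, hold 0.2 ns.
early_enough = check_timing([9.0], 10.0, 0.5, 0.2)   # settles before the window
too_late = check_timing([9.8], 10.0, 0.5, 0.2)       # changes inside the set-up period
```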
The change of state of the signal during these errors is transient (i.e. it is pulse-like), and a reset or a rewrite of the latch or device causes normal behaviour to resume thereafter. The signal in this transient state is said to be metastable because it fails to achieve a valid logic level for a period of time and instead hovers at a metastable voltage somewhere between the logic levels before transitioning to a valid logic level.
In a data processing apparatus which has a memory, it is desirable to perform accesses to that memory as quickly as possible since this has a beneficial effect on processor throughput.
The structure of a memory, such as a single-ported cache, is such that both read accesses and write accesses occur using a common address interface. Data should only be written to the cache (known as committing) once the write access has been confirmed to contain no errors.
In the case of a write access, if it transpires that the write access is in some way incorrect or invalid then the data stored in the memory may be corrupt. Furthermore, should the signals used in a write access be metastable then the data stored in the memory may be corrupt. These problems can be overcome by adding extra stages to the processing logic which can detect that such an error has occurred due to the presence of this metastability. The metastability determination can then be made prior to the data being committed to memory. The metastability determination is typically performed at system level and takes a number of processing cycles. Hence, the write access may be buffered in a write buffer and only committed some cycles later when it is known that no errors have occurred. It will be appreciated that such an arrangement has a minimal impact on throughput since write accesses will rarely be on the critical path.
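The buffering of write accesses until the error determination has completed can be sketched as follows. This is an illustrative model only: the class `WriteBuffer`, the `check_latency` parameter and the choice to discard a write flagged as erroneous are assumptions, and the memory is modelled as a plain dictionary.

```python
from collections import deque


class WriteBuffer:
    """Holds writes until the (assumed) error check has had time to complete."""

    def __init__(self, check_latency):
        # Number of cycles the error determination is assumed to take.
        self.check_latency = check_latency
        self.pending = deque()

    def enqueue(self, addr, data):
        self.pending.append({"addr": addr, "data": data, "age": 0, "error": False})

    def flag_error(self, addr):
        # Mark every pending write to this address as erroneous.
        for entry in self.pending:
            if entry["addr"] == addr:
                entry["error"] = True

    def tick(self, memory):
        # Advance one cycle and commit writes whose check has completed cleanly.
        for entry in self.pending:
            entry["age"] += 1
        while self.pending and self.pending[0]["age"] >= self.check_latency:
            entry = self.pending.popleft()
            if not entry["error"]:
                memory[entry["addr"]] = entry["data"]  # commit
            # A flagged write is dropped here; recovery is outside this sketch.


memory = {}
wb = WriteBuffer(check_latency=2)
wb.enqueue(0x10, 0xAB)
wb.tick(memory)  # check still in progress, nothing committed yet
wb.tick(memory)  # check complete, write committed
```

Because the write is committed only after the check latency has elapsed, a write later found to be erroneous never reaches the memory in this model.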
However, it is desirable to execute read accesses as soon as possible. This is because read accesses will typically be on the critical path and any latency in executing read accesses will have a detrimental effect on throughput. Accordingly, the pipelined stages prior to the execution stages are typically optimised to process read accesses as quickly as possible. For example, typical fetch and decode stages would normally be optimised to fetch a read access instruction in a single processing cycle and then decode that instruction in a subsequent single processing cycle. This ensures that the execution of the read access can occur at an early stage.
Also, arbitration techniques are provided in order to deal with the occurrence of concurrent read and write accesses over the common buses, with read accesses being given priority over write accesses. Accordingly, read accesses are performed in preference, with write accesses being placed in the write buffer and postponed until the write access has been confirmed to be error free and no read accesses are outstanding.
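The read-priority arbitration described above can be modelled cycle by cycle. The sketch below is a simplified illustration: the function `run_port`, its arguments and the addresses are hypothetical, one access per cycle is assumed on the single port, and only writes already confirmed as error free are offered to the arbiter.

```python
from collections import deque


def run_port(reads_by_cycle, confirmed_writes, cycles):
    """Simulate a single port where reads always win over buffered writes.

    reads_by_cycle: dict mapping a cycle number to a read address.
    confirmed_writes: deque of (address, data) already confirmed error free.
    """
    log = []
    for cycle in range(cycles):
        if cycle in reads_by_cycle:
            # A read is outstanding, so any buffered write is postponed.
            log.append((cycle, "read", reads_by_cycle[cycle]))
        elif confirmed_writes:
            addr, _data = confirmed_writes.popleft()
            log.append((cycle, "write", addr))
        else:
            log.append((cycle, "idle", None))
    return log


# Assumed example: two back-to-back reads delay one confirmed write.
trace = run_port({0: 0x100, 1: 0x104}, deque([(0x200, 7)]), cycles=4)
```

In this trace the write to the assumed address 0x200 is granted the port only in cycle 2, once no read access is outstanding.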
It is desired to provide improved techniques for performing data accesses.