1. Field of the Invention
The present invention relates to techniques for reducing storage requirements within a data processing apparatus for temporary storage of data.
2. Description of the Prior Art
A data processing apparatus will typically include processing circuitry for executing a sequence of instructions, each instruction being specified by instruction data stored in memory. Often, an instruction cache will be provided in association with the processing circuitry for temporary storage of instruction data that has been retrieved from memory, in order to provide quick access to that instruction data when required by the processing circuitry. Use of such an instruction cache can reduce the time taken to process requests from the processing circuitry for instructions, since if the instruction data specifying those instructions already resides within the instruction cache, that instruction data can be provided directly to the processing circuitry without needing to access memory. However, in the event of a cache miss (i.e. where the requested instruction data does not reside within the instruction cache), it is necessary to initiate a transaction to memory in order to retrieve the required instruction data, incurring a significant delay due to the latency associated with memory transactions.
Typically an entire cache line's worth of instruction data (including that specifying the instruction of interest) is retrieved when processing such a transaction, with that cache line's worth of instruction data then being stored within the cache. Due to the predominantly sequential nature of instruction execution, such an approach is often beneficial, since there is a high likelihood that subsequent requests from the processing circuitry will relate to instructions specified by instruction data already residing within that cache line. This reduces the number of cache misses, and hence the number of stalls incurred whilst waiting for instructions to be returned from memory.
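The lookup-and-linefill behaviour described above can be sketched in software as follows. This is an illustrative model only, not a description of any particular apparatus: the cache geometry, the identifiers (`icache_read`, `memory_fetch_line`) and the direct-mapped organisation are all hypothetical assumptions made for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES  16u                  /* bytes of instruction data per line (assumed) */
#define NUM_LINES   64u                  /* number of cache lines (assumed) */
#define OFFSET_BITS 4u                   /* log2(LINE_BYTES) */
#define INDEX_BITS  6u                   /* log2(NUM_LINES)  */

typedef struct {
    bool     valid;
    uint32_t tag;
    uint8_t  data[LINE_BYTES];           /* one cache line's worth of data */
} cache_line_t;

static cache_line_t icache[NUM_LINES];

/* Stand-in for a slow memory transaction that returns an entire aligned
 * cache line's worth of instruction data. */
static void memory_fetch_line(uint32_t line_addr, uint8_t *buf) {
    for (uint32_t i = 0; i < LINE_BYTES; i++)
        buf[i] = (uint8_t)(line_addr + i);   /* placeholder contents */
}

/* Look up one byte of instruction data. On a miss, the whole containing
 * line is fetched and installed, so that subsequent (typically sequential)
 * requests to the same line hit without a further memory transaction. */
static uint8_t icache_read(uint32_t addr, bool *hit) {
    uint32_t index = (addr >> OFFSET_BITS) & (NUM_LINES - 1u);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    cache_line_t *line = &icache[index];

    *hit = line->valid && line->tag == tag;
    if (!*hit) {                          /* cache miss: costly memory access */
        memory_fetch_line(addr & ~(LINE_BYTES - 1u), line->data);
        line->valid = true;
        line->tag   = tag;
    }
    return line->data[addr & (LINE_BYTES - 1u)];
}
```

In this sketch a first request to an address misses and triggers a line fill, after which a request to a neighbouring address in the same line hits, mirroring the sequential-execution benefit described above.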
Often, prefetch circuitry is provided within the processing circuitry for issuing requests for instructions in anticipation of them being required for execution by the processing circuitry. This helps to hide the memory latency from the processing circuitry, by seeking to maintain a steady stream of instructions ready for execution by the processing circuitry.
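A very simple form of such prefetching is sequential next-line prefetching, which can be sketched as follows; the line size and function name are assumptions made purely for illustration.

```c
#include <stdint.h>

#define LINE_BYTES 16u   /* bytes per cache line (assumed) */

/* Sequential next-line prefetch: given the address of a demand fetch,
 * compute the address of the following cache line so that a request for
 * it can be issued ahead of the processing circuitry requiring it. */
static uint32_t next_line_prefetch_addr(uint32_t demand_addr) {
    return (demand_addr & ~(LINE_BYTES - 1u)) + LINE_BYTES;
}
```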
The address space of the memory will often contain cacheable address regions and non-cacheable address regions. Instructions that reside in cacheable address regions are often referred to as cacheable instructions, whilst instructions that reside within non-cacheable address regions are often referred to as non-cacheable instructions. The instruction data specifying cacheable instructions can be stored in the instruction cache since it can be used again in the future. However, this does not apply to instruction data specifying non-cacheable instructions, which is required to be re-fetched from memory if needed again. In particular, for non-cacheable instructions, a cache lookup operation performed within the instruction cache must not generate a cache hit.
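The distinction drawn above can be sketched as a region check steering each fetch, with non-cacheable fetches bypassing the cache entirely so that a lookup for such an address can never hit. The address map, region bounds and identifiers below are hypothetical, chosen only to illustrate the idea.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address map: one non-cacheable region (e.g. device memory). */
#define NC_BASE 0x40000000u
#define NC_END  0x50000000u

static bool is_cacheable(uint32_t addr) {
    return !(addr >= NC_BASE && addr < NC_END);
}

typedef enum {
    FROM_CACHE_OR_FILL,   /* may hit in the cache, or be filled into it */
    FROM_MEMORY_ONLY      /* must bypass the cache; re-fetched each time */
} fetch_path_t;

/* Non-cacheable instruction data must not be stored in, nor hit in, the
 * instruction cache, so it is always routed straight to memory. */
static fetch_path_t choose_fetch_path(uint32_t addr) {
    return is_cacheable(addr) ? FROM_CACHE_OR_FILL : FROM_MEMORY_ONLY;
}
```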
In order to obtain, for prefetched non-cacheable instruction code, performance benefits similar to those available when using an instruction cache to store prefetched cacheable instruction code, processors often employ temporary storage buffers for storing instruction data specifying such prefetched non-cacheable instructions. However, such buffers are costly in terms of silicon area.
One known approach for providing such buffers is to seek to re-use buffering that already exists for cacheable instructions. In particular, it is common to provide a linefill buffer in which the instruction data relating to a cache line's worth of cacheable instructions can be collated prior to being written into the instruction cache, thereby reducing the number of write operations required in respect of the instruction cache by ensuring that an entire cache line is written in one operation. Such linefill buffers can also be used for the temporary storage of instruction data specifying non-cacheable instructions. However, whilst a linefill buffer is being used to temporarily store instruction data for non-cacheable instructions, it is not available to collate a cache line's worth of cacheable instruction data ahead of that data being written into the instruction cache. Accordingly, this can lead to a proliferation in the number of linefill buffers, in order to ensure that there will always be at least one linefill buffer available for collating instruction data for cacheable instructions. Such buffers are costly in terms of silicon area, and as the number of instructions that can be prefetched increases, so does the number of buffers required.
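The collating role of a linefill buffer can be sketched as follows: words returned from memory are accumulated until the entire line is present, at which point a single write to the instruction cache can be performed. The structure, field names and word count are illustrative assumptions, not features of any particular design.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_WORDS 4u   /* words collated per cache line (assumed) */

typedef struct {
    uint32_t line_addr;             /* aligned address of the line being filled */
    uint32_t words[LINE_WORDS];     /* instruction data collated so far */
    uint32_t filled;                /* bitmap of word slots received */
    bool     busy;                  /* buffer in use (cacheable or non-cacheable) */
} linefill_buffer_t;

/* Accept one word returned from memory. Returns true once every word of
 * the line has been collated, i.e. the whole line is ready to be written
 * into the instruction cache in a single operation. */
static bool lfb_accept(linefill_buffer_t *lfb, uint32_t word_idx, uint32_t data) {
    lfb->words[word_idx] = data;
    lfb->filled |= 1u << word_idx;
    return lfb->filled == (1u << LINE_WORDS) - 1u;
}
```

Note that while `busy` is set for a buffer holding non-cacheable instruction data, that buffer cannot serve a cacheable line fill, which is precisely the contention described above.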
Another option for reducing the silicon area requirements is to limit the amount of instruction code that can be prefetched in advance. This saves some area, but increases the likelihood of stalling the processor due to an inability to prefetch instructions sufficiently far in advance.
It would hence be desirable to provide an improved mechanism for buffering the instruction data of non-cacheable instructions.