The invention relates to a data processor according to the precharacterizing part of claim 1.
Such a data processor is known from U.S. Pat. No. 5,649,144. A data processor uses memory and registers for storing data. Access to data from registers is usually performed within one processor cycle, but access to memory is slower, because memory circuits are slower than register circuits and because access to memory requires a cycle for accessing an address. To speed up access to memory, use can be made of cache memory. A cache memory is a small and fast memory, used to store copies of data from a larger and slower main memory. Data which is needed is fetched from main memory into the cache.
U.S. Pat. No. 5,649,144 describes a mechanism for prefetching of data into a cache memory called stream prefetching. The idea underlying stream prefetching is that many of the addresses of data that a processor needs from memory come from a small number of streams of addresses. In each of the address streams the addresses change in a predictable way, for example each time by addition of a fixed number. In stream prefetching a data prefetch unit uses the addresses from such streams to "pre"-fetch data, that is, fetch data ahead of reception of an instruction that addresses the data for use in the processor. Each time the processor needs data from a new address in a stream, the data prefetch unit computes a next prefetch address from that stream, so that data may be prefetched from the next address in main memory into a cache memory.
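As an illustration only, the fixed-stride case of stream prefetching can be modelled in a few lines of Python. The class name `StreamPrefetcher`, the method `on_access` and the assumption that the stride is already known (rather than detected from the access pattern) are hypothetical simplifications, not the mechanism of U.S. Pat. No. 5,649,144 itself.

```python
class StreamPrefetcher:
    """Toy model of a data prefetch unit for one fixed-stride address stream.

    Hypothetical: the stride is assumed to be known in advance.
    """

    def __init__(self, stride):
        self.stride = stride    # fixed number added to each address
        self.prefetched = []    # addresses issued to the cache as prefetches, in order

    def on_access(self, address):
        # The processor has just used `address` from the stream: predict the
        # next address by adding the stride and issue it as a prefetch.
        next_address = address + self.stride
        self.prefetched.append(next_address)
        return next_address


if __name__ == "__main__":
    p = StreamPrefetcher(stride=16)
    for a in (0x1000, 0x1010, 0x1020):
        p.on_access(a)
    print([hex(x) for x in p.prefetched])  # ['0x1010', '0x1020', '0x1030']
```

Each access to the stream triggers exactly one prefetch one stride ahead; a real prefetch unit would issue these addresses to the cache memory rather than record them in a list.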
Subsequently, when the processor actually needs the data from the next address, the processor executes a load instruction with the next address. When the data has been prefetched into the cache memory, it will be possible to complete this load instruction from the cache memory in a small number of processor cycles; if the data had not been prefetched and was not otherwise available in the cache memory, a larger number of processor cycles would have been necessary to fetch the data from the slower main memory.
Although the use of a cache and stream prefetching reduces the delay between the load instruction and the availability of data to a few processor cycles, this delay is still larger than the time needed to access data from registers, which can be done within one clock cycle.
Amongst others, it is an object of the invention to reduce the delay needed to access data from address streams.
The data processor according to the invention is characterized by the characterizing part of claim 1. Thus, the instruction accesses data from selectable FIFO queues, much as if each queue were a further register. The data processor generally has a register file for storing normal operands. The instruction set of the data processor contains at least one further instruction that causes the data processor to effect transport of data from the register file, where the data processor effects transport of data in the same way in response to both the instruction mentioned in claim 1 and the further instruction, except that the instruction mentioned in claim 1 causes the data processor to take data from the FIFO queue instead of from the register file.
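A minimal behavioural sketch of this idea, assuming a hypothetical operand encoding in which each source operand is either `("r", n)` for register n or `("q", n)` for FIFO queue n. The class `Datapath` and the `add` instruction are illustrative names only; the point is that the operation is identical for both kinds of source, and only the place the operand is taken from differs.

```python
from collections import deque


class Datapath:
    """Hypothetical model: operands come from the register file or a FIFO queue."""

    def __init__(self, num_regs=8, num_queues=2):
        self.regs = [0] * num_regs
        self.queues = [deque() for _ in range(num_queues)]

    def read_operand(self, kind, index):
        # Normal instruction: the operand is read from the register file.
        if kind == "r":
            return self.regs[index]
        # Queue-reading instruction: same transport, but the oldest element
        # is taken (and removed) from the selected FIFO queue.
        if kind == "q":
            return self.queues[index].popleft()
        raise ValueError(f"unknown operand kind {kind!r}")

    def add(self, dst, src_a, src_b):
        # The addition is the same whichever source each operand uses.
        self.regs[dst] = self.read_operand(*src_a) + self.read_operand(*src_b)
```

For example, with `regs[1] == 5` and queue 0 holding `10, 20`, executing `add(2, ("r", 1), ("q", 0))` leaves 15 in register 2 and only `20` in the queue.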
The data accessed from the FIFO queue has generally been prefetched in response to an earlier instruction and is therefore usually directly available. Preferably, the latency for accessing the FIFO queue (that is, the time needed to access the data) is the same as for normal registers, i.e. one processor clock cycle.
An embodiment of the data processor according to the invention is described in claim 3. In this embodiment the FIFO queue has a full/not full indicator. When the FIFO is indicated as not full, data is prefetched from the memory location addressed by the present address and the present address is updated. Extracting the oldest data turns a full FIFO into a "not full" FIFO and therefore indirectly causes further data to be prefetched. After the initial definition of the stream, data will be prefetched until the FIFO is full, even before any data is extracted from the FIFO queue.
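The full/not full behaviour can be sketched as follows, under simplifying assumptions: memory is a Python list indexed by address, and prefetches complete immediately (in hardware they would be issued asynchronously, typically one per cycle). The class `StreamFifo` and its parameters are hypothetical names chosen for the example.

```python
from collections import deque


class StreamFifo:
    """Hypothetical FIFO queue filled from a fixed-stride address stream."""

    def __init__(self, memory, start, stride, capacity):
        self.memory = memory      # assumed: list indexed by address
        self.present = start      # present address of the stream
        self.stride = stride
        self.capacity = capacity
        self.fifo = deque()
        # After the stream is defined, prefetch until the FIFO is full,
        # before anything has been extracted.
        self._fill()

    def full(self):
        return len(self.fifo) >= self.capacity

    def _fill(self):
        # While "not full": prefetch from the present address, then update it.
        while not self.full():
            self.fifo.append(self.memory[self.present])
            self.present += self.stride

    def extract(self):
        value = self.fifo.popleft()  # oldest data
        self._fill()                 # full -> not full, so prefetch again
        return value
```

With `memory = list(range(100))`, `start=0`, `stride=4` and `capacity=3`, the FIFO initially holds `0, 4, 8`; one `extract()` returns `0` and the freed slot is refilled with the value at address 12. Because prefetching is driven only by the indicator, the capacity bounds how far the stream runs ahead of the instructions that consume it.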
Preferably, the prefetch of data from the address streams into the FIFO queues is performed through the cache memory. That is, the data prefetch unit issues the present address to the cache memory; if necessary, the cache memory loads the data corresponding to the present address from main memory and stores the data in the cache memory. The cache memory supplies the data to the FIFO queue. Thus, in addition to the FIFO, the cache memory also has the data available for relatively fast access before and after the data is extracted from the FIFO queue. Besides the prefetched data for the queue, the cache may fetch an entire block of, say, 64 bytes containing that data, so that the rest of the block will also be available for fast access.
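A rough sketch of prefetching through the cache, assuming 64-byte blocks (the example size above), 4-byte little-endian words that are aligned so they never cross a block, and main memory modelled as a `bytes` object. The names `Cache`, `read_word` and `prefetch_into_fifo` are hypothetical; the sketch only counts block fetches to show that neighbouring data arrives with the prefetched word.

```python
from collections import deque

BLOCK_SIZE = 64  # bytes per cache block (assumed)
WORD_SIZE = 4    # bytes per data word (assumed)


class Cache:
    """Hypothetical cache that loads whole blocks from main memory on a miss."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # bytes-like, indexed by byte address
        self.blocks = {}                # block base address -> block contents
        self.block_fetches = 0          # number of main-memory block loads

    def read_word(self, address):
        if address % WORD_SIZE:
            raise ValueError("unaligned word address")
        base = address - address % BLOCK_SIZE
        if base not in self.blocks:
            # Miss: load the entire block containing the address.
            self.blocks[base] = bytes(self.main_memory[base:base + BLOCK_SIZE])
            self.block_fetches += 1
        offset = address - base
        return int.from_bytes(self.blocks[base][offset:offset + WORD_SIZE], "little")


def prefetch_into_fifo(cache, fifo, address):
    # The prefetch unit issues the present address to the cache; the cache
    # supplies the data to the FIFO queue (loading from main memory if needed).
    fifo.append(cache.read_word(address))
```

Prefetching addresses 0, 4 and 8 causes a single block fetch, and an ordinary load from address 16 afterwards is served from the same cached block without another main-memory access.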
The FIFO full/not full signal may be used to control how much data is prefetched into the cache memory. Moreover, without further overhead the data processor is able to use the cache mechanisms for replacement of data, for fetching data from main memory or for providing a copy of the data from the cache memory if this data is already available in the cache memory. No new main memory access conflicts will be introduced by the addition of the FIFO queues.
Preferably, the data processor also has an instruction for starting prefetching according to an address stream. Preferably this instruction tells the processor how to predict the addresses from the address stream and it tells the processor the logical queue number of that address stream and thereby directly or indirectly the FIFO to be used. The addresses may be predicted for example by means of an initial value for a present address of the address stream and a stride value by which the present address may be incremented repeatedly to determine successive addresses from the address stream.
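The stream-start instruction can be pictured as writing a small descriptor. In this minimal sketch the descriptor fields, the `start_stream` function and the use of the logical queue number directly as a dictionary key (the "direct" case) are all illustrative assumptions; the successive addresses are the initial present address incremented repeatedly by the stride.

```python
from dataclasses import dataclass
from itertools import islice


@dataclass
class StreamDescriptor:
    queue: int    # logical queue number, selecting the FIFO to be used
    present: int  # initial value of the present address
    stride: int   # value added repeatedly to obtain successive addresses

    def addresses(self):
        # Yields present, present + stride, present + 2 * stride, ...
        address = self.present
        while True:
            yield address
            address += self.stride


streams = {}  # hypothetical table: logical queue number -> stream descriptor


def start_stream(queue, initial_address, stride):
    # Hypothetical effect of the "start prefetching" instruction.
    streams[queue] = StreamDescriptor(queue, initial_address, stride)
```

For instance, `start_stream(1, 0x2000, 8)` defines a stream whose first four addresses are `0x2000`, `0x2008`, `0x2010` and `0x2018`; a negative stride would walk through memory downwards.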