1. Field of the Invention
The present invention relates to a data processing apparatus for buffering addresses identifying locations in a memory, and data values to be written to those memory locations. The term 'data value' is used herein to refer to both instructions and to items or blocks of data, such as data words.
2. Description of the Prior Art
A typical data processing apparatus includes a processor core (or CPU) arranged to execute a sequence of instructions that are applied to data supplied to the processor core. Generally, a memory may be provided for storing the instructions and data (collectively referred to herein as "data values") required by the processor core. Further, it is often the case that one or more caches are provided for storing data values required by the processor core, so as to reduce the number of accesses required to the memory.
Whilst the use of a cache improves the processing speed of the processor core, there is still the requirement for the processor core to read data values from, and write data values to, the memory, and these processes are relatively slow, thereby adversely affecting the processing speed of the processor core.
To alleviate the impact on processing speed resulting from writing data values to a memory, it is known to provide a write buffer that is typically arranged to decouple a cached CPU from the memory, so as to allow the processor bus to complete a write operation to the intermediate write buffer, and for that write buffer to then autonomously perform the write to the memory bus. By this approach, the CPU does not need to wait for the write process to complete before proceeding to execute the next instruction. Further, the write buffer depth can be increased beyond a single register to enable a plurality of CPU data writes to be buffered, for example by using a First-In-First-Out (FIFO) buffer to maintain write transaction ordering.
In general terms, a write buffer presents a "slave" interface to a "master" at its input side, and presents an "initiator bus" interface to the memory bus on its output side. The slave interface generally requires address (a), control (c) and write data (d) signals. The control signal will typically include control information such as operand size, protection and access flags. The master interface, for example the interface between the CPU and the processor bus, similarly must source the same address, control and write data information, and may additionally perform funnelling to narrower or wider data bus width.
In a simple prior art write buffer, the slave interface of the write buffer will have a width of "a+c+d" bits (for address, control and data bus widths). In such an arrangement, the write buffer storage requirements are:
a+c+d bits wide x number of write buffer slots.
Generally, when developing data processing apparatus, such as integrated circuits, there is a desire to keep the circuit as small as possible. The space that an integrated circuit occupies is at a premium. The smaller an integrated circuit is, the less expensive it will be to manufacture and the higher the manufacturing yield. For this reason, it is clear that the number of write buffer slots provided within the write buffer cannot be increased at will, as the overall size of the integrated circuit must be kept as small as possible.
Whenever the write buffer fills to capacity, the processor stalls on a subsequent write operation until a free slot in the write buffer becomes available. The maximum write buffer depth is application dependent, and is a trade-off between chip area, sustainable burst write bandwidth, and the "latency" of the memory, or secondary, bus where a read transaction is blocked until the write buffer has been emptied.
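The stall behaviour described above can be illustrated with a minimal Python sketch of a FIFO write buffer. The class name, method names, slot count and addresses are hypothetical and chosen only for illustration; the model is behavioural and does not represent any particular hardware implementation.

```python
from collections import deque


class SimpleWriteBuffer:
    """Toy model of a FIFO write buffer with a fixed number of slots."""

    def __init__(self, slots):
        self.slots = slots
        self.fifo = deque()

    def cpu_write(self, address, control, data):
        """Return True if the write was buffered, False if the CPU must stall."""
        if len(self.fifo) >= self.slots:
            return False
        self.fifo.append((address, control, data))
        return True

    def drain_one(self, memory):
        """Perform the oldest buffered write on the (slow) memory bus."""
        if not self.fifo:
            return False
        address, _control, data = self.fifo.popleft()
        memory[address] = data
        return True


memory = {}
buf = SimpleWriteBuffer(slots=2)  # assumed depth of two slots

# Three back-to-back writes: the third finds the buffer full.
accepted = [buf.cpu_write(0x100 + 4 * i, 0, i) for i in range(3)]

# Once the oldest entry has been written to memory, the stalled write fits.
buf.drain_one(memory)
retry = buf.cpu_write(0x108, 0, 2)
```

In this sketch the third write is refused until one slot has drained, which corresponds to the processor stalling until a free slot becomes available.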
For cached processors and higher bandwidth systems, much of the write traffic is in the form of "bursts" (i.e. cache line replacements or stack context saves), where a base address and a fixed or variable number of data words are transferred. However, there will still typically be some non-burst (e.g. 8-bit and 16-bit) accesses (e.g. character or "short" data).
In such arrangements, the area required by the write buffer may be reduced by separating the address/control paths from the data path so as to provide two logically separate write buffers, one for the address and control signals, and one for the data signals. Since there will generally be fewer addresses than data values in burst mode operation, the number of address slots provided in the write buffer can be significantly less than the number of data slots. However, the area saved by providing fewer address slots is typically traded for more data slots, such that the overall area of the write buffer is optimized for typical usage.
Hence, for such burst mode write buffers, the write buffer storage is:
a+c bits wide x number of address slots
d bits wide x number of data slots
In such an arrangement, an address incrementer is typically required to re-synthesize the burst addresses as the contents of the write buffer are output to memory, and more complex control logic is required to interlock the address and data write buffer reconstruction.
Whilst such an arrangement is clearly advantageous for burst mode write traffic, if there are any non-burst stores (i.e. byte structure access), then the number of address slots becomes a limiting factor, since in this non-burst mode, there will be one address for each data word.
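The two storage expressions, and the way the address slots limit non-burst traffic, can be checked with a short calculation. The following Python sketch uses assumed widths (a = 32, c = 4, d = 32 bits) and assumed slot counts; none of these figures come from the text, and the function names are hypothetical.

```python
def combined_bits(a, c, d, slots):
    """Storage of a simple write buffer: (a + c + d) bits per slot."""
    return (a + c + d) * slots


def split_bits(a, c, d, address_slots, data_slots):
    """Storage of a split write buffer: separate address/control and data slots."""
    return (a + c) * address_slots + d * data_slots


def split_capacity_non_burst(address_slots, data_slots):
    """Non-burst writes need one address slot and one data slot each."""
    return min(address_slots, data_slots)


def split_capacity_burst_words(address_slots, data_slots, burst_length):
    """Data words buffered when every burst carries burst_length words."""
    bursts = min(address_slots, data_slots // burst_length)
    return bursts * burst_length


# Assumed example widths and depths (not taken from the text).
A_BITS, C_BITS, D_BITS = 32, 4, 32

simple = combined_bits(A_BITS, C_BITS, D_BITS, slots=8)
split = split_bits(A_BITS, C_BITS, D_BITS, address_slots=2, data_slots=8)

burst_words = split_capacity_burst_words(2, 8, burst_length=4)
non_burst_writes = split_capacity_non_burst(2, 8)
```

With these assumed figures the split buffer needs 328 bits against 544 bits for the simple buffer and still holds eight burst data words, but it can hold only two non-burst writes because each one consumes an address slot.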
Given that many data processing apparatus typically employ both burst mode and non-burst mode stores to memory, it would be desirable to provide the data processing apparatus with a write buffer that operates efficiently for both burst mode write traffic and non-burst mode write traffic, without having to increase the size of the write buffer with respect to the size of known prior art write buffers.
Viewed from a first aspect, the present invention provides a data processing apparatus comprising: a processor core for generating addresses identifying locations in a memory and data values for storing in the memory; a write buffer for storing the addresses and data values output by the processor core, and for subsequently outputting said addresses and data values to cause the data values to be stored in said memory; the write buffer comprising a plurality of rows, each row being arranged to store an address or data value, and each row having associated therewith a flag field settable to indicate whether that row contains an address or a data value.
In accordance with the present invention, each row of the write buffer is able to store either an address or a data value, an additional flag field is associated with each row, and the flag field is settable to indicate whether that row contains an address or a data value. Hence, in burst-mode, a particular row will be used to store the base address, with the flag field for that row being set accordingly to indicate that an address is contained within that row, and then subsequently the data values forming the burst traffic will be stored in other rows of the write buffer, with the flag fields of those rows being set to indicate that data values are contained within those rows. This approach makes very efficient use of the available write buffer area when buffering burst mode write traffic.
However, it is clear that the arrangement of the present invention also supports non-burst write traffic, where the rows of the write buffer will alternately store addresses and data values, with the flag fields for each row being set accordingly.
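The row usage for the two kinds of traffic can be sketched as follows. This is a minimal Python model under stated assumptions: the class and method names are hypothetical, the flag encoding (1 for an address, 0 for a data value) is an arbitrary choice, and a transaction that does not fit is rejected as a whole, which simplifies the stalling that a real buffer would perform.

```python
ADDRESS, DATA = 1, 0  # assumed encoding of the single-bit flag field


class TaggedWriteBuffer:
    """Toy model: every row holds either an address or a data value."""

    def __init__(self, rows):
        self.capacity = rows
        self.rows = []  # each entry is (flag, value)

    def push_burst(self, base_address, words):
        """Store one address row followed by one data row per word."""
        needed = 1 + len(words)
        if len(self.rows) + needed > self.capacity:
            return False  # simplified: the whole transaction is refused
        self.rows.append((ADDRESS, base_address))
        self.rows.extend((DATA, word) for word in words)
        return True

    def push_single(self, address, word):
        """A non-burst write is an address row followed by one data row."""
        return self.push_burst(address, [word])


buf = TaggedWriteBuffer(rows=8)
burst_ok = buf.push_burst(0x1000, [1, 2, 3, 4])  # uses 5 rows
single_ok = buf.push_single(0x2000, 0xAA)        # uses 2 rows
overflow_ok = buf.push_single(0x3000, 0xBB)      # only 1 row left
flags = [flag for flag, _ in buf.rows]
```

A burst of n words occupies n + 1 rows, whereas n non-burst writes occupy 2n rows, so the same storage serves both kinds of traffic without a fixed split between address slots and data slots.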
It has been found that a write buffer in accordance with the present invention can be arranged to occupy a relatively small area, whilst providing a good compromise between a write buffer optimized for non-burst mode traffic, and a write buffer optimized for burst mode traffic.
In preferred embodiments, each row comprises 'n' bits and the flag field comprises one or more of said 'n' bits. Preferably, said flag field comprises a single bit, since this keeps the space required for the flag field to a minimum whilst ensuring that sufficient information is still provided to determine whether any particular row contains an address or a data value.
In preferred embodiments, the data processing apparatus further comprises a multiplexer for receiving said addresses and data values from the processor core; and input control logic for controlling the multiplexer to output either a data value or an address to the write buffer for storage in a particular row; the input control logic further controlling the write buffer to set the flag field for that particular row to indicate whether that row has an address or a data value stored therein.
Further, in preferred embodiments, each row further comprises a control field, wherein if an address is stored in a particular row, then the control field of that row is used to store control data associated with the address. Hence, in this arrangement, the input control logic will cause the multiplexer to output the address for storing within the particular row, and also the control data for storing within the control field of that row, with the flag field being set to indicate that that particular row contains an address.
Preferably, if a data value is stored in a particular row, then the control field is used to store mask data identifying the region or regions of that row containing data. Hence, the control field is still used, even if the row is being used to store a data value rather than an address. In preferred embodiments, a plurality of bytes in the row are reserved for storing the data value, and the mask data indicates which of said plurality of bytes contain the data value. Hence, if the write buffer is connected to a 32-bit data bus, such that a data word can be up to four bytes long, then four bytes will be reserved for storing the data value in each row. However, if the data value to be stored in a particular row is less than four bytes in length, then not all of the four bytes in the row will be used to store the data value. In this instance, the mask data is used to indicate which of the plurality of bytes in the row do contain the data value. In preferred embodiments, the input control logic is arranged to control the write buffer to generate the mask data.
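One way the mask data might be derived is sketched below in Python. The text does not specify how the mask is generated, so the scheme shown here is an assumption: one mask bit per byte of a four-byte row, with the byte position taken from the low-order address bits. The function name and parameters are hypothetical.

```python
def byte_mask(address, size_bytes, row_bytes=4):
    """Return a bit mask with one bit per byte of the row that holds data.

    Assumption: bit i of the mask corresponds to byte i of the row, and the
    first valid byte is selected by the low-order bits of the address.
    """
    offset = address % row_bytes
    if size_bytes < 1 or offset + size_bytes > row_bytes:
        raise ValueError("access does not fit within a single row")
    return ((1 << size_bytes) - 1) << offset


word_mask = byte_mask(0x1000, 4)  # full 32-bit word: all four bytes valid
byte_only = byte_mask(0x1001, 1)  # single byte in byte position 1
half_mask = byte_mask(0x1002, 2)  # 16-bit value in the upper two bytes
```

Under this assumed scheme a full word gives the mask 0b1111, while narrower accesses set only the bits for the bytes that actually carry the data value.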
Further, in preferred embodiments, the data processing apparatus comprises output control logic for controlling the output to the memory of the addresses and data values stored in the write buffer. Preferably, the data processing apparatus comprises a demultiplexer for receiving the contents of a row of the write buffer, the output control logic being arranged to determine from the flag field whether an address or a data value is included in the row, and to instruct the demultiplexer to output a data value onto a data line or an address onto an address line. The input and output control logic may be provided by separate logic components, but in preferred embodiments are provided by the same logic component.
In preferred embodiments, any burst mode stores in the write buffer are resynthesized before passing on to the memory bus. Hence, in preferred embodiments, the data processing apparatus further comprises an incrementer for receiving addresses output on the address line. Thus, if after receiving the address at the incrementer, a plurality of rows of data values are read out from the write buffer, then each time a data value is placed on the memory bus, the address can be incremented by the incrementer, and the corresponding incremented address output on to the address bus of the memory bus. In this way, the memory will receive the necessary address information to enable it to store each data value received.
In preferred embodiments, the demultiplexer is arranged to output onto a control line control data within the row received from the write buffer, and the data processing apparatus further comprises a register for storing the control data. In preferred embodiments, the control data will be output each time a row of the write buffer containing a data value is output on to the memory bus. By storing the control data in a register, this information can be output on to the control bus of the memory bus as required.
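The output path described above, in which the flag field steers each row, an incrementer regenerates the burst addresses and a register holds the control data, can be sketched in Python as follows. The row layout (flag, control field, payload), the increment of four bytes per data word and all names are assumptions made for illustration only.

```python
ADDRESS, DATA = 1, 0  # assumed encoding of the flag field
WORD_BYTES = 4        # assumed increment per data word on a 32-bit bus


def drain(rows):
    """Turn buffered rows into (address, control, mask, data) bus transfers.

    Each row is (flag, field, payload). For an address row, field is the
    control data; for a data row, field is the byte mask.
    """
    address = None   # models the incrementer's current value
    control = None   # models the register holding the control data
    transfers = []
    for flag, field, payload in rows:
        if flag == ADDRESS:
            address, control = payload, field
        else:
            if address is None:
                raise ValueError("data row seen before any address row")
            transfers.append((address, control, field, payload))
            address += WORD_BYTES  # resynthesize the next burst address
    return transfers


rows = [
    (ADDRESS, 0x3, 0x1000),   # base address of a two-word burst
    (DATA, 0b1111, 0xAAAA),
    (DATA, 0b1111, 0xBBBB),
    (ADDRESS, 0x1, 0x2001),   # non-burst byte store
    (DATA, 0b0010, 0x5500),
]
transfers = drain(rows)
```

In this sketch the second burst word is issued at address 0x1004 with the same control data as the first, and the later address row simply reloads both the incrementer and the control register.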
In preferred embodiments, the write buffer is a First-In-First-Out (FIFO) buffer, since this ensures that write transaction ordering is maintained.
Viewed from a second aspect, the present invention provides a write buffer for storing addresses identifying locations in a memory and data values for storing in the memory, and for subsequently outputting said addresses and data values to cause the data values to be stored in said memory, the write buffer comprising: a plurality of rows, each row being arranged to store an address or data value, and each row having associated therewith a flag field settable to indicate whether that row contains an address or a data value.