Under a typical computer system architecture, a memory controller controls access to system memory during read and write cycles. When accessing the system memory, the memory controller processes read and write requests generated by a central processing unit (CPU), each requesting that data be read from or written to a particular memory address. Upon receipt of a CPU request, the memory controller initiates a corresponding read or write cycle over a system bus to access the addressed memory location. The amount of data transferred during each memory cycle depends on the width of the system's data bus and on the length of a memory location, which is expressed in data bits, for example, 8, 16, or 32 bits.
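The request-to-cycle flow above can be pictured with a toy model. The sketch below is a hypothetical Python illustration, not the controller described here: memory is a list of fixed-width locations, the data bus is assumed to be exactly as wide as one location (32 bits by default), and the names `MemoryController`, `cycles`, and `bytes_per_cycle` are invented for illustration. Each CPU request turns into exactly one full-width bus cycle, which is logged.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryController:
    """Toy controller: memory is a list of fixed-width locations (hypothetical)."""
    num_locations: int
    width_bits: int = 32  # assumed location length; bus assumed equally wide
    cycles: list = field(default_factory=list)  # log of ("R" | "W", address)

    def __post_init__(self) -> None:
        self.mem = [0] * self.num_locations
        self.mask = (1 << self.width_bits) - 1

    def read(self, addr: int) -> int:
        # A CPU read request becomes one read cycle over the full location.
        self.cycles.append(("R", addr))
        return self.mem[addr]

    def write(self, addr: int, value: int) -> None:
        # A CPU write request becomes one write cycle; bits beyond the
        # location length are discarded.
        self.cycles.append(("W", addr))
        self.mem[addr] = value & self.mask

    def bytes_per_cycle(self) -> int:
        # Data moved per cycle, given a bus as wide as one location.
        return self.width_bits // 8
```

For instance, writing `0x12345678` to address 3 and reading it back logs one write cycle and one read cycle, each moving `bytes_per_cycle()` (4) bytes.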
Because the performance of a computer system depends heavily on the data throughput between the system memory and the CPU, various techniques have been devised to increase that throughput. These techniques include pipelining and pre-fetching of CPU instructions. With pre-fetching, subsequent instructions are fetched before execution of the preceding instruction has completed, thereby increasing execution efficiency. With pipelining, each CPU instruction is subdivided into smaller sub-tasks, each performed by a corresponding pipeline stage. For example, executing an ADD instruction requires that the instruction be fetched from the system memory, decoded by an instruction decoder, and processed in an arithmetic logic unit (ALU). To execute multiple ADD instructions in a pipelined manner, separate stages perform the fetch, decode, and ALU functions, so that several ADD instructions, each occupying a different stage, are processed substantially simultaneously.
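A minimal sketch of the three-stage ADD pipeline follows, assuming each stage takes one clock cycle and the ADD instructions are independent of one another (no data hazards or stalls are modeled). The stage names and the functions `pipelined_schedule` and `total_cycles` are hypothetical.

```python
# Assumed three-stage pipeline matching the ADD example: fetch, decode, ALU.
STAGES = ("FETCH", "DECODE", "ALU")


def pipelined_schedule(instructions):
    """Return {cycle: [(stage, instruction), ...]} for an ideal pipeline.

    Instruction i enters FETCH in cycle i and advances one stage per cycle,
    so up to len(STAGES) instructions are in flight in the same cycle.
    """
    schedule = {}
    for i, instr in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            schedule.setdefault(i + s, []).append((stage, instr))
    return schedule


def total_cycles(n, pipelined=True):
    """Cycles to finish n instructions, with or without overlapping stages."""
    if n == 0:
        return 0
    k = len(STAGES)
    # Pipelined: k cycles to fill the pipe, then one completion per cycle.
    return n + k - 1 if pipelined else n * k
```

With three stages, three ADD instructions finish in 5 cycles instead of 9: in cycle 2 the first ADD is in the ALU stage, the second is being decoded, and the third is being fetched.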
In computer systems that employ pipelining and pre-fetching, a situation can arise in which a read request depends on a write request that is still pending and unexecuted, for example, when the read targets the same memory location as the write. This dependency poses no complication in systems that give write requests priority over read requests, because the pending write is executed before the read is serviced. However, some computer systems, for example, those used in telephony applications, where system performance is measured by how fast data can be read from the system memory, give read requests priority over write requests.
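The dependency check under a read-priority policy might look roughly like the following. This is a hypothetical arbiter, assuming requests are `(kind, addr)` tuples held in two separate queues and that a read depends on a write only when both target the same address; the function name `next_request` is invented for illustration.

```python
from collections import deque


def next_request(read_q: deque, write_q: deque):
    """Choose the next request to issue under a read-priority policy.

    Requests are hypothetical (kind, addr) tuples. The oldest read is issued
    ahead of all queued writes unless some queued write targets the same
    address; in that case the read depends on that write, so writes are
    issued in order until none to that address remain.
    """
    if read_q:
        addr = read_q[0][1]
        if any(w_addr == addr for _, w_addr in write_q):
            return write_q.popleft()  # resolve the dependency first
        return read_q.popleft()  # no dependency: the read goes first
    if write_q:
        return write_q.popleft()  # no reads waiting: service writes
    return None  # both queues empty
```

Draining writes from the head of the write queue, rather than plucking out only the matching write, keeps writes in their original order. With reads to 0x10 and 0x20 and queued writes to 0x30 and 0x20, the issue order is R 0x10, W 0x30, W 0x20, R 0x20.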
Generally, data is read from and written into the system memory in full length (for example, 8, 16, or 32 bits). This is true even if a read or write request is for partial data that is shorter than the full length, for instance, when a write request modifies one byte of a 4-byte memory location. Because read and write cycles operate on the entire length of a memory location, conventional systems use a read-modify-write (RMW) cycle to handle such requests. In an RMW cycle, the memory controller reads the entire memory location, modifies only the portion of the data specified by the write request, and writes the modified data back into the same location.
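An RMW cycle on a 32-bit location can be sketched as below, assuming a per-byte enable mask selects which byte lanes of the write data are applied (the `byte_enable` encoding and the function name are assumptions, not taken from the text). The whole word is read, only the enabled bytes are replaced, and the whole word is written back.

```python
WORD_BYTES = 4  # assumed 32-bit (4-byte) memory locations


def read_modify_write(mem: list, addr: int, data: int, byte_enable: int) -> int:
    """Apply a partial write to one full-length location via an RMW cycle.

    byte_enable is an assumed 4-bit mask: bit i set means byte lane i of
    `data` replaces byte lane i of the stored word (lane 0 is the least
    significant byte). Returns the merged word.
    """
    old = mem[addr]  # read: fetch the entire location
    mask = 0
    for lane in range(WORD_BYTES):
        if byte_enable & (1 << lane):
            mask |= 0xFF << (8 * lane)
    merged = (old & ~mask) | (data & mask)  # modify: splice in enabled bytes only
    mem[addr] = merged  # write: store the entire location back
    return merged
```

For example, with `mem = [0x11223344]`, calling `read_modify_write(mem, 0, 0xAB, 0b0001)` changes only the least significant byte and leaves `mem[0] == 0x112233AB`.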
In systems that give priority to reads, a read request may depend on a pending, unexecuted write request that is queued in a path separate from the read request. To resolve the dependency, the memory controller must wait until the write request is executed before servicing the read request. In some instances, however, resolving the dependency requires the execution of an RMW cycle. For example, a read request for the full length of a 32-bit (i.e., 4-byte) memory location may depend on a pending RMW cycle that modifies only part, e.g., one byte, of the same memory location. When resolving the dependency requires an RMW cycle, conventional techniques for servicing the read request must execute two read cycles: one that reads the data before the write modification, as part of the RMW cycle, and another that reads the data after the modified data has been written back.
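To make the cost concrete, the following hypothetical sketch services a full-length read that depends on a pending one-byte RMW in the conventional way, counting bus cycles. It assumes 32-bit locations and a 4-bit byte-enable mask selecting the bytes to modify; the class and function names are invented, and the sketch only reproduces the conventional two-read behavior, not any technique for avoiding it.

```python
class CountingMemory:
    """Hypothetical 32-bit-word memory that counts full-length bus cycles."""

    def __init__(self, words):
        self.words = list(words)
        self.reads = 0
        self.writes = 0

    def read(self, addr):
        self.reads += 1
        return self.words[addr]

    def write(self, addr, value):
        self.writes += 1
        self.words[addr] = value & 0xFFFFFFFF


def byte_mask(byte_enable):
    """Expand an assumed 4-bit byte-enable mask into a 32-bit bit mask."""
    return sum(0xFF << (8 * lane) for lane in range(4) if (byte_enable >> lane) & 1)


def service_dependent_read_conventionally(mem, addr, data, byte_enable):
    """Finish the pending RMW, then re-read the location for the waiting read."""
    before = mem.read(addr)  # read cycle 1: RMW reads the pre-modification data
    mask = byte_mask(byte_enable)
    mem.write(addr, (before & ~mask) | (data & mask))  # RMW write-back
    return mem.read(addr)  # read cycle 2: the dependent full-length read
```

Serviced this way, a single dependent read costs two read cycles and one write cycle.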
It is, however, desirable to reduce the number of read cycles in computer systems that resolve such dependencies, so as to increase the data throughput of the system.