The present invention is related generally to the field of computer graphics, and more particularly, to an embedded memory system and method having efficient utilization of read and write bandwidth of a computer graphics processing system.
Graphics processing systems often include embedded memory to increase the throughput of processed graphics data. Generally, embedded memory is memory that is integrated with the other circuitry of the graphics processing system to form a single device. Including embedded memory in a graphics processing system allows data to be provided to processing circuits, such as the graphics processor, the pixel engine, and the like, with low access times. The proximity of the embedded memory to the graphics processor and its dedicated purpose of storing data related to the processing of graphics information enable data to be moved throughout the graphics processing system quickly. Thus, the processing elements of the graphics processing system may retrieve, process, and provide graphics data quickly and efficiently, increasing the processing throughput.
Processing operations that are often performed on graphics data in a graphics processing system include the steps of reading the data that will be processed from the embedded memory, modifying the retrieved data during processing, and writing the modified data back to the embedded memory. This type of operation is typically referred to as a read-modify-write (RMW) operation. The processing of the retrieved graphics data is often done in a pipeline processing fashion, where the processed output values of the processing pipeline are rewritten to the locations in memory from which the pre-processed data provided to the pipeline was originally retrieved. Examples of RMW operations include blending multiple color values to produce graphics images that are composites of the color values and Z-buffer rendering, a method of rendering only the visible surfaces of three-dimensional graphics images.
In conventional graphics processing systems including embedded memory, the memory is typically a single-ported memory. That is, the embedded memory either has only one data port that is multiplexed between read and write operations, or the embedded memory has separate read and write data ports, but the separate ports cannot be operated simultaneously. Consequently, when performing RMW operations, such as described above, the throughput of processed data is diminished because the single ported embedded memory of the conventional graphics processing system is incapable of both reading graphics data that is to be processed and writing back the modified data simultaneously. In order for the RMW operations to be performed, a write operation is performed following each read operation. Thus, the flow of data, either being read from or written to the embedded memory, is constantly being interrupted. As a result, full utilization of the read and write bandwidth of the graphics processing system is not possible.
One approach to resolving this issue is to design the embedded memory included in a graphics processing system to have dual ports. That is, the embedded memory has both read and write ports that may be operated simultaneously. Having such a design allows for data that has been processed to be written back to the dual ported embedded memory while data to be processed is read. However, providing the circuitry necessary to implement a dual ported embedded memory significantly increases the complexity of the embedded memory and requires additional circuitry to support dual ported operation. As space on an graphics processing system integrated into a single device is at a premium, including the additional circuitry necessary to implement a multi-port embedded memory, such as the one previously described, may not be an reasonable alternative.
Another issue that can further complicate efficient utilization of read write memory bandwidth is implementing an error correction code (ECC) scheme in an embedded memory system. In general, ECCs are used to maintain the integrity of data written to memory, and can, in some instances when an error in the data is detected, correct the errors. In operation, when data are written to memory, a calculation is performed on the data to produce a code. The code, which is stored with the data, is used to detect and correct errors in the data. When the data is read from memory, the code calculation is once again performed on the retrieved data, and the resulting code is compared with the code that was stored with the data. Ideally, the two codes are the same, indicating that the data has not changed since being written to memory. However, if the two codes are different, an error in the data has occurred, and, through the use of the code, a corrected set of data may be produced. Thus, although the data retrieved from memory may have an error, the data that is actually provided to a requesting entity will be correct. In the case the error in the data cannot be corrected by the code, the condition is reported.
The general use of ECC techniques in memory systems is known in the art. For example, use of Hamming codes, Reed-Solomon codes, and the like, for ECC is well understood. Such techniques have been used at various memory levels, including at the embedded memory level. However, these ECC schemes are generally cumbersome and negatively impact memory access rates. In systems where high data read and write throughput is desired, overcoming these issues while maintaining data throughput becomes a daunting proposition.
Therefore, there is a need for a method and embedded memory system having ECC capability that can utilize the read and write bandwidth of a graphics processing system more efficiently during a read-modify-write processing operation.
The present invention is directed to a system and method for accessing a memory array where retrieved data is stored in a memory and upon the writing of the data in its modified form, the originally stored data is updated with the modification prior to being written back to the memory array. In this manner, a new error correction code can be calculated prior to writing the data without the need to access the memory array again. The system includes a memory having a plurality of memory locations for storing data in a first-in-first-out (FIFO) manner, a content addressable memory (CAM) coupled to the memory and having an input to receive memory addresses and having a plurality of memory locations for storing memory addresses, each of which corresponds to a memory location of the memory. The CAM provides an activation signal to access a memory location of the memory in response to receiving a memory address matching the corresponding stored memory address. The system further includes a first switch coupled to the output of the memory to selectively couple the output of the memory to the write bus or an output bus, a combining circuit having a first input, a second input coupled to the output of the memory, and further having an output coupled to the input of the memory, the combining circuit combining data applied to the first and second inputs and providing the result at the output, and a second switch to selectively couple the first input of the combining circuit to the read bus or an input bus. A FIFO control circuit is coupled to the combining circuit, the first and second switches, and the memory. In response to receiving a read request, the FIFO control circuit coordinates the storing of the requested data in the memory and providing the requested data to the output bus, and in response to receiving a write request, the FIFO control circuit coordinates the combining of modified data received from the input bus with corresponding original data previously stored in the memory and providing the combined data for error correction code calculation and writing to the location in the memory array from where the corresponding original data was originally read.