1. Field of the Invention
This invention relates to the field of memory control.
Portions of the disclosure of this patent document contain material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office file or records, but otherwise reserves all copyright rights whatsoever.
2. Background
Computer systems often require the storage and access of large amounts of data. One efficient solution for storing large amounts of data is to use a dynamic random access memory (DRAM) system. Some DRAM systems have multiple memory requesters seeking to access the memory, which can cause contention problems and degrade system performance. This is particularly true in graphics processing systems. The problems of such systems can be better understood by reviewing existing graphics computer and memory systems.
Computer systems are often used to generate and display graphics on a display. Display images are made up of thousands of tiny dots, where each dot is one of thousands or millions of colors. These dots are known as picture elements, or "pixels". Each pixel has a color, with the color of each pixel being represented by a number value stored in the computer system.
A three dimensional (3D) display image, although displayed using a two dimensional (2D) array of pixels, may in fact be created by rendering of a plurality of graphical objects. Examples of graphical objects include points, lines, polygons, and three dimensional solid objects. Points, lines, and polygons represent rendering "primitives" which are the basis for most rendering instructions. More complex structures, such as three dimensional objects, are formed from a combination or mesh of such primitives. To display a particular scene, the visible primitives associated with the scene are drawn individually by determining those pixels that fall within the edges of the primitive, and obtaining the attributes of the primitive that correspond to each of those pixels. The obtained attributes are used to determine the displayed color values of applicable pixels.
Sometimes, a three dimensional display image is formed from overlapping primitives or surfaces. A blending function based on an opacity value associated with each pixel of each primitive is used to blend the colors of overlapping surfaces or layers when the top surface is not completely opaque. The final displayed color of an individual pixel may thus be a blend of colors from multiple surfaces or layers.
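The text does not give the blending function itself. As a minimal sketch, assuming the common linear blend in which the top surface's opacity weights its color against the color beneath it, the calculation for one pixel might look like the following. The function names and the 0.0 to 1.0 opacity range are illustrative assumptions, not taken from this document.

```python
def blend(top, bottom, opacity):
    """Blend one color channel of the top surface over the bottom surface.

    opacity is the top surface's opacity, assumed to lie in [0.0, 1.0]:
    1.0 means fully opaque (only the top color shows) and 0.0 means fully
    transparent (only the bottom color shows).
    """
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0.0 and 1.0")
    return opacity * top + (1.0 - opacity) * bottom


def blend_pixel(top_rgb, bottom_rgb, opacity):
    """Apply the same blend to the red, green, and blue channels of a pixel."""
    return tuple(blend(t, b, opacity) for t, b in zip(top_rgb, bottom_rgb))
```

With more than two overlapping layers, the same function could be applied repeatedly from the bottom layer upward, using each result as the bottom color for the next layer.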
In some cases, graphical data is rendered by executing instructions from an application that is drawing data to a display. During image rendering, three dimensional data is processed into a two dimensional image suitable for display. The three dimensional image data represents attributes such as color, opacity, texture, depth, and perspective information. The draw commands from a program drawing to the display may include, for example, X and Y coordinates for the vertices of the primitive, as well as some attribute parameters for the primitive, and a drawing command. The execution of drawing commands to generate a display image is known as graphics processing.
A graphics processing system accesses graphics data from a memory system such as a DRAM. Often a graphics processing computer system includes multiple processing units sharing one memory system. These processing units may include, for example, a central processing unit (CPU) accessing instructions and data, an input/output (I/O) system, a 2D graphics processor, a 3D graphics processor, a display processor, and others. The 3D processor itself may include multiple sub-processors such as a processor to fetch 3D graphical drawing commands, a processor to fetch texture image data, a processor to fetch and write depth (Z) data, and a processor to fetch and write color data. This means that multiple memory accesses are being sent to the memory simultaneously. This multiple access can cause contention problems.
The goal of a memory system is to get the highest memory capacity and bandwidth at the lowest cost. However, the performance of a shared DRAM system can be severely degraded by competing memory request streams for a number of reasons, including page and bank switches, read and write context switches, and latency requirements, among others.
The data stored in DRAM is organized as one- or two-dimensional tiles of image data referred to as memory "words". A memory word is a logical container of data in a memory. For example, each memory word may contain eight to sixteen pixels of data (e.g., sixteen to thirty-two bytes).
The DRAM memory words are further organized into memory "pages" containing, for example, one to two kilobytes (K bytes) of data. The pages are logical groups of memory words. A DRAM therefore consists of multiple memory pages with each page consisting of multiple memory words. The memory words and pages are considered to have word and page "boundaries". To read data from one memory word and then begin reading data from another memory word is to "cross the word boundary". Similarly, reading data from one page and then reading data from another page is considered to be crossing a page boundary.
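The word and page organization can be illustrated with a short sketch. It assumes a flat byte address, a 32-byte memory word, and a 2 K byte page, which are the upper ends of the example ranges given above. The constant and function names are hypothetical and chosen only for illustration.

```python
WORD_BYTES = 32    # assumed: upper end of the 16 to 32 byte example
PAGE_BYTES = 2048  # assumed: upper end of the 1 to 2 K byte example
WORDS_PER_PAGE = PAGE_BYTES // WORD_BYTES


def word_index(addr):
    """Return the index of the memory word that contains byte address addr."""
    return addr // WORD_BYTES


def page_index(addr):
    """Return the index of the memory page that contains byte address addr."""
    return addr // PAGE_BYTES


def crosses_word_boundary(addr_a, addr_b):
    """True if the two byte addresses fall in different memory words."""
    return word_index(addr_a) != word_index(addr_b)


def crosses_page_boundary(addr_a, addr_b):
    """True if the two byte addresses fall in different memory pages."""
    return page_index(addr_a) != page_index(addr_b)
```

Under these assumed sizes each page holds 64 words, so two consecutive accesses can cross a word boundary many times before they cross a page boundary.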
In DRAM memory, it is faster to retrieve data from a single memory word than to cross a word boundary. Similarly, it is faster to retrieve data from a single page than to cross a page boundary. This is because peak efficiency is achieved when transferring multiple data values, especially data values that are in adjacent memory locations. For example, for a burst transfer of data in adjacent memory locations, a DRAM may support a transfer rate of eight bytes per clock cycle. The same DRAM device may have a transfer rate of only one byte per nine clock cycles for arbitrary single byte transfers (e.g., those that cross boundaries). Thus, separate accesses to single bytes of data are less efficient than a single access of multiple consecutive bytes of data. Therefore, data in a DRAM memory is typically accessed (written to or read from) as a complete memory word.
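The two example transfer rates can be compared with a small calculation. This is a rough sketch that uses only the figures quoted above (eight bytes per clock for a burst, nine clocks per arbitrary single byte) and, as a simplifying assumption, ignores any setup time before a burst begins. The function names are hypothetical.

```python
import math

BURST_BYTES_PER_CLOCK = 8  # from the example: burst rate for adjacent locations
SINGLE_BYTE_CLOCKS = 9     # from the example: clocks per arbitrary single byte


def burst_clocks(num_bytes):
    """Clocks to move num_bytes as one burst (burst setup time is ignored)."""
    return math.ceil(num_bytes / BURST_BYTES_PER_CLOCK)


def single_byte_clocks(num_bytes):
    """Clocks to move num_bytes as separate single-byte accesses."""
    return num_bytes * SINGLE_BYTE_CLOCKS


word_bytes = 32  # assumed word size, the upper end of the earlier example
print(burst_clocks(word_bytes), single_byte_clocks(word_bytes))
```

For a 32-byte word this gives 4 clocks for the burst against 288 clocks for byte-by-byte access, a factor of 72 under these assumptions.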
The performance cost to access a new memory word from DRAM is much greater than for accessing a data value within the same memory word. Similarly, the cost of accessing a data value from a new memory bank is much greater than from within the same page in the memory bank. Typically, a word in the same page of the same bank can be accessed in the next clock cycle, while accessing a new page can take around 10 extra clock cycles. Furthermore, a new page in a new bank can be accessed in parallel with an access to another bank, so that the 10 extra clock cycles to access a word in a new page in a new bank can be hidden during the access of other words in other pages in other banks.
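A simple cost model makes the page penalty concrete. The sketch below assumes one clock per word transfer and a 10 clock penalty whenever a request targets a page other than the one currently open in its bank, following the approximate figure above. It deliberately does not model the overlap between banks that can hide the penalty, so it gives a pessimistic, serial estimate. All names are hypothetical.

```python
PAGE_MISS_PENALTY = 10  # "around 10 extra clock cycles", per the text


def access_clocks(requests):
    """Estimate the clocks needed to serve requests in the given order.

    requests is a sequence of (bank, page) tuples. Each bank is assumed to
    keep one page open at a time; the first access to a bank counts as a miss.
    Overlap between banks is NOT modeled.
    """
    open_page = {}  # bank -> page currently open in that bank
    clocks = 0
    for bank, page in requests:
        clocks += 1  # one clock to transfer the word itself
        if open_page.get(bank) != page:
            clocks += PAGE_MISS_PENALTY
            open_page[bank] = page
    return clocks


alternating = [(0, 0), (0, 1), (0, 0), (0, 1)]  # switches page on every access
grouped = [(0, 0), (0, 0), (0, 1), (0, 1)]      # same requests, grouped by page
print(access_clocks(alternating), access_clocks(grouped))
```

In this model the alternating order costs 44 clocks and the grouped order costs 24 clocks for the same four words, which is the kind of saving that reordering by page aims for.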
Access penalties also occur when switching from reads to writes. It is more efficient to do a number of reads without switching to a write operation and vice-versa. The cost in cycles to switch from a write operation to a read operation is significant. DRAM data pins are typically bidirectional, that is, they carry both read and write data, and DRAM access is pipelined, that is, reads occur over several clocks. As a result, switching the DRAM from read access to write access incurs several idle clocks to switch the data pin direction and the access pipeline direction.
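The effect of read and write switching can be sketched by counting direction changes in a request stream. The turnaround cost of 4 clocks used here is a purely hypothetical placeholder, since the text says only that "several" idle clocks are lost. The function names are likewise illustrative.

```python
TURNAROUND_CLOCKS = 4  # hypothetical value; the text says only "several"


def count_direction_switches(ops):
    """Count read/write direction changes in a sequence such as "RRWWR"."""
    return sum(1 for prev, cur in zip(ops, ops[1:]) if prev != cur)


def idle_clocks(ops):
    """Idle clocks lost to data pin and pipeline turnaround for the sequence."""
    return count_direction_switches(ops) * TURNAROUND_CLOCKS


interleaved = "RWRWRW"  # alternates direction on every access
grouped = "RRRWWW"      # same three reads and three writes, grouped
print(idle_clocks(interleaved), idle_clocks(grouped))
```

With the assumed turnaround cost, the interleaved stream loses 20 clocks to switching while the grouped stream loses 4.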
Certain memory requesting processors have specific bandwidth and latency requirements. For example, CPU accesses and requests have low latency requirements and must be satisfied quickly for overall system performance. This is because the CPU typically reads memory on a cache miss, and typically suspends instruction execution when the instructions or data not in the cache are not available within a few clock cycles, and can only support a small number of outstanding memory read requests. Consequently, CPU performance is latency intolerant because CPU execution stops soon after an outstanding memory request. Other memory requesters may have high bandwidth requirements but may be latency tolerant.
Another problem that arises from graphics processing systems is the tendency to have frame buffers that interleave data across DRAM pages and memory banks. This creates situations where boundary crossings are likely to be increased, decreasing memory access efficiency. This is because graphics working data set sizes are the entire graphical image, typically 1 to 8 megabytes (M bytes) in size, and consequently DRAM page locality of 1 to 2K bytes cannot be maintained, so it is better for graphics processing to efficiently access few words in many pages than many words in few pages. Interleaving graphical image memory words across DRAM pages and banks can amortize the cost of new page and bank access over many word accesses.
In accordance with the present invention, memory accesses are reordered to improve efficiency. A memory controller is used to arbitrate memory access requests from a plurality of memory requesters. Reads are grouped together and writes are grouped together to avoid mode switching. Instructions are reordered so that page switches are minimized. In one embodiment, reads are given priority and writes are deferred. In accordance with the present invention, a memory request controller is provided in a combined CPU and graphics processing architecture. The memory accesses in the invention come from different "masters". Each memory requester (referred to as a "master") places its memory access requests into its own associated request queue. The master makes page break decisions and other optimization determinations on its own, and that information is provided in the queue. The masters also notify the memory controller of latency requirements they may have. The memory controller uses the queue and page break decisions to provide appropriate reordering of the requests from all of the request queues for efficient page and bank access while considering latency requirements. The result is improved overall memory access.
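The arbitration described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the controller of the invention: the priority order (latency-sensitive requests first, then page hits, then requests that keep the current read or write direction, then reads over writes) is an assumption chosen for illustration, and for simplicity the controller here detects page hits itself rather than receiving page break decisions from the masters. The Request fields and function names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    master: str          # name of the requester that owns the queue
    is_write: bool
    bank: int
    page: int
    low_latency: bool = False  # set by a latency-intolerant master such as a CPU


def pick_next(queues, open_pages, last_was_write):
    """Remove and return the best request among the queue heads, or None.

    Only the head of each queue is considered, so each master's own
    requests stay in their original order.
    """
    heads = [q[0] for q in queues.values() if q]
    if not heads:
        return None

    def score(req):
        # Tuples compare left to right, so earlier entries dominate.
        return (
            req.low_latency,                         # assumed top priority
            open_pages.get(req.bank) == req.page,    # avoid a page switch
            req.is_write == last_was_write,          # avoid a direction switch
            not req.is_write,                        # prefer reads over writes
        )

    best = max(heads, key=score)
    queues[best.master].popleft()
    return best


def drain(queues):
    """Issue every queued request and return them in issue order."""
    open_pages, last_was_write, order = {}, False, []
    while (req := pick_next(queues, open_pages, last_was_write)) is not None:
        open_pages[req.bank] = req.page
        last_was_write = req.is_write
        order.append(req)
    return order


queues = {
    "cpu": deque([Request("cpu", False, 0, 5, low_latency=True)]),
    "gfx": deque([Request("gfx", False, 1, 2), Request("gfx", False, 1, 7)]),
    "z": deque([Request("z", True, 1, 2), Request("z", True, 1, 3)]),
}
for req in drain(queues):
    print(req.master, "W" if req.is_write else "R", req.bank, req.page)
```

In this example the CPU read is issued first, the depth write to the already open page is issued ahead of a graphics read that would force a page switch, and the second write follows immediately so the data direction changes only once more before the final read. A practical controller would also need to bound how long a deferred request can wait, which this sketch does not attempt.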