The invention relates to memory devices and in particular to memory devices in which variable, overlapping groups of storage units can be accessed.
Typical DRAM memory is accessed using sequential row and column operations, typically referred to as RAS (row address strobe) and CAS (column address strobe) operations. RAS operations specify row addresses, and CAS operations specify column addresses to select columns within the previously addressed rows.
FIG. 1 illustrates pertinent components of a typical DRAM memory device 10. DRAM 10 comprises a plurality of memory arrays 12, each having a plurality of memory storage units (represented as squares within arrays 12). In this simplified example, there are eight rows of memory storage units. There are six columns of storage units within each array. Each storage unit comprises one or a plurality of individual memory cells.
Memory device 10 has a row decoder 14 that receives a row address during a RAS operation. The row decoder is sometimes referred to as an xe2x80x9cXxe2x80x9d decoder.
The row address specifies a particular row of storage units. The RAS operation causes this row of storage units to be read into sense amplifiers (not shown). The same row is typically read from each of the multiple memory arrays 12. In FIG. 1, a row of storage units is highlighted to indicate that this row has been selected by row decoder 14.
For purposes of discussion, the storage units are labeled with identifiers comprising an alphabetic character with a numeric subscript. The alphabetic character indicates the array in which the storage unit resides, and the subscript indicates the column within the array. For example, storage unit B3 is the storage unit at column 3 of array B.
The DRAM device 10 also has column decoders 16, which are also sometimes referred to as xe2x80x9cYxe2x80x9d decoders or lane decoders. In this example, there is a column decoder associated with each of the four memory arrays 12. The column decoders correspond to data I/O lanes 18 through which data is communicated to and from memory device 10. Each data I/O lane comprises a number of individual I/O lines corresponding to the number of memory cells in each data storage unit. For example, each I/O lane might be a thirty-two bits in width. Combined, four I/O lanes of this width would allow 128 bits or 16 bytes of parallel data access.
The column decoders receive a column address that is specified during a CAS operation. Each column decoder is responsive to the specified column address to generate a column select signal (not specifically shown in FIG. 1) that selects a column of storage units from the row that was previously selected during a RAS operation. In the example shown, the specified column address has resulted in a column select signal corresponding to column 2xe2x80x94this is illustrated by the vertical line extending downward from the selected row and column within each of arrays 12.
In response to a column selection during a CAS operation, the column decoders transfer data from the selected storage units to or from I/O pins or connectors corresponding to the individual bit lines of the data lanes 18.
The data contained in a single row, which is specified during a RAS operation, is sometimes referred to as a page. Once a RAS operation has been completed, it is possible to complete multiple subsequent CAS operations to read various portions of the specified row or page, without the necessity of intervening RAS operations. Each CAS operation is carried out with a specified column address, and each column address corresponds to a unique set of storage units. In the example discussed above, where there are four data lanes of 32 bits each, each column address corresponds to a unique 16 bytes of information that can be read from or written to the memory device in parallel.
Note that some memory devices contain multiple banks of storage cells that may or may not share row and column decoders, although each bank does have dedicated sense amplifiers.
FIG. 2 shows an entire row or page of storage units 20, delineated by CAS boundaries that define the unique sets or groups of storage units that can be accessed during any given CAS cycle. With a CAS address of 0, the column decoders 16 of FIG. 1 select the first column of each memory array and transfer information to or from the storage units of those columns. With a CAS address of 1, the column decoders 16 select the second column of each memory array. For each CAS address, the lane decoders select a corresponding unique set or group of the storage units. Each unique set is formed by corresponding columns of the memory arrays that are presented in parallel at data I/O lanes 18.
Thus, the size of the data I/O path typically dictates the alignment at which data can be accessed. More specifically, the alignment of data access is fixed by the CAS boundaries; the addressing scheme divides the storage units into discrete, mutually exclusive groups corresponding to different CAS addresses, and access of any individual storage unit requires accessing the entire group to which the storage unit belongs. For example, storage unit C2 can only be retrieved in a group that contains storage units A2, B2, C2, and D2,.
In some cases, it is desired to access a relatively small number of storage units that span multiple groups. Even though the number of desired storage units might be less than the number of storage units within any given group, it is necessary to perform two or more CAS operations if the desired storage units span two or more groups.
In FIG. 2, for example, suppose it is desired to access storage units D0 and A1. Because these two storage units fall under different CAS addresses, two CAS operations are required to access the two storage units. A first CAS operation accesses storage units A0, B0, C0, and D0, and a second CAS operation accesses storage units A1, B1, C1, and D1.
This has not been a significant limitation in the past, because the width of the data I/O path has been relatively limited, and most I/O accesses span several CAS addresses. However, current speed requirements are resulting in memory devices having relatively wide data paths, such as 16 bytes or wider. When the data path becomes this wide, many data accesses involve a number of contiguous storage units that is smaller than width of the data path. Furthermore, the nature of some data storage applications makes it difficult to ensure that memory accesses will be aligned at CAS boundaries. Memory accesses tend to be less efficient in applications such as this.
A computer graphics subsystem is an example of an application that might utilize small transfers at an alignment that does not necessarily correspond to CAS boundaries within a memory device. Computer graphics systems typically use DRAM memory to store pixel information. Such pixel information might include color component intensities, Z buffer data, texture information, video data, and other information related to an array of displayed pixels.
Computer graphics systems typically include a graphics controller that interacts with one or more DRAM devices. Access speed is very important in graphics subsystems, and a variety of techniques might be employed to optimize the efficiency of memory access cycles.
One such optimization technique is referred to as xe2x80x9ctiling,xe2x80x9d in which rectangular tiles of graphics pixels are represented by portions of memory that can be accessed during a single CAS cycle. For example, in a system allowing data transfers of 16 bytes during each CAS operation, each graphics tile might be defined as a four-by-four square, represented by 16 bytes of data. Within the memory controller, memory is mapped in such a way that each four-by-four square is represented by 16 bytes that can be read or written in a single CAS cycle. In other words, the tiles are aligned at CAS boundaries.
FIG. 3 illustrates an example of tiling where each tile is defined as a four-by-four square of 16 pixels, and represented within DRAM memory by 16 bytes of data. The layout of storage units in FIG. 3 indicates their mapping to physical pixel locationsxe2x80x94the storage units represent a two-dimensional array of pixels corresponding to the two-dimensional arrangement shown in FIG. 3. The storage units shown in FIG. 3 correspond to a row or page of data, in a DRAM whose rows or pages each include 128 bytes of data that can be accessed in mutually exclusive groups of 16 bytes per CAS operation. FIG. 3 shows the CAS addresses corresponding to individual tiles, ranging from 0 to 7. This arrangement allows eight tiles per row or page of DRAM memory.
Tiling works well because access to graphics data tends to be localized in two dimensions; a two-dimensional graphical object can often be efficiently accessed through one or more rectangular tiles such as illustrated in FIG. 3. Increasing DRAM bandwidths, however, threaten to actually decrease the efficiency with which such data can be accessed. This is because larger data paths result in larger tiles. In some cases, the size of the tile is larger than the actual graphical objects that need to be accessed. In other cases, the size of the tile is comparable in size to that of graphical objects in the system, but those objects are positioned across several tiles. In other words, the objects are not aligned at CAS boundaries. Thus, it might be necessary to access two or more tiles of data in situations where much less data is actually needed by the graphics processor.
FIG. 3 illustrates a situation such as this, in which a graphics processor requires access to a small triangular object that is represented by the hatched storage units shown in FIG. 3. Although the triangle has only six pixels, they are spread across three tiles. To process this object requires three CAS operations. Although the three CAS operations access forty-eight bytes, forty-two will not be used. This represents a significant inefficiency.
Other DRAM operations suffer from similar inefficiencies. Routers, for example, utilize DRAM memory to store data packets. Each packet, which typically includes a small header and a larger payload, is optimally stored in a region of DRAM memory that can be accessed in a single CAS operation. In other words, packets are aligned at CAS boundaries. During much of its operation, however, the router needs access only to the header, because the header contains the information needed by the router to determine how to handle the packet as a whole. Even though only the relatively small header is needed, the organization of DRAM memory typically requires retrieval of the entire packet in order to read the header information. In order to retrieve multiple headers, multiple CAS operations must be performed.