During the operation of a computer system, programs executing on the system access memory to store data they generate and to retrieve data they are processing. To access data stored in memory, a memory controller generates the appropriate signals to access the desired data. For example, data is typically physically stored in memory in an array of rows and columns of memory storage locations, each storage location having a corresponding address. To access data stored in a particular location, the memory controller must apply a read or write command to the memory along with the address of the desired data. In response to the command and address from the controller, the memory accesses the corresponding storage location and either writes data to or reads data from that location.
Depending on the type of data being stored and processed, accessing the required data may be relatively complicated and thus inefficient. This is so because programs executing on the computer system must store and retrieve data for various more complicated data structures, such as vectors and arrays. A two-dimensional array, for example, consists of a plurality of data elements arranged in rows and columns. To store the data elements of the array in memory, the memory controller simply stores these elements one after another in consecutive storage locations in the memory. Although the data elements are stored in this manner, operations performed on the individual elements of the array often require that elements stored in nonconsecutive memory locations be accessed.
An example of the storage and access issues presented by a two-dimensional matrix stored in memory will now be described in more detail with reference to FIG. 1. FIG. 1 shows on the left a 10×8 matrix 100 consisting of 10 rows and 8 columns of data elements DE11-DE108, each data element being represented as a circle. In the following description, the data elements DE11-DE108 may be referred to generally as DE when not referring to a specific one or ones of the elements, and the subscripts will be included only when referring to a specific one or ones of the elements. The data elements DE of the matrix 100 are stored in the storage locations of a memory 102, as indicated by arrow 104. The data elements DE11-DE108 are stored in consecutive storage locations within a given row of storage locations in the memory 102. In the example of FIG. 1, the row in memory 102 is designated as having an address 0 and the data elements DE11-DE108 are stored in consecutive columns within the row, with the columns being designated 00-4F hexadecimal. Thus, the data element DE11 is stored in the storage location having row address 0 and column address 0, the data element DE21 is stored in the storage location having row address 0 and column address 1, and so on. In FIG. 1, the storage locations in the memory 102 having row address 0 and column addresses 00-4F containing the data elements DE11-DE108 are shown in four separate columns merely for ease of illustration.
For the matrix 100, the first column of data elements DE11-DE101 and the second column of data elements DE12-DE102 are stored in storage locations 00-13 (hexadecimal) in the memory 102, which are shown in the first column of storage locations. The data elements DE13-DE103 and DE14-DE104 in the third and fourth columns of the matrix 100 are stored in storage locations 14-27 in the memory 102. Finally, the data elements DE15-DE105 and DE16-DE106 are stored in storage locations 28-3B and the data elements DE17-DE107 and DE18-DE108 are stored in storage locations 3C-4F.
When accessing the stored data elements DE, common mathematical manipulations of these elements may result in relatively complicated memory accesses or “memory behaviors”. For example, the data elements DE contained in respective rows of the matrix 100 may correspond to vectors being processed by a program executing on a computer system (not shown) containing the memory 102. In this situation, the data elements DE of a desired row in the matrix 100 must be accessed to retrieve the desired vector. Given the above description of the storage of the data elements DE in the memory 102, retrieving the desired data elements in this situation requires accessing data elements stored in nonconsecutive storage locations. For example, if the third row of data elements DE31-DE38 is to be retrieved, the data element DE31 stored in location 2 in the memory 102 must be accessed, then the data element DE32 stored in location C, and so on. The data elements DE31 and DE32 are illustrated in the storage locations 2 and C within the memory 102.
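The mapping from a matrix position to a storage location described above can be illustrated with a minimal sketch. The function name storage_location and the constants kRows and kCols are hypothetical names introduced only for this illustration. The sketch assumes the column-by-column storage order of FIG. 1 and the 1-based row and column numbering used in the text.

```cpp
#include <cassert>
#include <cstdint>

// Dimensions of the matrix 100 in FIG. 1 (10 rows, 8 columns).
constexpr std::uint32_t kRows = 10;
constexpr std::uint32_t kCols = 8;

// Hypothetical helper: returns the column address (within memory row 0) of
// the data element at the given 1-based matrix row and column, assuming the
// elements are stored one matrix column after another as in FIG. 1.
inline std::uint32_t storage_location(std::uint32_t row, std::uint32_t col) {
    assert(row >= 1 && row <= kRows);
    assert(col >= 1 && col <= kCols);
    return (col - 1) * kRows + (row - 1);
}
```

Under these assumptions, storage_location(3, 1) yields 2 and storage_location(3, 2) yields C hexadecimal, matching the locations given for DE31 and DE32.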
A stride value S, which equals 10 decimal (A hexadecimal) in the example of FIG. 1, corresponds to the difference between addresses of consecutive data elements being accessed. As seen in the example for the vector corresponding to row 3 in the matrix 100, the stride value S between consecutive data elements DE31 and DE32, stored in locations 2 and C hexadecimal, equals 10, as is true for each pair of consecutive data elements in this example. Such a stride value S can be utilized to generate addresses for the desired data elements DE in this and other memory behaviors requiring nonsequential access of storage locations. For example, to generate addresses to access all data elements DE in row 3 of the matrix 100, all that is required is a base address corresponding to the address of the first data element (DE31 in this example), the stride value S, and a total number N of times (7 in this example) to add the stride value to the immediately prior address. Using these parameters, each address equals a base address (BA) plus n times the stride value S, where n varies from 0 to N (address = BA + n×S for n = 0 to 7).
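The address formula above can be sketched as a short routine. The function name strided_addresses and its parameter names are hypothetical and chosen only for this illustration. The sketch assumes 32-bit addresses and follows the convention of the text that N counts the number of additions, so that N + 1 addresses are produced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the addresses BA + n*S for n = 0..N.
// count_n is the number of times the stride is added, so the result
// holds count_n + 1 addresses.
std::vector<std::uint32_t> strided_addresses(std::uint32_t base,
                                             std::uint32_t stride,
                                             std::uint32_t count_n) {
    // Assumption for this sketch: the last address must fit in 32 bits.
    assert(static_cast<std::uint64_t>(base) +
               static_cast<std::uint64_t>(count_n) * stride <= 0xFFFFFFFFull);

    std::vector<std::uint32_t> addresses;
    addresses.reserve(static_cast<std::size_t>(count_n) + 1);
    for (std::uint64_t n = 0; n <= count_n; ++n) {
        addresses.push_back(static_cast<std::uint32_t>(base + n * stride));
    }
    return addresses;
}
```

For row 3 of the matrix 100, a call with a base address of 2, a stride of 10, and N equal to 7 produces eight addresses beginning with 2 and C hexadecimal and ending with 48 hexadecimal.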
Many different types of memory behaviors that involve the nonsequential access of storage locations are common and complicate the retrieval of the desired data elements DE in the memory 102. Examples of memory behaviors that include such nonsequential accessing of data elements include accessing simple and complex vectors, simple indexed arrays, sliced arrays, masked arrays, sliced and masked arrays, vectors and arrays of user defined data structures, and sliced and masked arrays of user defined structures. For example, a mask array is commonly utilized to extract the desired data elements DE while leaving the other data elements alone. If it is desired to extract just one data element DE located in the same position in each of a number of different matrices 100 stored in the memory 102, a mask array is generated that effectively blocks out all of the data elements of each matrix except the data element that is desired. This mask array is then converted into read instructions that are applied to the memory 102 so that only the unmasked data element DE in each matrix is retrieved.
While a formula analogous to that developed above for the vector example can be developed for these types of memory behaviors, for a number of reasons these types of memory behaviors can adversely affect the operation of the memory 102, as will be appreciated by those skilled in the art. Typically, such complicated memory behaviors are handled in software, which slows the access of the desired data elements DE. The programming language C++, for example, has a valarray data structure that will take a mask and then generate the proper memory addresses to apply to the memory 102 to retrieve the desired data elements DE. The translation and processing of the defined mask to generate the required addresses to access the corresponding data elements DE in the memory 102 is done in software. Once the mask is converted into addresses, these addresses are applied to the memory 102, typically via a memory controller (not shown), to retrieve the desired data elements.
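As an illustration of the software approach described above, the following minimal sketch uses the standard C++ std::valarray class template with a Boolean mask and with a std::slice. The function names select_masked and select_strided are hypothetical wrappers introduced only for this example, and the use of int elements is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Hypothetical wrapper: returns the elements of data whose corresponding
// mask entry is true. The selection is carried out by library software.
std::valarray<int> select_masked(const std::valarray<int>& data,
                                 const std::valarray<bool>& mask) {
    assert(data.size() == mask.size());
    return std::valarray<int>(data[mask]);
}

// Hypothetical wrapper: returns count elements of data starting at index
// start and separated by stride, using std::slice(start, size, stride).
std::valarray<int> select_strided(const std::valarray<int>& data,
                                  std::size_t start,
                                  std::size_t count,
                                  std::size_t stride) {
    assert(count == 0 || start + (count - 1) * stride < data.size());
    return std::valarray<int>(data[std::slice(start, count, stride)]);
}
```

If data holds the 80 data elements of the matrix 100 in the storage order of FIG. 1, a call such as select_strided(data, 2, 8, 10) would gather the third row, and a mask with a single true entry would gather a single element. In both cases the index computation is performed by executing instructions rather than by the memory controller.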
One drawback to this approach is that the translation of the mask array into corresponding addresses is performed in software. The software translates elements in the mask array into corresponding physical addresses that are then applied to the memory 102. While performing these translations in software provides flexibility, the execution of the required programming instructions to perform the conversions is not trivial and thus may take a relatively long time. For example, even where the mask array includes values such that only one data element DE is to be selected from the data elements of the matrix 100, the software translation algorithm must still go through the mask array and determine the address of that single unmasked data element. The time required to perform such translations, particularly where a large number of accesses to arrays stored in the memory 102 are involved, may be long enough to slow down the overall operation of the computer system containing the memory.
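The cost described above can be seen in a minimal sketch of a software mask translation. The function name mask_to_addresses is hypothetical, and the sketch assumes that mask entry i corresponds to the storage location at the base address plus i. It is not intended to represent any particular library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts a mask into the list of addresses of the
// unmasked (true) entries. Every mask entry is examined, even when only a
// single entry is unmasked.
std::vector<std::uint32_t> mask_to_addresses(const std::vector<bool>& mask,
                                             std::uint32_t base) {
    // Assumption for this sketch: all generated addresses fit in 32 bits.
    assert(mask.size() <= 0xFFFFFFFFull - base);

    std::vector<std::uint32_t> addresses;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (mask[i]) {
            addresses.push_back(base + static_cast<std::uint32_t>(i));
        }
    }
    return addresses;
}
```

With an 80-entry mask in which only the entry for location C hexadecimal is set, the loop still performs 80 iterations to produce one address.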
Existing memory controllers may include circuitry that allows segmenting and striding of memory to improve performance by implementing some of the functionality for generating nonsequential addresses in the controller instead of in software. Segmentation of memory divides memory into a number of segments or partitions, such as dividing a 256 megabyte static random access memory (SRAM) into 256 one-megabyte partitions. Partitioning the memory allows instructions applied to the controller to include smaller addresses, with a controller state machine altering the addresses by adding an offset to access the proper address. The offset is determined based upon a segment address provided to the controller. Striding involves the nonsequential generation of addresses separated by a defined value, namely the stride value S previously discussed. While some controllers may include circuitry to stride through memory, in such controllers the stride value S is set prior to operation of the associated memory and typically cannot be changed while a program is executing on the computer system containing the memory controller and memory. Moreover, in such memory controllers the stride value S is typically limited to a constant value.
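The segment offset arithmetic described above can be sketched in software for the 256 megabyte example. The function name segmented_address and the constants are hypothetical, and the sketch assumes byte addressing, one megabyte equal to 2^20 bytes, and an offset equal to the segment number multiplied by the segment size. An actual controller state machine may derive the offset differently.

```cpp
#include <cassert>
#include <cstdint>

// Example partitioning from the text: 256 partitions of one megabyte each.
constexpr std::uint32_t kSegmentCount = 256;
constexpr std::uint32_t kSegmentBytes = 1u << 20;  // assumed 2^20 bytes

// Hypothetical helper: forms the full address from a segment number and a
// smaller address that is local to that segment.
inline std::uint32_t segmented_address(std::uint32_t segment,
                                       std::uint32_t local) {
    assert(segment < kSegmentCount);
    assert(local < kSegmentBytes);
    return segment * kSegmentBytes + local;  // offset plus local address
}
```

Under these assumptions a local address needs only 20 bits, while the full address spans 28 bits.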
Although existing memory controllers may provide segmentation and striding functionality, this functionality is limited and not easily changed. Moreover, this functionality does not enable many more complicated memory behaviors to be implemented in hardware, meaning such behaviors must be done through software with the attendant decrease in performance. There is a need for a system and method for implementing complex memory behaviors in hardware to allow for high-speed access of memory.