One approach to implementing a high-performance memory system is to use interleaved banks in the memory. Such a memory is built out of smaller memories referred to as banks, each of which handles part of the data being processed by the system. The addresses of locations in each bank are not sequential within a bank, but, rather, are sequential across the multiple banks. That is, for example, if the memory has four banks, then bank 0 could handle location addresses 0, 4, 8, . . . , in sequence; bank 1 could handle addresses 1, 5, 9, . . . ; bank 2 could handle addresses 2, 6, 10, . . . ; and bank 3 could handle addresses 3, 7, 11, . . . Hence, if instructions that are being executed access the memory addresses in sequence, as is often the case, the instructions can access different banks of the memory and can, therefore, be executed simultaneously.
In many cases, the addresses of accessed data are independent. For example, this is the case when a table look-up for the result of some function such as a complex mathematical function is being performed. In this case, the address is dependent upon the input data and may have no correlation to other addresses, e.g., the addresses are not in sequence. If the addresses are applied to an interleaved memory at once, they may spread across all the banks, or they may all hit on one bank. This situation results in reduced performance because of the simultaneous accesses. It also makes the time to process a table look-up indeterminate. This makes it difficult for the system to meet real-time performance time constraints.
One prior approach to dealing with these problems is for each processor to have an individual small private memory for holding tables. In this approach, because there is only one unit accessing the memory, it always takes the same time to handle table look-ups. However, this approach can be inefficient because, in general, the private memories are either too small or too big. They are too small if programmers want to store more tables than will fit, and they are too big if tables are not being used. In either case, the use of these individual memories can be inefficient.
The present invention is directed to an apparatus and method for accessing data in a processing system. A plurality of processing elements are provided for executing program instructions. Each processing element is associated with an identifying value. A memory stores data accessed by the processing elements while executing instructions. An instruction is received for execution. The instruction requires access to the memory and includes information associated with an address in the memory. For example, the information can be an address word containing the base or start address of a table stored in the memory. Each of the processing elements executing the instruction generates an address for accessing the memory by combining at least a portion of the information associated with an address in the memory with at least a portion of the identifying value associated with the processing element. The processing element then accesses the memory using the generated address.
In one embodiment, the memory is partitioned into a plurality of banks and can be interleaved. The instruction can require a look-up of data stored in a table. The table can be stored at a plurality of locations of the memory. Multiple copies of the table can be stored in respective multiple banks of the memory. Accordingly, multiple accesses to the same data item can be made to the multiple banks simultaneously by multiple processing elements.
In one embodiment, the processing system is a single instruction stream, multiple data stream (SIMD) processing system. Accordingly, instructions can be executed along multiple data paths simultaneously by the system. In this embodiment, each processing element is a data path in the SIMD system.
The address used by the processing element, i.e., data path, to access the memory is generated from the information associated with an address, e.g., the start address of the table, and the identifying value associated with the processing element. The identifying value uniquely identifies the processing element and therefore distinguishes it from the other processing elements. Therefore, adding it to the address used to access the memory ensures that the number of processing elements accessing the relevant memory address at one time can be controlled. In one embodiment, the identifying value is a multi-bit word. Only a predetermined number of the least significant bits of the word are added to the generated address and, therefore, control the number of processing elements that can access the memory location at a time. In one embodiment, the number of banks containing copies of the table being accessed is used to determine the number of bits. Specifically, in one embodiment, the number n of bits used to generate the memory address is defined by N=2n, where N is the number of banks and/or the number of copies of the table stored in the banks. Therefore, the number of bits used in the address is the number of bits required to uniquely select one of the banks.
For example, if there are four copies of the table in the banks and four processing elements (data paths), two bits of the data path identifying word are used. The two bits are used to select one of the four banks and/or table copies for each of the four data paths. As a result, a one-to-one correspondence between the data paths and the table copies in the banks is developed, such that no simultaneous accesses by multiple processing elements to the same memory location can occur. As another example, if there are four banks (table copies) and eight data paths, the two least significant bits of the identifying value are used. As a result, four pairs of data paths will access each bank (table copy). Hence, only two simultaneous accesses can occur at each memory location. The accesses will be delayed slightly, but much less than they would be delayed in an uncontrolled setting where all eight data paths could make eight simultaneous accesses to the same location.
The approach of the invention provides numerous advantages over approaches of the prior art. For example, the invention spreads table look-up memory accesses over multiple memory banks in an interleaved memory in a controlled and predictable fashion. As a result, the number of simultaneous accesses can be limited. In addition to limiting simultaneous accesses, even where simultaneous accesses do occur, the maximum possible number of them is limited and known. Therefore, the time required for memory accesses is predictable, thus facilitating meeting real-time scheduling constraints.