Numerous examples of serial memory addressing exist including U.S. Pat. No. 4,667,313, granted May 19, 1984 to Pinkham et. al.; U.S. Pat. No. 5,452,255, granted Sep. 9, 1995 to Mine et al.; and U.S. Pat. No. 6,639,867, granted Oct. 28, 2003, but they all serially load a register with the address, and then use various combinations of the shift register data and traditional decode logic to select the specific word, as opposed to directly serially decoding the address.
Serial loading of addresses, as well as serial access of successive memory locations is becoming more important as the asynchronous high bandwidth nature of communications between chips and the cost of integrated circuit I/O coupled with the inherently slower on-chip clocking is resulting in a shift away from parallel synchronous input to high-speed Serialize/De-serialize (SERDES) input. Currently the external serial input is shifted into a register, which is driven in parallel into the chip when the last bit is captured. If the data is an address, it is decoded in parallel in order to access the memory. As a result there is the delay of the parallel decode and word access after the last external bit is available when accessing an on chip memory.
Alternatively, in high-speed systems, the address is broken into high- and low-order bits. Presumably the high-order bits are loaded first, and then the low-order bits are loaded. These low-order bits are powered up and drive successive levels of multiplexers to make the data available on the output shortly after the low-order bits are loaded into their register.
Reference is now made to FIG. 1. Many communications chips have high-speed serial interfaces, and some of the data being serially sent to such chips consists of address data. Typically this data is first converted from serial to parallel form in a SERDES 10. The address data is then transferred to the address registers 13,14 of an on-chip memory. In high performance memory designs, the high-order bits of the address data are loaded into a register 13 and decoded through a decoder 12, to drive one of the word lines in the memory core 11. The resulting data from that word line is then selected by loading the lower-order bits of the address data into a separate address register 14, which drives a series of multiplexers 15. By loading the high-order bits first, the data can be available on the inputs of the multiplexers 15 when the low-order address bits are loaded in their address register 14, limiting the access time to the delay from the address register 14 through the multiplexers 15. In simpler memories, both high- and low-order bits are loaded at the same time. In that case, the access time is the propagation time through the decode 12, memory core 11 and multiplexers 15. The translation from serial to parallel operation is required because this propagation time is typically much longer than a single serial clock cycle.
Using serial address data to access memories is a typical operation in high-speed serial communications routers and switches, which must be able to translate a large number of addresses into a limited number of port addresses or to indicate the address is not translated to a port in this network. Content-Addressable Memories (CAMs) have occasionally been used to do this type of operation. CAMs have the advantage of completing the search in one operation, but they require both a lot of time per access and a lot of transistors per memory element because they compare the inputs with all the words in their memories on each operation. An alternative is the hardware equivalent of a software technique called hashing.
Hashing searches a limited number of items that are stored in a small linear array when the address space for those items is much larger than the array itself by accessing the array with an index address that is both within the range of the array and is an arithmetic function of the item's full address. This implies that many items with different full addresses could map to the same index address. The problem then becomes storing the items efficiently enough in the array to maximize the utilization of the array while finding any given entry in the fewest number of memory accesses. Existing software hashing techniques suggest hashing tables contain a prime number of locations and the index for addressing into the hashing tables maps the large address space evenly into each of the locations. Since there could be multiple items in the table with the same index, only one of which is stored at the index address, it is also necessary to step through the memory in some fashion if the first item accessed is not the correct one. Many hashing algorithms suggest incrementing through the hashing table with another prime that is smaller than the size of the table. Ring theory shows that this will ultimately traverse through all elements in the table without repetition. One way to create an initial index and increment may be:
Index = Mod(original address, table size), andIncrement = some prime < table size
Now to determine the existence of an item in the table, perform the following:
Do until item = Table(Index) or Table(Index) = null Index = Index + IncrementLoop
If the Table(Index) is null, then the item doesn't exist in the table; otherwise it does. In many applications, if it doesn't exist, it is added to the table. Clearly, the table begins with all null values and becomes more inefficient in determining if an item is in the table as the number of entries grows, since clashes exist with multiple items that map to the same hashing index.
Traditionally the Increment is a constant 1, which makes the next index easy to create, but often the data put into a hash table clusters about a few numbers, which makes the simple increment inefficient. Two solutions have been proposed in the past to eliminate this problem: either create an index that sufficiently separates otherwise adjacent entries, or create an increment that is both prime and varies with the entry in a way that is sufficiently different from the index to cause entries with the same initial index to have different increments.
The difficulty with this approach, when building hardware, is the need for a multiply or divide and the use of prime numbers, which are not easily addressable. Patents such as U.S. Pat. No. 6,804,768, granted Oct. 12, 2004 to Moyer, and U.S. Pat. No. 6,785,278, granted Aug. 31, 2004 to Calvignau, et al., describe ways to create indexes without doing multiplies or divides, but not the increment function.
The best solution would be to traverse a space that is N binary address bits, or 2N elements in size, ideally in a non-incremental fashion, and to do so without addition or multiplication. The problem then is to find a technique that is certain of traversing the entire size 2N space without stepping through successive addresses.