Subscriber management routers (SMRs) are Internet Protocol (IP) routers that provide per-subscriber services, such as Network Address Translation (NAT) and firewall services. These services are sometimes referred to as “high touch” because they require the router to examine and manipulate many fields in the IP and higher-layer headers. They are also stateful: the SMR must maintain state about the packet flows of individual subscribers. Such stateful, high-touch services are processing intensive and cannot be performed at line speed in hardware. As a result, typical SMR architectures consist of line cards and data cards. The primary function of a line card is to dispatch each packet received on a given interface to the data card that holds the appropriate flow state and is therefore able to process the packet. Upon receiving a packet, the data card first identifies the subscriber, then identifies the individual flow, and then processes the packet accordingly.
Because the line card does a limited amount of work, it can operate in hardware at line speeds. The software-based data cards are more numerous, and so individually do not need to operate at line speeds. If the line card is able to perform additional functions, then it may further offload the data card. One such function is identification of the subscriber associated with a given packet.
In order to identify the subscriber, the line card extracts certain fields from the incoming packet and generates a key that is unique for every subscriber. This key is then used to search memory for the entry that identifies the subscriber and the appropriate data card.
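The text does not say which packet fields make up the key. As a minimal sketch, assuming purely for illustration that the key is the upper 64 bits (the /64 prefix) of the IPv6 source address, key generation could look like this; `subscriber_key` and `KEY_BITS` are hypothetical names, not part of any described implementation.

```python
import ipaddress

KEY_BITS = 64  # search-key length assumed for this sketch

def subscriber_key(src_addr: str) -> int:
    """Hypothetical key: the upper 64 bits (the /64 prefix) of an IPv6 source address."""
    addr = ipaddress.IPv6Address(src_addr)
    return int(addr) >> (128 - KEY_BITS)

# Two addresses that share a /64 prefix map to the same key.
print(hex(subscriber_key("2001:db8:0:1::5")))                                  # 0x20010db800000001
print(subscriber_key("2001:db8:0:1::9") == subscriber_key("2001:db8:0:1::5"))  # True
```

A hardware line card would assemble the key from whichever fields identify the subscriber in its deployment; the resulting integer is what the search engine looks up.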
The line cards of large-capacity SMRs in mobile wireless environments have new and difficult requirements placed on them. As already mentioned, they must operate at line speeds, which requires a hardware-based implementation with an ability to search memory in a fixed amount of time. Subscriber entries are added and deleted frequently, as mobile subscribers come and go. The addition and deletion of entries must therefore operate quickly. In addition, each line card may handle a very large number of subscribers, on the order of one million.
The unique search key generated from the packet fields must be at least 64 bits long. One reason is that the identifier field of the IPv6 header is 64 bits. Clearly, a 64-bit search key cannot be used as a direct index into Random Access Memory (RAM), as this would require 2^64 RAM entries (about 18 giga-giga-entries, or 1.8 × 10^19).
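A quick calculation shows the scale of a directly indexed table, using the figure of about one million subscribers per line card given earlier; only the variable names are invented here.

```python
KEY_BITS = 64
SUBSCRIBERS = 1_000_000                 # order of magnitude cited for one line card

direct_entries = 2 ** KEY_BITS          # one RAM slot per possible key value
print(f"{direct_entries:,}")            # 18,446,744,073,709,551,616
print(f"{direct_entries / 1e18:.1f}e18 slots")                   # 18.4e18 slots
print(f"fraction occupied: {SUBSCRIBERS / direct_entries:.1e}")  # fraction occupied: 5.4e-14
```

Almost every slot of such a table would be empty, which is why the key must be compressed (for example, by hashing) before it can address memory.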
One solution is to use a Content Addressable Memory (CAM) based search engine. A CAM based search engine operates quickly in deterministic time, and additions and deletions are simple and fast. However, a CAM is extremely expensive and takes up a lot of space on the line card, and so may not be a feasible solution.
Another solution uses a RAM-based search engine with a hash table. In a standard hashing scheme, a search key K1 of fixed length L1 is presented to a universal hashing process, which uses a universal hash function to produce a bucket ID of a second length, shorter than L1. The bucket ID is used to address a primary hash table stored in memory, and the entry stored at that location, containing a key (of length L1) and two pointers (P1 and P2), is retrieved.
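The text does not name a particular universal hash function. One well-known, nearly universal choice is multiply-shift hashing: multiply the 64-bit key by a random odd 64-bit constant, keep the low 64 bits of the product, and take its top M bits as the bucket ID (for distinct keys, the collision probability is at most 2/2^M). The sketch below assumes that choice; `make_multiply_shift` and its parameters are illustrative names.

```python
import random
from typing import Callable

KEY_BITS = 64                      # L1, the search-key length
KEY_MASK = (1 << KEY_BITS) - 1

def make_multiply_shift(m_bits: int, rng: random.Random) -> Callable[[int], int]:
    """Draw h(x) = ((a * x) mod 2^64) >> (64 - m_bits) with a random odd a."""
    if not 1 <= m_bits <= KEY_BITS:
        raise ValueError("bucket ID length must be between 1 and 64 bits")
    a = rng.getrandbits(KEY_BITS) | 1   # the multiplier must be odd

    def h(key: int) -> int:
        # Keep the low 64 bits of the product, then its top m_bits bits.
        return ((a * key) & KEY_MASK) >> (KEY_BITS - m_bits)

    return h

h = make_multiply_shift(20, random.Random(42))   # 20-bit bucket IDs: about 1M buckets
bucket_id = h(0x20010DB800000001)
print(0 <= bucket_id < 2 ** 20)                   # True
```

Drawing a new multiplier yields a different function from the same family, which is what any scheme that retries with a new hash function relies on.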
The key retrieved from the hash table entry is compared with the search key. If they match, pointer P1 points to a table containing the entries for search key K1.
If they do not match, P2 is used as an index to read the next entry from the linked hash table; this entry again contains a key (of length L1) and two pointers (P1 and P2). This step is repeated until a match is found or the end of the chain is reached.
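The lookup loop can be sketched as follows. The layout (a primary table indexed by bucket ID plus an overflow array addressed by P2), the names `Entry` and `ChainedHashTable`, and the use of `None` to terminate a chain are assumptions for illustration; P1 is a plain integer standing in for a pointer to the subscriber's record. The `reads` counter makes explicit that each step along the chain is a separate memory access.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Entry:
    key: int            # full search key (length L1)
    p1: int             # stand-in for the pointer to this key's record
    p2: Optional[int]   # index of the next entry in the overflow table, or None

class ChainedHashTable:
    def __init__(self, m_bits: int, hash_fn: Callable[[int], int]):
        self.primary: List[Optional[Entry]] = [None] * (1 << m_bits)
        self.overflow: List[Entry] = []
        self.hash_fn = hash_fn

    def insert(self, key: int, p1: int) -> None:
        bucket = self.hash_fn(key)
        head = self.primary[bucket]
        if head is None:
            self.primary[bucket] = Entry(key, p1, None)
            return
        # Link the new entry directly after the head of the chain.
        self.overflow.append(Entry(key, p1, head.p2))
        head.p2 = len(self.overflow) - 1

    def lookup(self, key: int) -> Tuple[Optional[int], int]:
        """Return (P1 or None, number of memory reads performed)."""
        entry = self.primary[self.hash_fn(key)]
        reads = 1
        while entry is not None:
            if entry.key == key:
                return entry.p1, reads
            if entry.p2 is None:          # end of chain: key not present
                break
            entry = self.overflow[entry.p2]
            reads += 1
        return None, reads

table = ChainedHashTable(3, lambda k: k % 8)   # toy hash that forces collisions
for key, rec in [(1, 100), (9, 900), (17, 1700)]:
    table.insert(key, rec)
print(table.lookup(9))    # (900, 3)
```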
In a Dynamic Random Access Memory (DRAM) based solution, burst read operations are desirable because every read incurs latency. If only a single entry is read per access, the latency consumes most of the available bandwidth. The standard hash described above, which reads one entry per access as it follows the chain, therefore does not work well with DRAM.
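A back-of-the-envelope model shows why. Assume, hypothetically, that every access pays a fixed latency of 10 bus cycles and then transfers one entry per cycle, and that accesses are not overlapped (real DRAM controllers can pipeline accesses across banks, which this model ignores). The fraction of cycles that carry data then grows with the number of entries fetched per access.

```python
def utilization(latency_cycles: int, words_per_access: int) -> float:
    """Fraction of bus cycles spent transferring data, assuming each access
    pays a fixed latency and then moves one word per cycle, with no overlap."""
    return words_per_access / (latency_cycles + words_per_access)

LATENCY = 10   # hypothetical access latency, in bus cycles
for burst in (1, 4, 8):
    print(f"burst of {burst}: {utilization(LATENCY, burst):.0%}")
# burst of 1: 9%
# burst of 4: 29%
# burst of 8: 44%
```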
A hash table is generally provisioned with 4 to 8 times as many entries as the number of keys to be stored. For example, if there are 1 million entries to be searched, a typical hash table would contain about 4 million to 8 million entries. A Static Random Access Memory (SRAM) based solution of this size is therefore prohibitively expensive, since SRAM costs far more per bit than DRAM.
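To put the 4x to 8x figure in perspective, the sketch below estimates the table size for one million keys. It assumes, hypothetically, a 64-bit key and 32-bit P1 and P2 pointers per entry; the text does not give pointer widths.

```python
def table_bytes(keys: int, overprovision: int,
                key_bits: int = 64, pointer_bits: int = 32) -> int:
    """Bytes needed for keys * overprovision entries of (key, P1, P2)."""
    entry_bits = key_bits + 2 * pointer_bits   # 64 + 32 + 32 = 128 bits = 16 bytes
    return keys * overprovision * entry_bits // 8

for ratio in (4, 8):
    print(f"{ratio}x: {table_bytes(1_000_000, ratio) / 2**20:.0f} MiB")
# 4x: 61 MiB
# 8x: 122 MiB
```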
U.S. Pat. No. 5,914,938 teaches a method in which each bucket contains N locations instead of just one. Each location holds a key/pointer entry, so a single burst memory read obtains all N entries for a given bucket ID rather than just one. However, an overflow can still occur, that is, more than N entries can map to the same bucket ID. To handle overflows, U.S. Pat. No. 5,914,938 teaches that, when an overflow occurs, different hash functions are tried until a “perfect” hash function with no overflows is found. This is workable in the LAN switching environment for which U.S. Pat. No. 5,914,938 was designed, because entries are added and deleted relatively infrequently. In the large-scale SMR environment, however, additions and deletions are too frequent to rehash all entries.
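The bucketed approach and its weakness can be sketched as below. This is not the patented implementation: each N-slot bucket is modelled as a small Python dict (one dict read standing in for one burst read), and the multiply-shift hash (multiply by a random odd 64-bit constant, keep the top bits), the retry limit, and all names are assumptions. On overflow, the sketch draws a new hash function and rebuilds every bucket from scratch, which shows how the cost of a single insertion can jump from one bucket write to a full rehash.

```python
import random
from typing import Dict, List, Optional

class BucketedHashTable:
    """Buckets of at most n_slots entries; rehash everything on overflow."""

    def __init__(self, m_bits: int, n_slots: int, seed: int = 0, max_tries: int = 100):
        self.m_bits, self.n_slots, self.max_tries = m_bits, n_slots, max_tries
        self.rng = random.Random(seed)
        self.a = self.rng.getrandbits(64) | 1       # current hash function
        self.items: Dict[int, int] = {}             # every stored key -> P1
        self.buckets: List[Dict[int, int]] = [{} for _ in range(1 << m_bits)]
        self.rehashes = 0

    def _bucket_id(self, key: int) -> int:
        return ((self.a * key) & ((1 << 64) - 1)) >> (64 - self.m_bits)

    def _rebuild(self) -> bool:
        """Redistribute all items with the current hash; False on any overflow."""
        buckets: List[Dict[int, int]] = [{} for _ in range(1 << self.m_bits)]
        for key, p1 in self.items.items():
            bucket = buckets[self._bucket_id(key)]
            if len(bucket) == self.n_slots:
                return False
            bucket[key] = p1
        self.buckets = buckets
        return True

    def insert(self, key: int, p1: int) -> None:
        bucket = self.buckets[self._bucket_id(key)]
        self.items[key] = p1
        if key in bucket or len(bucket) < self.n_slots:
            bucket[key] = p1                        # common case: one bucket write
            return
        old_a = self.a
        for _ in range(self.max_tries):             # overflow: try other hash functions
            self.a = self.rng.getrandbits(64) | 1
            self.rehashes += 1
            if self._rebuild():
                return
        self.a = old_a                              # give up and restore the old state
        del self.items[key]
        raise RuntimeError("no overflow-free hash function found")

    def lookup(self, key: int) -> Optional[int]:
        return self.buckets[self._bucket_id(key)].get(key)   # one burst read

t = BucketedHashTable(m_bits=8, n_slots=4, seed=7)
for k in range(100):
    t.insert(k * 7919, k)
print(t.lookup(42 * 7919))   # 42
```

With frequent insertions and deletions, every overflow triggers another full rebuild, which is the cost the text identifies as unacceptable for large-scale SMRs.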
U.S. Pat. No. 6,052,698 and U.S. Pat. No. 5,530,834 teach the use of caches to speed up the average search time. A cache is smaller and faster than the main memory. Entries are stored in the cache as they are used, and when the cache is full, the least-recently used (LRU) entries are overwritten. U.S. Pat. No. 6,052,698 is designed to take advantage of the caches within processors such as the PENTIUM processor, an approach that does not apply to hardware-based search engines. U.S. Pat. No. 5,530,834, for its part, teaches that the RAM itself serves as the cache memory, with a slower main memory behind it.
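The LRU policy described above can be sketched with Python's `collections.OrderedDict`, which keeps keys in order and provides `move_to_end` and `popitem(last=False)`. This is a generic software model, not the mechanism of either patent; the `backing_lookup` callable is a hypothetical stand-in for a read from the slower main memory.

```python
from collections import OrderedDict
from typing import Callable, Optional

class LRUCache:
    def __init__(self, capacity: int, backing_lookup: Callable[[int], Optional[int]]):
        self.capacity = capacity
        self.backing_lookup = backing_lookup   # slow path: main memory
        self.entries = OrderedDict()           # oldest (least recently used) first
        self.hits = 0
        self.misses = 0

    def lookup(self, key: int) -> Optional[int]:
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = self.backing_lookup(key)
        if value is not None:                  # cache only keys that exist
            self.entries[key] = value
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)   # evict the least recently used entry
        return value

main_memory = {1: 10, 2: 20, 3: 30}
cache = LRUCache(2, main_memory.get)
for key in (1, 2, 1, 3):
    cache.lookup(key)
print(list(cache.entries), cache.hits, cache.misses)   # [1, 3] 1 3
```

A cache of this kind only improves the average lookup time; a miss still pays the full main-memory search, so it does not by itself give the fixed search time that line-speed hardware requires.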