Some data structures contain huge amounts of data. For example, large networks can have millions or billions of destinations and millions of intermediate nodes, and each node may be identified by its own network address.
Traffic in a network may be routed by looking up records in a routing table or structure. The widely-used Internet Protocol, version 4 (IPv4) uses 32-bit IP addresses and can support up to 2^32 IP nodes, or about 4 billion addresses. A newer version, IPv6, uses 128-bit IP addresses and can have 2^128 IP-addressable nodes. Each record in a data structure might contain one IP address or a range of IP addresses. Routers, switches, and other network nodes may contain a subset of the available IP addresses.
A linear binary search may be used on multiple levels of lookup. Each bit of an input lookup key is used to decide between forks in paths through a tree-like table structure. Since multiple levels of search are required, lookups are slow, although the table uses storage space efficiently. A traditional Trie structure has a number of access levels equal to the number of bits in the input key. Each stride, or jump to the next level, consumes one bit of the input key.
A compromise data structure modifies the Trie structure to use strides of more than one bit. This structure can provide a good trade-off between access speed and storage requirements.
FIG. 1 shows prior-art stride tables in a multi-bit Trie structure. Key 18 is the lookup key that is input to the table structure. A lookup is an operation to find an entry in the table structure that matches key 18. Key 18 is divided into four strides S1, S2, S3, S4. In this simplified example, key 18 is only 8 bits wide, and each stride is 2 bits wide.
First stride S1 selects one of four entries in first-level stride table 10. Entries in table 10 contain pointers to tables 12 in the second level. For example, the top second-level table 12 is pointed to by the top entry in table 10, which is selected when S1 is 11. Another second-level table 12′ is pointed to by the third entry in table 10, which is selected when S1 is 01.
Since each stride is 2 bits, each entry in one level points to a table of 4 entries in the next level. Thus a single table 10 in level 1 expands to four tables 12 in the second level, sixteen tables 14 in the third level, and sixty-four tables 16 in the fourth level. A lookup is performed by traversing the four levels of the tables in the table structure. For the example of key 18 having a value of 01110011, the first stride S1 is 01 and selects the third entry in table 10, which points to table 12′ in level 2.
The two stride bits 11 for S2 select from among the four entries in each of tables 12. Since first-level stride table 10 pointed to table 12′, an entry from selected table 12′ is used and other tables 12 are ignored. The top entry in table 12′ is selected by the value (11) of S2. This top entry contains a pointer to selected table 14′ in level 3.
The two stride bits S3 of level three select from among the four entries in selected table 14′ in the third level. The value of S3 is 00, which selects the lowest entry in selected table 14′. This entry has a pointer to one of the 64 tables in level four, selected table 16′.
The value of the fourth stride S4, 11, selects the upper of four entries in selected stride table 16′. This entry contains the result of the lookup, or a pointer to the result. The value 01110011 of key 18 returns this result. Both the key and the result may be composed of several fields that are combined together.
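The four-level traversal of FIG. 1 can be sketched in a few lines of Python. This is only an illustrative model, not the structure of any particular router: nested dictionaries stand in for stride tables 10, 12, 14, 16, and the names split_strides, insert, and lookup are assumptions introduced for this example.

```python
# Sketch of the FIG. 1 lookup: an 8-bit key walked as four 2-bit strides.
# Nested dicts stand in for stride tables; all names here are illustrative.

STRIDE_BITS = 2
KEY_BITS = 8


def split_strides(key, key_bits=KEY_BITS, stride_bits=STRIDE_BITS):
    """Return the strides of key, most-significant stride (S1) first."""
    mask = (1 << stride_bits) - 1
    return [(key >> shift) & mask
            for shift in range(key_bits - stride_bits, -1, -stride_bits)]


def insert(root, key, result):
    """Create the chain of stride tables for key and store result at the leaf."""
    *upper, last = split_strides(key)
    table = root
    for stride in upper:
        # Each intermediate entry acts as a pointer to a next-level table.
        table = table.setdefault(stride, {})
    table[last] = result


def lookup(root, key):
    """Traverse one table per stride; return None if an entry is empty."""
    entry = root
    for stride in split_strides(key):
        if not isinstance(entry, dict) or stride not in entry:
            return None
        entry = entry[stride]
    return entry


table_10 = {}  # first-level stride table
insert(table_10, 0b01110011, "result for 01110011")
found = lookup(table_10, 0b01110011)
```

For key 01110011 the strides are 01, 11, 00, 11, matching the walk through tables 10, 12′, 14′, and 16′ described above. A key whose path reaches an empty entry returns None.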
When longest-prefix matches (LPM) are supported, intermediate results may be stored in entries in tables 10, 12, 14 of the intermediate levels, rather than only at the end (leaf) levels.
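A minimal sketch of how intermediate results might support a longest-prefix match follows. It assumes, for simplicity, that every prefix length is a whole number of 2-bit strides; practical multi-bit Tries handle other lengths as well, which is not modeled here. The entry layout [result, child_table] and the function names are hypothetical.

```python
# Sketch: longest-prefix match with results stored at intermediate levels.
# Assumption: prefix lengths are multiples of the stride size (2 bits).
# Each entry is a two-item list: [result or None, child table or None].

def add_prefix(root, prefix_bits, result, stride_bits=2):
    """Store result at the entry reached by prefix_bits (a string of 0s and 1s)."""
    chunks = [prefix_bits[i:i + stride_bits]
              for i in range(0, len(prefix_bits), stride_bits)]
    table = root
    for level, chunk in enumerate(chunks):
        entry = table.setdefault(int(chunk, 2), [None, None])
        if level == len(chunks) - 1:
            entry[0] = result          # intermediate or leaf result
        else:
            if entry[1] is None:
                entry[1] = {}          # allocate the next-level table
            table = entry[1]


def longest_prefix_match(root, key_bits, stride_bits=2):
    """Return the result of the deepest matching entry, or None."""
    best = None
    table = root
    for i in range(0, len(key_bits), stride_bits):
        entry = table.get(int(key_bits[i:i + stride_bits], 2))
        if entry is None:
            break
        if entry[0] is not None:
            best = entry[0]            # remember the longest match so far
        table = entry[1]
        if table is None:
            break
    return best


root = {}
add_prefix(root, "01", "A")
add_prefix(root, "0111", "B")
```

With these two prefixes, key 01110011 matches the longer prefix 0111 and returns "B", while key 01000000 falls back to the result "A" stored at the first level.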
Trie structures modified for multi-bit strides are a useful compromise between a fast but large single-level table and a slow but storage-efficient single-bit Trie structure. Even so, the storage requirements of stride tables may still be large.
The parent application disclosed using compression functions to compress stride tables. Network tables tend to be sparse tables since the address locations are sparsely populated with valid entries. Most memory locations tend to be empty, or have invalid entries.
Since network tables are sparse, the valid entries in a stride table may be compressed to squeeze out invalid entries and reduce the storage requirements. Ideally, only the valid entries would be stored, but in practice some invalid entries are also stored in the compressed tables. Thus the degree of compression may be less than ideal.
Rather than simply masking or deleting index bits to compress a table, index bits may be compressed using various functions. For example, two or more index bits may be combined together into a single index bit using a logical function such as a logical XOR. A variety of compression functions may be used such as XOR, AND, OR, rotate, shifts, and conditional operations. A field in the table entry can indicate which logical functions to perform for the next level of stride tables. Thus the different logical functions may be mixed together within a table structure to optimize compression.
FIG. 2 shows a logically-compressed stride table as described in more detail in the parent application. Table 20 is a stride table such as one of tables 10, 12, 14, 16 of FIG. 1. In FIG. 2, the current stride size is 4 bits. The current stride of the input key is used as a 4-bit index address that selects one of the 16 entries in stride table 20. Each entry in stride table 20 is in a location identified by a 4-bit index of bits A3, A2, A1, A0. Stride table 20 contains only 4 valid entries, at index locations 1100, 1011, 1001, and 1000. The other 12 indexes contain invalid or empty entries.
Since all four valid entries have a 1 in the most-significant-bit (MSB) of the index, or A3=1, index bit A3 is not needed to select among the four valid entries. Index bit A3 could be removed or masked from stride table 20, as was shown in prior-art FIG. 2. The entries with A3=0 are deleted, since they are all empty entries. Only the entries with A3=1 are retained in compressed stride table 20′.
Further compression can be achieved by combining two of the remaining three index bits to create a new index bit. In this example, index bits A2 and A1 are combined by an XOR function to generate a new A1 index bit. The old A2 and A1 index bits are replaced by this new A1 index bit. The result is that a 2-bit index is used in compressed stride table 22 to select among the 4 entries in table 22.
The size of stride table 22 has been reduced by 75% by masking one index bit (A3) and logically combining two other index bits (A2, A1). Empty entries are removed from compressed stride table 22. Other stride tables could also be compressed, reducing the overall storage requirements.
Two steps were performed to compress the 4 index bits down to 2 index bits. First, one of the index bits (A3) was masked out. Second, two index bits (A2, A1) were logically combined into one index bit (the new A1). The XOR function used for this logical combining is a compression function (CF). Other compression functions may be used for other tables, such as AND, OR, rotate, shift, or more complex functions such as AND-OR, etc.
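The two steps can be checked with a short Python sketch. The four valid index values are those given for stride table 20; the function name compress_index and the dictionary used as the compressed table are assumptions made for illustration.

```python
# Sketch of the FIG. 2 compression: mask A3, then XOR A2 with A1.
# VALID holds the four valid 4-bit indexes of stride table 20.

VALID = [0b1100, 0b1011, 0b1001, 0b1000]


def compress_index(index):
    """Map a 4-bit index A3..A0 to a 2-bit index (new A1, A0)."""
    a2 = (index >> 2) & 1
    a1 = (index >> 1) & 1
    a0 = index & 1
    # A3 is masked out: it is 1 for every valid entry and carries no information.
    new_a1 = a2 ^ a1          # compression function (CF): XOR of A2 and A1
    return (new_a1 << 1) | a0


# Compressed table: 2-bit index -> original 4-bit index of the valid entry.
compressed_table = {compress_index(i): i for i in VALID}
```

Each of the four valid entries lands on a distinct 2-bit index, so the 16-entry table shrinks to 4 entries with no collisions for this particular set of valid indexes. A different set of valid entries might need a different compression function.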
FIG. 3 shows a generalized logical compressor with initial and final masking as described in the parent application. In this generalization of the compression function, two or more of the input bits selected by the initial mask are logically combined to create new candidate bits. A final selection is made from the full set of bits from the initial selection and from the newly created candidate bits.
The XOR of FIG. 2 created a new index bit (new A1) from two of the uncompressed index bits (A2, A1). This new index bit, the output of the XOR, was selected while the two original index bits were dropped. The operation of logical compressor 26 can be thought of as initially creating new candidate index bits (merged bits) by performing logical operations such as XOR, and then selecting from among the merged and original index bits to create the final compressed index.
It is expected that the combination of initial bit selection and final bit selection may lead to better candidates for index bits. Overall, fewer of the newly created index bits may need to be selected than if only the initial bits were available.
Uncompressed index bits are masked by an initial mask applied to initial masker 24. Some of the remaining bits from initial masker 24 bypass logical merger 28 and are input directly to final masker 30, while other bits from initial masker 24 are input to logical merger 28.
Logical merger 28 combines selected index bits from initial masker 24 using logical functions to produce merged bits E. For example, adjacent index bits may be combined by an XOR function, and the XOR results are the merged bits. The XOR results may be more efficient at encoding the valid entries than the original index bits, as was true of the XOR of A2, A1 in FIG. 2.
Final masker 30 receives both the original index bits selected by initial masker 24, and the merged bits created by logical merger 28. Final masker 30 selects from among the original and merged bits to output the final compressed index bits.
The compression function CF, or another control field, can indicate that merged bits are to be created, and which of the S−M unmasked index bits are combined together by logical merger 28. The final mask field can indicate which of the S−M index bits and E merged bits are output as the final compressed index.
When initial masker 24 removes M index bits from the original S index bits, logical merger 28 creates E merged bits, and final masker 30 removes another F bits, the number of final compressed index bits is S−M+E−F. These S−M+E−F bits select one entry in compressed stride table 22. Compressed stride table 22 therefore has 2^(S−M+E−F) entries, which is smaller than the 2^S entries of uncompressed stride table 20 by a factor of 2^(M+F−E).
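The three stages of FIG. 3 and the bit-count arithmetic can be sketched as follows. This is a simplified model under stated assumptions: the merger only XORs pairs of bits, and the parameter names initial_keep, merge_pairs, and final_select are hypothetical stand-ins for the initial mask, compression function, and final mask fields.

```python
# Sketch of logical compressor 26: initial masker, logical merger, final masker.
# Assumption: the merger only XORs pairs of bits; other functions are possible.

def logical_compress(index, initial_keep, merge_pairs, final_select):
    # Initial masker 24: keep S - M of the original index bits.
    bits = {("orig", p): (index >> p) & 1 for p in initial_keep}
    # Logical merger 28: create E merged bits from pairs of kept bits.
    for k, (i, j) in enumerate(merge_pairs):
        bits[("merged", k)] = bits[("orig", i)] ^ bits[("orig", j)]
    # Final masker 30: select the output bits, most-significant first.
    out = 0
    for name in final_select:
        out = (out << 1) | bits[name]
    return out


# Parameters reproducing the FIG. 2 example.
S = 4                                       # uncompressed index bits
initial_keep = [2, 1, 0]                    # A3 masked out, so M = 1
merge_pairs = [(2, 1)]                      # A2 XOR A1, so E = 1
final_select = [("merged", 0), ("orig", 0)] # A2 and A1 dropped, so F = 2
M, E, F = 1, 1, 2

width = S - M + E - F                       # final compressed index bits
compressed_entries = 2 ** width
shrink_factor = 2 ** (M + F - E)

valid = [0b1100, 0b1011, 0b1001, 0b1000]
codes = [logical_compress(v, initial_keep, merge_pairs, final_select)
         for v in valid]
```

For these parameters the compressed index is 2 bits wide, the compressed table has 4 entries, and the table is 4 times smaller than the 16-entry uncompressed table.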
Some of the entries in compressed stride table 22 may be invalid entries, since the number of valid entries may not be a power of 2. Compression may be less than ideal, even when a variety of compression functions are available. However, significantly better compression can be achieved than with the simple bit masking of FIG. 2.
FIG. 4 shows details of a logical merger that creates merged index bits. Uncompressed index bits are optionally masked by an initial mask (not shown in this embodiment). Some of the uncompressed index bits may bypass logical merger 28 and become bits in the compressed index, while other uncompressed index bits are input to logical merger 28 for further compression.
The uncompressed index bits are applied as inputs to logical gates 48, 48′. These may be discrete XOR logical gates, or they may be implemented in firmware or software or in an arithmetic-logic-unit (ALU) or similar programmable device. The compression function CF applied to logical compressor 26 determines which of logical gates 48, 48′, 49 are selected, while the others are disabled.
Some stride tables may compress better with two-level XOR'ing, while other stride tables compress well using 2-input or 4-input XOR'ing. When the stride tables are being constructed or new entries are being added that cause a stride table to be expanded, software can test the various CF functions and choose an optimal function from the many available functions. Routines that find the best CF functions and which index bits to compress can be written that either intelligently find the optimal or near-optimal choices, or that try all combinations and then select the one with the best results.
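One way such a routine might try all combinations is sketched below. It is an exhaustive search under narrow assumptions, not the method of the parent application: the only candidate bits are the original index bits and the XOR of each pair, and "best" means the fewest output bits that keep every valid entry at a distinct compressed index. The function names are hypothetical.

```python
# Sketch: brute-force search for a compressed index over a small candidate set.
# Assumption: candidates are the original bits plus all pairwise XORs.

from itertools import combinations


def candidate_bits(s_bits):
    """Return (name, function) pairs; each function maps an index to 0 or 1."""
    cands = []
    for p in range(s_bits):
        cands.append((f"A{p}", lambda x, p=p: (x >> p) & 1))
    for i, j in combinations(range(s_bits), 2):
        cands.append((f"A{j}^A{i}",
                      lambda x, i=i, j=j: ((x >> i) ^ (x >> j)) & 1))
    return cands


def find_compression(valid, s_bits):
    """Return names of the fewest candidate bits that separate all valid indexes."""
    cands = candidate_bits(s_bits)
    for width in range(s_bits + 1):          # try the narrowest index first
        for chosen in combinations(cands, width):
            codes = {tuple(f(x) for _, f in chosen) for x in valid}
            if len(codes) == len(valid):     # no two valid entries collide
                return [name for name, _ in chosen]
    return None


choice = find_compression([0b1100, 0b1011, 0b1001, 0b1000], 4)
```

For the four valid entries of FIG. 2 the search finds a 2-bit compressed index. Exhaustive search grows quickly with the stride size and the number of candidate functions, which is why a routine that prunes the choices intelligently may be preferred for larger tables.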
More examples of compression functions that can be used with the invention of the parent application are desired. Compression functions that are optimized for a variety of data sets and types are desirable.