Network switches inspect fields in the headers of Ethernet packets to determine what action to perform on each packet. Actions may include sending the packet to a specific output port and a specific queue for that port, multicasting or broadcasting the packet to several ports, sending the packet to a network controller so that it can determine how the packet should be sent, or dropping the packet by not sending it to any destination. Packets have headers representing several layers of the seven-layer OSI model defining packet transport. Typically, these headers follow one another in increasing layer number of the OSI model. As an example, a packet may begin with a Medium Access Control (MAC) header at layer 2, followed by an Internet Protocol (IP) header at layer 3, followed by a Transmission Control Protocol (TCP) header at layer 4. Each of these headers internally has a number of individual fields which may be inspected by the switch. At layer 2, the MAC header contains MAC source and destination addresses, along with an ethertype. An IP header at layer 3 contains IP source and destination addresses and an IP protocol, among other fields. A TCP header at layer 4 contains TCP source and destination port numbers, among other fields. All these fields may be inspected by the switch, and may be modified in the course of determining the disposition of the packet.
Switches usually function by associatively matching packet fields against internal switch tables. For example, a layer 2 switch may contain a table of MAC destination addresses. Each incoming packet has its MAC destination address extracted, and the MAC destination table is then searched to determine whether it contains that address. If it does, an action is retrieved from data associated with the matching table entry, which specifies the output destination port for the packet. This match/action paradigm is formalized by the OpenFlow standard, but both OpenFlow and conventional (non-OpenFlow) switches perform the same functions of matching packet header fields against internal tables and performing actions specified by data associated with the matching entry.
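By way of illustration only, the match/action lookup described above can be sketched as a table keyed by MAC destination address. The addresses, port numbers, action names, and the behavior on a table miss are all assumptions made for this sketch and are not taken from any particular switch.

```python
# Hypothetical layer 2 table: MAC destination address -> action data.
# All addresses, ports and action names below are illustrative assumptions.
mac_table = {
    "00:1b:44:11:3a:b7": ("forward", 3),
    "00:1b:44:11:3a:b8": ("forward", 7),
}

def lookup_action(dst_mac, table, miss_action=("flood", None)):
    """Return the action stored with the matching entry.

    The miss action (flooding) is an assumption of this sketch.
    """
    return table.get(dst_mac.lower(), miss_action)

print(lookup_action("00:1B:44:11:3A:B7", mac_table))  # ('forward', 3)
print(lookup_action("00:1b:44:11:3a:ff", mac_table))  # ('flood', None)
```

A dictionary stands in here for the associative match; it models only the behavior, not the hardware organization of the table.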
One common feature of a switch is an Access Control List (ACL). An ACL table implements the important function of providing a single bit output, specifying what is commonly referred to as permit vs. deny, that is, specifying whether the packet is allowed to proceed (permit) or whether the packet is dropped immediately (deny). ACL tables represent a switch's and the entire Internet's first line of defense against a wide variety of attacks.
A typical ACL table does not match against just a single packet header field; several packet header fields are grouped together and presented to the ACL table, whose entries hold values for the aggregate data word formed by concatenating all desired fields. A common configuration for an ACL table is to match against the so-called TCP 5-tuple: IP source and destination addresses (32 bits each for IPv4), IP protocol (8 bits), and TCP source and destination port numbers (16 bits each), for a total of 104 bits.
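As a minimal sketch of how such an aggregate word might be formed, the following concatenates the five fields into a single 104-bit integer. The field order and the function name pack_five_tuple are assumptions for illustration; an actual switch may order the fields differently.

```python
import ipaddress

def pack_five_tuple(src_ip, dst_ip, protocol, src_port, dst_port):
    """Concatenate the IPv4 5-tuple into one integer key (assumed field order)."""
    key = int(ipaddress.IPv4Address(src_ip))                 # 32 bits
    key = (key << 32) | int(ipaddress.IPv4Address(dst_ip))   # 32 bits
    key = (key << 8) | (protocol & 0xFF)                     # 8 bits
    key = (key << 16) | (src_port & 0xFFFF)                  # 16 bits
    key = (key << 16) | (dst_port & 0xFFFF)                  # 16 bits
    return key

KEY_BITS = 32 + 32 + 8 + 16 + 16
key = pack_five_tuple("10.0.0.1", "192.168.1.20", 6, 49152, 443)
print(KEY_BITS, key.bit_length() <= KEY_BITS)   # 104 True
```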
ACL tables, along with some of the other tables in switches, allow ternary matches, where for a table entry, each bit may be 1, 0 or don't care (also called a wildcard bit). If the table entry bit is a 1 or 0, the incoming packet must contain that 1 or 0 in order for the table entry to match, whereas if the table entry is a don't care, the corresponding packet header bit may be either 1 or 0, with the bit effectively not participating in the match requirement.
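The ternary match rule can be sketched with a value and a mask per entry, where a mask bit of 1 means the bit participates in the match and 0 means don't care. The value/mask representation and the function names are assumptions of this sketch, chosen for convenience in software.

```python
def parse_ternary(pattern):
    """Convert a string of '1', '0' and '-' (spaces ignored) to (value, mask)."""
    value = mask = 0
    for ch in pattern.replace(" ", ""):
        value <<= 1
        mask <<= 1
        if ch in "01":
            value |= int(ch)
            mask |= 1            # this bit must match
        elif ch != "-":
            raise ValueError(f"unexpected character {ch!r}")
    return value, mask

def ternary_match(key, entry):
    """True if every non-wildcard bit of the entry equals the key bit."""
    value, mask = entry
    return (key & mask) == value

entry = parse_ternary("01-- ---- ---- ----")
print(ternary_match(0x4000, entry),
      ternary_match(0x7FFF, entry),
      ternary_match(0x8000, entry))   # True True False
```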
When tables do not contain any wildcard bits, they are referred to as exact match tables. These have efficient hardware implementations where the dominant cost of a table entry is storing its match value in a memory such as SRAM. These may be organized and accessed as hashtables. Cuckoo hashtables are specific types of hashtables which prevent a hashtable collision problem by using a hash fill algorithm providing high occupancy. Most of the entries provided by the hashtable SRAM will be able to be filled with match entries. For example, a 4 way Cuckoo hashtable can be filled to at least 95% occupancy.
When tables contain wildcard bits, they are more expensive to implement, and at the performance levels of typical hardware switches are usually implemented using TCAMs (Ternary Content Addressable Memories). In addition to storing a ternary value at each table bit location, TCAMs include matching logic to determine whether each TCAM entry matches against a search word provided as an input to the TCAM. TCAMs are more expensive to implement than SRAMs, where expense is measured as the amount of Silicon area required to implement a bit of a single match entry. This is because a TCAM actually must contain two SRAM bit cells to store the two bits required to represent the three values of a ternary entry. Furthermore, there is logic attached to each bit-cell to determine whether it matches (in a ternary sense) its bit of the search word, and to combine all individual bit matches into a single output representing whether that TCAM entry matched or not.
A TCAM table may be 6-7× more expensive in area than an equivalent exact match table. It also dissipates substantially more power. As a result, the amount of TCAM table provided on switches is typically much smaller than the amount of SRAM-based exact match table. TCAM tables on switches are considered a scarce and precious resource.
Some types of tables allowing wildcard bits, called longest prefix match (LPM) tables, have table entries with prefix coding, requiring 1 or 0 values in the most significant bits (msbs), with the least significant bits (lsbs) being wildcarded. Table entries consist of a string of 1/0's starting from the msb, optionally followed by a number of don't cares. No 1/0 constants can appear in less significant bit positions than wildcard entries. This special type of table is useful for IP address matching, and is sometimes built with special-purpose hardware other than TCAM. However, this technique is generally not applicable to ACL tables. Special-purpose LPM tables generally match against a single prefix-coded field (like a 32-bit IP address), whereas an ACL table contains a number of fields.
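For illustration, a longest prefix match over a single IP address field might be sketched as follows. The routes and action names are hypothetical, and the linear scan is only a behavioral model; it is not how special-purpose LPM hardware is organized.

```python
import ipaddress

# Hypothetical prefix-coded entries; prefixes and actions are illustrative only.
routes = {
    "0.0.0.0/0": "default",
    "10.0.0.0/8": "port1",
    "10.1.0.0/16": "port2",
}

def longest_prefix_match(address, table):
    """Return the action of the matching entry with the most non-wildcard msbs."""
    addr = ipaddress.IPv4Address(address)
    best = None
    for prefix, action in table.items():
        net = ipaddress.IPv4Network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, action)
    return None if best is None else best[1]

print(longest_prefix_match("10.1.2.3", routes))   # port2
print(longest_prefix_match("10.2.0.1", routes))   # port1
print(longest_prefix_match("8.8.8.8", routes))    # default
```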
ACL tables are usually ternary tables and therefore use expensive TCAM rather than cheap SRAM. The number of provided ACL entries is therefore quite limited since they cannot use the LPM table architecture.
An ACL entry often doesn't just specify ternary values for entries, it specifies a range for some of its constituent fields. A common scenario is that the TCP source and destination port numbers may be ranges. For example, a range for a (16 bit) TCP port may be from 1024 to 65535. A logic decomposition of this range into ternary entries is as follows, where (-) indicates a don't care:
range from 0x0400 to 0xffff
bit
1111 11
5432 1098 7654 3210
1--- ---- ---- ----  ;0x8000 to 0xffff
01-- ---- ---- ----  ;0x4000 to 0x7fff
001- ---- ---- ----  ;0x2000 to 0x3fff
0001 ---- ---- ----  ;0x1000 to 0x1fff
0000 1--- ---- ----  ;0x0800 to 0x0fff
0000 01-- ---- ----  ;0x0400 to 0x07ff
Above, the contribution of each line in covering the range from 0x0400 to 0xffff is indicated to its right (a 0x prefix denotes a hexadecimal, base 16, number). The arrangement where all the don't-cares are in the lsbs is commonly called prefix coding. All lines but the first can be changed so that the leading 0's become don't-cares; the added portion of the Boolean space in each entry is already covered by earlier entries. While individual TCAM implementations may prefer that these bits be set one way or the other to minimize power, here they will generally be made don't-cares for clarity of reading:
1--- ---- ---- ----  ;0x8000 to 0xffff
-1-- ---- ---- ----  ;0x4000 to 0x7fff
--1- ---- ---- ----  ;0x2000 to 0x3fff
---1 ---- ---- ----  ;0x1000 to 0x1fff
---- 1--- ---- ----  ;0x0800 to 0x0fff
---- -1-- ---- ----  ;0x0400 to 0x07ff
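The decomposition of a range into prefix-coded ternary entries can be sketched as follows. This is one common greedy splitting approach, given here under the assumption of an inclusive range and with a hypothetical function name range_to_prefixes; it is a sketch rather than a description of any particular implementation.

```python
def range_to_prefixes(lo, hi, width=16):
    """Split the inclusive range [lo, hi] into prefix-coded ternary entries."""
    entries = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = (lo & -lo) if lo else (1 << width)
        # ... shrunk until it fits inside what is left of the range.
        while size > hi - lo + 1:
            size >>= 1
        wild = size.bit_length() - 1      # number of trailing don't-care bits
        fixed = width - wild              # number of leading 1/0 bits
        bits = format(lo >> wild, "b").zfill(fixed) if fixed else ""
        entries.append(bits + "-" * wild)
        lo += size
    return entries

# Entries are produced from the low end of the range upward, so they are
# printed in reverse to list the highest sub-range first.
for entry in reversed(range_to_prefixes(0x0400, 0xFFFF)):
    print(entry)
```

For the range 0x0400 to 0xffff this yields six entries, with leading 0's kept explicit rather than replaced by don't-cares.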
The number of ternary words required to represent a range varies according to the range. For example, the range 1 to 65535 requires 16 entries:
1--- ---- ---- ----
-1-- ---- ---- ----
--1- ---- ---- ----
---1 ---- ---- ----
---- 1--- ---- ----
---- -1-- ---- ----
---- --1- ---- ----
---- ---1 ---- ----
---- ---- 1--- ----
---- ---- -1-- ----
---- ---- --1- ----
---- ---- ---1 ----
---- ---- ---- 1---
---- ---- ---- -1--
---- ---- ---- --1-
---- ---- ---- ---1
The range 1 to 65534, permitting all but the lowest (0) and highest (65535) values, requires 30 entries. This is the worst case:
1--- ---- ---- ---0
1--- ---- ---- --0-
1--- ---- ---- -0--
1--- ---- ---- 0---
1--- ---- ---0 ----
1--- ---- --0- ----
1--- ---- -0-- ----
1--- ---- 0--- ----
1--- ---0 ---- ----
1--- --0- ---- ----
1--- -0-- ---- ----
1--- 0--- ---- ----
1--0 ---- ---- ----
1-0- ---- ---- ----
10-- ---- ---- ----
01-- ---- ---- ----
0-1- ---- ---- ----
0--1 ---- ---- ----
0--- 1--- ---- ----
0--- -1-- ---- ----
0--- --1- ---- ----
0--- ---1 ---- ----
0--- ---- 1--- ----
0--- ---- -1-- ----
0--- ---- --1- ----
0--- ---- ---1 ----
0--- ---- ---- 1---
0--- ---- ---- -1--
0--- ---- ---- --1-
0--- ---- ---- ---1
Equivalent functionality is achieved if all don't-cares between the first and second digits in each entry are replaced with the value of the first digit; for example, 0--1-- . . . becomes 0001-- . . . . This makes the entries conform to prefix coding, where all entries have their wildcard bits at the end, after all 1/0 bits.
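The equivalence stated above can be checked exhaustively for a 16-bit field. The sketch below generates the 30 entries for the range 1 to 65534 in their loose form, converts each to prefix form, and compares the sets of values matched. The helper names are hypothetical, and to_prefix assumes each entry begins with a 0 or 1 and contains exactly one further 0/1 digit.

```python
def parse(pattern):
    """Convert a '1'/'0'/'-' string to (value, mask); mask 1 = bit must match."""
    value = mask = 0
    for ch in pattern:
        value, mask = value << 1, mask << 1
        if ch != "-":
            value |= int(ch)
            mask |= 1
    return value, mask

def covered(patterns, width):
    """Set of all keys matched by at least one of the ternary patterns."""
    entries = [parse(p) for p in patterns]
    return {k for k in range(1 << width)
            if any((k & m) == v for v, m in entries)}

def to_prefix(pattern):
    """Fill the don't-cares between the first digit and the next 0/1 digit."""
    second = next(i for i in range(1, len(pattern)) if pattern[i] != "-")
    return pattern[0] * second + pattern[second:]

W = 16
loose = []
for i in range(1, W):
    # msb of 1 with a single 0 elsewhere, and msb of 0 with a single 1 elsewhere
    for first, other in (("1", "0"), ("0", "1")):
        loose.append(first + "-" * (i - 1) + other + "-" * (W - 1 - i))
prefix = [to_prefix(p) for p in loose]

same = covered(loose, W) == covered(prefix, W) == set(range(1, 2**W - 1))
print(len(loose), same)   # 30 True
```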
Each of the above entries represents a portion of the specified range, which (for convention) we will specify as outputting a permit rather than deny. However, TCAM entries are prioritized, so it is possible to intersperse permit and deny entries, simplifying some cases. For example, the range above, 1 to 65534 could be specified as:
0000 0000 0000 0000  deny    ;;deny 0
1111 1111 1111 1111  deny    ;;deny 65535
---- ---- ---- ----  permit  ;;permit everything else
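A prioritized lookup of this kind can be sketched as a first-match search over an ordered list of entries. The (value, mask, action) representation and the default action on a miss are assumptions of this sketch.

```python
def tcam_lookup(key, entries, default="deny"):
    """Return the action of the first (highest-priority) matching entry.

    The default action on a miss is an assumption of this sketch.
    """
    for value, mask, action in entries:
        if (key & mask) == value:     # mask bit 0 = don't care
            return action
    return default

# The 16-bit range 1..65534 with interspersed deny and permit entries,
# listed highest priority first as (value, mask, action).
external = [
    (0x0000, 0xFFFF, "deny"),     # deny 0
    (0xFFFF, 0xFFFF, "deny"),     # deny 65535
    (0x0000, 0x0000, "permit"),   # all don't-cares: permit everything else
]

permitted = [k for k in range(1 << 16) if tcam_lookup(k, external) == "permit"]
print(len(external), permitted[0], permitted[-1], len(permitted))
# 3 1 65534 65534
```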
The literature refers to the first style, where all entries are marked with the same output logic polarity (permit), as internal coding, and to the style of the example above, allowing both permit and deny entries, as external coding. The term code length denotes the number of TCAM entries required to represent a range. The internal coding style used above requires 2W−2 entries for a W-bit-wide word; other work reduces the requirement to 2W−5 entries, again using only internal coding.
Generally, internal coding schemes with better results than 2W−2 require a change in the number representation of the TCAM data and the incoming word being searched. Yet other schemes have been suggested where comparators are provided to identify and encode a group of “special” ranges, with the result that the system can behave more efficiently for those specific ranges, but those systems lack generality. In addition, some of these schemes add extra bits to the TCAM range representations.
The external coding example above reduced 30 entries to 3, but that example was fortuitous. The general result is that, using external coding, code length can be reduced to W from the 2W−2 or 2W−5 entries of internal coding. This bound of W entries has also been shown to be tight; it is not possible to produce a better worst-case result than W over all ranges. This factor-of-two reduction in code length is of course desirable, but it will be seen that external coding cannot be used in every circumstance.
In the above internal coding example of the range 1 to 65534, requiring 30 entries, if two fields (such as the TCP source and destination port numbers) each required that range, the result would require 30×30, or 900, entries using internal coding. In fact, it is common for ACL entries to include ranges on both TCP source and destination port numbers. When internal coding is used with multiple range dimensions, each term representing one range has to be combined with every term from every other range. In general, implementing multiple dimensions of range entry using internal coding requires a number of entries equal to the product of the code lengths of the individual ranges; if three ranges are required, the number of entries is the product of all three code lengths. For d dimensions, the worst-case number of entries is (2W−2)**d, again using internal coding. A more efficient implementation of multiple ranges is possible using external coding. For two ranges A and B in different fields, using the notation /A = not(A), meaning not in the range A, we have Equation 1:

A*B = not(/A + /B)    (Equation 1)
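As a small worked check, Equation 1 can be verified by enumeration over two narrow fields, and the worst-case internal coding count can be computed directly. The field width of 4 bits and the range chosen for B are arbitrary assumptions made so that every pair of values can be listed.

```python
W = 4                                   # small width so all pairs can be enumerated
universe = range(1 << W)
A = set(range(1, 15))                   # range 1..14 in the first field
B = set(range(3, 10))                   # hypothetical range 3..9 in the second field

in_a_and_b = {(a, b) for a in universe for b in universe if a in A and b in B}
# /A + /B: the key falls outside A, or outside B
in_complement = {(a, b) for a in universe for b in universe
                 if a not in A or b not in B}
everything = {(a, b) for a in universe for b in universe}

# Equation 1: A*B = not(/A + /B)
equation_holds = in_a_and_b == everything - in_complement

def internal_worst_case(width, dims):
    """Worst-case entry count when every range term is cross-producted."""
    return (2 * width - 2) ** dims

print(equation_holds, internal_worst_case(16, 2))   # True 900
```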
For a range A, from a1 to a2, /A is the combination of the two intervals 0<=x<a1 and a2<x<2**W. Intervals from 0 to some bound, or from some bound to 2**W, are called extremal ranges. An interval from 0 to an upper bound is called a left extremal range, and one from some bound to 2**W is called a right extremal range. The pair of left and right extremal ranges together is called the complementary range of A. This complementary range can be implemented using internal coding with at most 2W−2 entries for any pair of extremal ranges constituting /A. Note that this is internal coding of the complementary range, where the action associated with matching the complementary range is deny.
A simple example will illustrate. The range 1 to 65534 has a complementary range with just two entries. If ranges A and B in different fields each have that range, the following entries are required in a table whose entries are wide enough for the two fields:
A                    B
bit                  bit
1111 11              1111 11
5432 1098 7654 3210  5432 1098 7654 3210
0000 0000 0000 0000  ---- ---- ---- ----  deny    ;;deny A = 0
1111 1111 1111 1111  ---- ---- ---- ----  deny    ;;deny A = 65535
---- ---- ---- ----  0000 0000 0000 0000  deny    ;;deny B = 0
---- ---- ---- ----  1111 1111 1111 1111  deny    ;;deny B = 65535
---- ---- ---- ----  ---- ---- ---- ----  permit  ;;permit everything else
Note that this was a simple case. In the worst case, 2W−2 entries are required for each range dimension, and all of these entries are marked deny. Finally, one entry with all fields fully wildcarded is marked permit, representing the space not covered by the complementary ranges, that is, the outer negation in Equation 1. So the total number of entries for d dimensions of width W is:

multirange code length = d(2W−2)+1 = 2d(W−1)+1
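The construction can be sketched end to end: the complementary range of each field is expanded into prefix-coded deny entries with all other fields wildcarded, followed by a single fully wildcarded permit entry. The function names and the string representation of entries are assumptions of this sketch, and the prefix expansion shown is one common greedy method.

```python
from itertools import product

def range_to_prefixes(lo, hi, width):
    """Split the inclusive range [lo, hi] into prefix-coded ternary entries."""
    entries = []
    while lo <= hi:
        size = (lo & -lo) if lo else (1 << width)   # largest block aligned at lo
        while size > hi - lo + 1:                   # shrink to fit the range
            size >>= 1
        wild = size.bit_length() - 1
        fixed = width - wild
        bits = format(lo >> wild, "b").zfill(fixed) if fixed else ""
        entries.append(bits + "-" * wild)
        lo += size
    return entries

def build_table(ranges, width):
    """ranges: one inclusive (lo, hi) per field. Returns prioritized entries."""
    d, top = len(ranges), (1 << width) - 1
    table = []
    for i, (lo, hi) in enumerate(ranges):
        complement = []
        if lo > 0:                                   # left extremal range
            complement += range_to_prefixes(0, lo - 1, width)
        if hi < top:                                 # right extremal range
            complement += range_to_prefixes(hi + 1, top, width)
        for term in complement:
            fields = ["-" * width] * d               # other fields wildcarded
            fields[i] = term
            table.append(("".join(fields), "deny"))
    table.append(("-" * (width * d), "permit"))      # everything else
    return table

def lookup(key_bits, table):
    """First matching entry wins; key_bits is a string of '0'/'1'."""
    for pattern, action in table:
        if all(p in ("-", k) for p, k in zip(pattern, key_bits)):
            return action

worst = build_table([(0x7FFF, 0x8000), (0x7FFF, 0x8000)], 16)
print(len(worst), 2 * 2 * (16 - 1) + 1)   # 61 61
```

The ranges 0x7fff to 0x8000 are chosen here because each complementary range then needs the full 2W−2 entries, so the table reaches the stated maximum.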
For the common situation of a 16b range of a TCP port number, a maximum of 30 TCAM entries may be required to represent a complementary range. For the common situation of two ranges, each over a 16b TCP port number, a maximum of 61 TCAM entries may be required in total.
To clarify the relationship between code length=W for one dimension and code length=2d(W−1)+1 for multiple dimensions, the application of external coding must be examined in more detail. At first glance, applying the 2d(W−1)+1 equation to the case of one dimension (d=1) appears to produce a contradiction: it yields 2W−1, which is significantly worse than code length=W. The difference is that for one dimension, external coding intersperses permit and deny entries in the TCAM to implement the logic of the single range more efficiently; without external coding, code length would be 2W−2. In the multi-dimension case, external coding is used to express the complementary range of each dimension, with every such entry producing deny. Because external coding is already used for that purpose, it cannot be reused within each dimension to shorten the code for that dimension's complementary range. The result for each complementary range TCAM entry is already deny; reversing this (by attempting to reuse external coding) would produce a permit (or inhibit a deny). That permit would override all other TCAM entries, such as the deny results of other dimensions. To function correctly, the permit would have to be local in scope to the current dimension's complementary range, but a TCAM cannot limit the scope of an output.