OpenFlow packet processing centers around flow tables containing flow entries, each having ternary values for a selected set of packet header fields. For each packet, flow entries are searched in order, with the first matching entry returned. OpenFlow defines a set of recognized packet header fields including the commonly used Medium Access Control (MAC) source and destination addresses, ether type, Internet Protocol (IP) source and destination addresses, IP protocol, Transmission Control Protocol (TCP) port numbers, Virtual Local Area Network (VLAN) and Multiprotocol Label Switching (MPLS) tags, etc., in addition to user defined extensible fields, and a metadata field to hold non-packet information. The input port Identification (ID) is also provided as a match input.
Associated with each flow entry is a set of actions to be executed upon a match. The defined actions include setting values into any of the packet header's recognized fields, pushing and popping VLAN and MPLS tags, performing Provider Backbone Bridge (PBB) encapsulations and decapsulations, as well as miscellaneous operations such as Time to Live (TTL) manipulations. Actions can also include assigning the packet to an output port and queue, sending the packet to the controller, or dropping it. With OpenFlow still a new and evolving standard, it is anticipated that implementors will create user defined extensions for any required capabilities not yet in the standard, such as other encapsulation types (Generic Routing Encapsulation (GRE), Network Virtualization using Generic Routing Encapsulation (NVGRE), Virtual Extensible Local Area Network (VXLAN) etc.).
Openflow 1.0 defined a single flow table. Later versions allow multiple tables, numbered and processed sequentially, with actions taken as a result of any stage modifying the packet before it is sent to the next stage. A different action option allows selected modifications to be postponed until after all match stages are executed. A flow entry match also specifies the address of the next table to be executed as a forward-only branch.
Openflow groups provide for implementation of capabilities including multicasting and Equal Cost Multipath (ECMP). An OpenFlow group is a type of action, defined as a collection of buckets, where each bucket contains actions of the types defined above, in addition to optionally recursively containing other groups. OpenFlow ALL groups implement multicast by executing all buckets, each on a different copy of the packet. OpenFlow SELECT groups execute one randomly selected bucket, implementing ECMP, equal cost multipath, and with optional weights attached to each bucket, unequal cost multipath (uECMP). The random selection is typically done by hashing on a selected set of packet headers, so that different flows are routed to different buckets, but all packets from the same flow receive identical treatment. Fast failover groups execute the first bucket associated with a live output port, allowing quick reaction to link failures. OpenFlow indirect groups contain a single bucket, and are useful simply as a layer of indirection.
OpenFlow defines an implementation of meters, which are used to measure data flow rates. Meters are a type of action executable on a flow table match. A meter includes a number of bands, typically two or three, each of which has a defined maximum data rate and optional burst size. Using a leaky bucket analogy, a meter band is a bucket filled by the packet data rate and drained at a constant allowed data rate. Overflow occurs if the integration of data rate exceeding quota is larger than the burst size. Overflowing one band triggers activity into the next band which presumably allows a higher data rate. Meter bands are often informally named with colors, such as green, yellow and red for a three color meter. Openflow provides for remarking the packet Differentiated Services Code Point (DSCP) field as a result of overflowing the base band. This information might be used later to direct the packet to a different queue where it may be more subject to delay or dropping in case of congestion.
OpenFlow defines statistics collecting counters, mostly packet and byte counters, for flow tables, flow table entries, groups and group buckets, meters and meter bands, input/output ports and queues. While most of them are optional, the statistics information they provide are useful to implementers.
As will be explained later, memory requirements for flow tables, action entries and statistics counters contribute a great deal to cost considerations for a large portion of a switch chip.
OpenFlow switches communicate with a network controller through a set of messages defined by the standard. Messages are provided for initial configuration, and for set up, modification, or deletion of flow table, group and meter entries. Statistics information can be requested by the controller and communicated back by the switch. Flow entries can as an action direct a packet to be sent to the controller, and the controller can send packets back to the switch for OpenFlow processing. A common mode of operation is that if a packet is unexpectedly unmatched in a flow table, the packet is sent to the controller, which responds by installing flows into one or more switches. This implements Software Defined Networking (SDN) canonical separation of data plane and control plane processing; switch functionality is confined to matching flows and taking the indicated actions; any unrecognized pattern is sent up to the controller which shoulders the responsibility for all high level decisions.
A description will be provided of the high level design of a match stage, a unit which can be cascaded to implement the core of OpenFlow functionality: providing the flow tables, matching of packet fields against flow table entries, taking the actions indicated by the match, and collecting statistics. U.S. patent application Ser. No. 14/072,989 “An Openflow Match and Action Pipeline” herein incorporated by reference, provides additional background material in this area by describing additional aspects of a match stage implementation. The implementation to be described targets a 64 port by 10 Gbit/s switch, which produces a maximum packet rate of 960M packets/s. If a match stage pipeline is run at 1 GHz or slightly less, each packet has a single clock cycle to flow through the pipe.
The parser accepts the incoming packet data and produces a 4 k bit packet header vector as its output, with each defined header field in a fixed, though configurable, position. This 4 k bit vector provides the input data to the match pipeline of match units. The 4 k bit vector is composed of a number of 8, 16, and 32 bit fields, each of which has an associated valid bit.
OpenFlow defines all tables with ternary matching capability; that is, each table bit can have the ternary values of 0, 1 or don't-care. Wildcarding bits allow single table entries to match a wide variety of packets. At the performance levels targeted, 1 packet per clock cycle, ternary match tables are implemented with TCAM (ternary content addressable memory) modules. Another useful type of table is an exact match table, where no wildcarding is allowed, and packets must exactly match table entries. These can be implemented as hash tables in SRAM, with the advantage that an SRAM is significantly less area, than a TCAM table of equivalent bit count.
Exact match tables are implemented by using SRAMs as hash tables. Generally a hash table takes some or all of the input bits of a search word, and generates a pseudorandom, but predictable, number from those bits. One method of hashing generates an N bit address from an M bit input, where for each of the N hash output bits, a separate M bit mask is and'ed with the input data, and then the parity of the result is taken. The input bit mask for each hash output bit is different, and there are methods known in the art to select masks with desirable properties. This method is equivalent to the mathematical operation of a Galois Field multiplication. There are multiple methods of generating hash addresses known in the art, but all of them attempt to generate an address, where for all data inputs, the addresses end up uniformly distributed across the N bit address space, so that hash table entries are evenly spread out over all words of the SRAMs used for hash table data storage.
Hash tables operate by accessing an array of data at that hashed location and checking to determine whether the accessed data is the desired data. This check is performed by doing a comparison between the desired data and accessed data to determine their equality. Hash tables also have to contend with the possibility of address collisions, where multiple distinct inputs hash to the same address. There are many techniques known in the art for accomplishing this. Multiway hashing addresses this by making K hash addresses instead of one and looking up the data in those K separate locations in K individual arrays. When an entry is to be added, these multi-way hash tables provide several possible locations, all equally good, increasing the probability that one of the locations will be empty.
A further refinement is to implement exact match tables using Cuckoo hash tables, multi-way hash tables distinguished by a fill algorithm providing high hash table occupancy. When adding an entry, if all possible locations for that entry are full, since all current occupants also have other choices for their locations, one of them can be evicted to an alternative location, possibly resulting in a chain of evictions and continuing until an entry is placed in an empty location. Cuckoo hash tables routinely achieve high efficiencies, for example, above 95% occupancy for 4-way hash tables. Reads are deterministically performed in one cycle, with all ways accessed in parallel. While all of this is known art, the essential element is that to implement OpenFlow exact match tables, multi-way hash tables are used where a number (preferably at least 4) separate SRAM modules each compute individual hash keys and determine whether the search data exists at the computed hash location. As a result, a number of SRAM modules are used.
Tables can be made deeper by incorporating additional memory modules, with each memory either adding another way to the hash table or incrementing the number of items contained in an individual way. If multiple items are contained in an individual way, an address lookup yields multiple entries, any of which when compared may turn out to be the desired data. Alternatively, this can be viewed as a multi-bank hash table where some of the banks use the same hashed address, so the number of hash keys is less than the number of banks.
Match data input to tables may vary significantly in width, from single fields to hundreds of bits. For example, a max match width may be 640 bits. Narrower widths can be accommodated by breaking the 640 b match memory into units, for example 8 units of 80 bits each. Then these 8 units may be combined to make an 8× deeper table (with 8×the number of entries) or may instead create 8 separate tables. Memory units may be combined in groups, for example 2 units wide for 160 bits, etc. If 4 ways are required in the exact match table of a match stage, this results in an array of 8×4 memory units, each of which can match 80 bit wide data. The 80 bits of match per unit memory is an approximation which will be explained in more detail further below. Each memory is 1000 words deep in this example.
Ternary matches using TCAM are also configured to match wide or narrow entries, with a 640 b wide memory split into 8 80 bit units, which like the SRAM exact match memory may be used to create narrow deeper memories or separate memories or combined in groups as desired. Given the larger area of TCAM, less of it is typically provided than exact match memory, for example, ½ or ¼. The TCAM could also be divided into a different number of units, for example into 16 40 bit units.
When either a ternary or exact match is found, it provides several pointers which together contain the required information to perform the desired actions. These include an instruction memory address, an action memory address and size, and a next table address. Actions are performed by modifying fields of the 4000 bit packet header vector. There are 64, 96 and 64 words of 8, 16 and 32 b respectively in the packet header vector, with an associated valid bit for each. Note that the number of words of each size described above is illustrative and could easily be changed to larger or smaller numbers in a specific design. As will be described later in more detail, the action engine inventively uses a Very Long Instruction Word (VLIW) architecture, where each of these words has its own functional unit to compute updated values. The units of smaller words can be combined to execute a larger field instruction, for example, 2 8 bit units can merge to operate on their data as a single 16 bit field. There is a VLIW instruction memory with individual instruction fields for each of these words.
OpenFlow specifies simple actions such as setting a field to a value as well as complex operations, such as PBB encapsulate or inner-to-outer or outer-to-inner TTL copies where the outer and inner fields may be one of a number of choices. Complex operations can be easily decomposed into multiple actions on separate fields but complex modifications to each individual field become more difficult as the data line rate increases. These complex modifications can be subroutines at low speeds but must be flattened into single-cycle operations at the packet per clock cycle rate of this device. It is important to provide action capabilities powerful enough to handle the expected types of operations. A general set of conditionalized arithmetic, logical, multiplexing, and bit field manipulation capabilities is provided. Since the chip area of the action engine is dominated by selecting source operands (action data and packet header words) rather than by computation, flexible action capabilities come at relatively low cost.
Action operations may get their sources from packet header fields, or from an action memory. An action indicated by a match may be simple and require only a small amount of data from action memory or complex and require a large amount of data. Action memory is 640 bits wide, and may be output in units of 1, ½, ¼, ⅛, or 1/16 of that full width. Action memory is accessed by providing a size and an aligned pointer. Action memory is separate from instruction memory in the same way that instructions and data are separate entities in a processor. For example, a common action of an IP router is to decrement the IP TTL field, set the MAC source and destination addresses, and set the switch output port and queue. These individual modifications to the various affected fields are all grouped together into a single VLIW instruction where the various needed constants, like subroutine arguments, are in specific places in the delivered action word. While each match entry may require an individual action word for the data constants, they may all reference the same VLIW instruction word. The number of required instruction words is considerably less than the number of required action words.
In addition to the action size and address and the instruction address, a next table address is provided as a result of a match.
The 4 bit action size (to specify from 1× to 1/16× size) and 13 to 17 bit action address (to allow a range of from 8K 640 bit entries to 128 k 40 bit entries) can be combined into a single 18 bit entry as follows:
TABLE 1Action memory address and size codingwwww00000 ;1x13 bit addresswwwwf1000 ;1/2x14 bit addresswwwwff100 ;1/4x15 bit addresswwwwfff10 ;1/8x16 bit addresswwwwffff1 ;1/16x17 bit addresswhere w specifies a bit of the word (640 bits) address and f specifies an address of a fractional portion of that 640 bits. As will be seen later, with a maximum of 16 individual tables in a stage and 32 stages, 9 bits are required for a next table address. With an instruction memory of 32 VLIW words per stage, 5 bits are required for instruction address. If all of these fields are contained in extra bits of the match memory, this equals 32 bits of overhead. There are 4 valid bits which together provide both valid and rule version information. There are also 4 field-valid bits. The match data is composed of 8, 16, and 32 bit words from the packet header vector. The 4 field-valid bits allows the rule to check field presence or absence of each individual field. 8 Error Correction Code (ECC) bits are also provided. This totals 48 bits of overhead including ECC. With a 112 b memory, 64 bits are provided for match data in a single unit memory, with this detailed explanation showing that less than the 80 bits described above as a general explanation are actually available for matching. When two memories are combined for a wider word, the overhead bits are paid once over a larger width, yielding 168 bits of match data width (with 11 field valid bits). In all cases of appending two or more units, the match width is greater than N×80 bits for N units.