Multi-table pipelining has emerged as the foundation of next-generation SDN datapath models, such as recent versions of OpenFlow, RMT (P. Bosshart, G. Gibb, H. S. Kim, G. Varghese, N. McKeown, M. Izzard, F. Mujica, and M. Horowitz. Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, SIGCOMM 2013, pages 99-110, New York, N.Y., USA, 2013. ACM.), and FlexPipe (R. Ozdag. Intel Ethernet Switch FM6000 Series - Software Defined Networking (www.Intel.com/content/dam/www/public/us/en/documents/white-papers/ethernetswitch-fm6000-sdn-paper.pdf)). By avoiding key problems such as unnecessary combinatorial explosions, multi-table pipelining can substantially reduce datapath table sizes, and is therefore essential for making SDN practical. At the same time, the introduction of multiple tables adds SDN programming tasks, including designing an effective pipeline layout, populating the contents of multiple tables, and updating multiple tables consistently when there are changes. These tasks place a substantial burden on SDN programmers and lower programming productivity. Automating these tasks can substantially simplify SDN programming.
Although there is previous work on how to use multi-table datapaths (e.g., P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, and D. Walker. P4: Programming protocol-independent packet processors. SIGCOMM Comput. Commun. Rev., 44(3):87-95, July 2014 and C. Schlesinger, M. Greenberg, and D. Walker. Concurrent Netcore: From Policies to Pipelines. In Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming, ICFP 2014, pages 11-24, New York, N.Y., USA, 2014. ACM.), this work still requires the programmer to specify detailed forwarding pipelines. For each flow table used in the program, the programmer must specify the fields it can match, the form of the matching, and whether to use priorities, and must also supply a graph describing the dependencies that constrain the order in which tables process packets.
On the other hand, the algorithmic policy (AP) programming model provides a dramatically simplified network programming abstraction. In particular, an algorithmic policy consists of an ordinary algorithm, expressed in a conventional, Turing-complete programming language, that describes the functional input-output behavior of a network function without referencing implementation details related to tables, matches, actions, and other low-level constructs that are introduced when mapping such programs onto network processing devices, such as various Application-Specific Integrated Circuits (ASICs) or Field-Programmable Gate Arrays (FPGAs). An algorithmic policy specifies only this functional input-output behavior, defined as a function that determines, for each input packet, how the program will change its internal state and what packets should be emitted as a result. Using the algorithmic policy programming model, the user defines a function that is repeatedly executed on packets, taking packets in and producing some number of modified packets out. This may be represented by the block diagram depicted in FIG. 1.
In order to process packets at high throughput, packet forwarding systems include a datapath component: a dedicated computational element implementing a highly specialized computational model that executes simple packet processing steps at a high rate but lacks full generality (i.e., it is not Turing-complete). Since such systems must still execute more complex algorithms, such as the shortest path computations used in various protocol implementations, which cannot be executed on the specialized datapath component, most packet processing systems comprise (at least) two high-level components: one is the aforementioned datapath and the other is the control element, which typically executes on a general-purpose CPU connected to the datapath element via a communication network (i.e., a processor interconnect). In particular, since the AP programming model is Turing-complete, individual APs may include complex computations which cannot be executed solely on a datapath component. Therefore, in general, APs must be compiled into such a two-component system. FIG. 2 depicts such a system with a block diagram.
Other high-level programming abstractions for network programming have been proposed; however, all of these severely restrict expressiveness, so that the programmer cannot write most programs of interest in the language. For example, the NetCore language (C. Schlesinger, M. Greenberg, and D. Walker. Concurrent Netcore: From Policies to Pipelines. In Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming, ICFP 2014, pages 11-24, New York, N.Y., USA, 2014. ACM) only allows a subset of time-invariant forwarding behavior to be expressed, while FlowLog (T. Nelson, A. D. Ferguson, M. J. G. Scheer, and S. Krishnamurthi. Tierless programming and reasoning for software-defined networks. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, NSDI 2014, pages 519-531, Berkeley, Calif., USA, 2014. USENIX Association.) requires the computation of output ports for a given packet to be expressed in a form of Datalog that is not Turing-complete. Furthermore, to date, all systems implementing these restricted programming abstractions use only a single flow table, severely limiting their scalability and performance.
Previous work on implementing general algorithmic policies uses the method of Trace Trees (A. Voellmy, J. Wang, Y. R. Yang, B. Ford, and P. Hudak. Maple: Simplifying SDN Programming Using Algorithmic Policies. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, SIGCOMM 2013, pages 87-98. ACM, 2013.). This method has several disadvantages. First, the compilation uses only a single flow table. Second, the approach relies on a so-called reactive flow table population method, wherein the switch rule table is treated as a cache: new rules are inserted only when an arriving packet does not match any cached rule and an authoritative controller, which implements the trace tree system, is consulted. The delay induced by diverting packets to the controller severely affects system performance.
What is needed are methods to automatically derive, populate, and update effective multi-table pipelines from datapath-oblivious algorithmic policies (AP) (A. Voellmy, J. Wang, Y. R. Yang, B. Ford, and P. Hudak. Maple: Simplifying SDN Programming Using Algorithmic Policies. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, SIGCOMM 2013, pages 87-98. ACM, 2013), where datapath-oblivious means that the programming language does not expose constructs regarding datapath-specific details such as flow tables, matches, actions, registers, and so on. The present disclosure focuses on the algorithmic policies model because it is highly general and flexible; hence it poses minimal constraints on SDN programming. On the other hand, effectively utilizing multi-table pipelines from algorithmic policies can be extremely challenging, because APs are expressed in a general-purpose programming language with arbitrarily complex control structures (e.g., conditional statements, loops), and the control structures of APs can be completely oblivious to the existence of multiple tables. Hence, it is not at all clear whether one can effectively program multi-table pipelines from such APs. We refer to this as the oblivious multi-table programming challenge.
To illustrate the basic challenges and ideas of programming packet processing devices, we consider a simple but representative example AP called L2-Route. The AP performs routing using layer 2 addresses:
// Program: L2-Route
1. Map macTable(key: macAddress, value: sw)
2. onPacket(p):
3.   s = p.macSrc
4.   srcSw = macTable[s]
5.   d = p.macDst
6.   dstSw = macTable[d]
7.   if (srcSw != null && dstSw != null):
8.     egress = myRouteAlg(srcSw, dstSw)
9.   else
10.    egress = drop
In this example and throughout this document, we use the following AP abstraction: each packet p, upon entering the network at an ingress point, will be delivered to a user-defined callback function named onPacket, also referred to as the function ƒ. This function sets the egress variable to be the path that the packet should take across the network. We refer to this style of returning the whole path as the global policy. A variation on this programming model is to define a local, per-switch onPacket function. The results will be similar.
Although L2-Route looks simple, it includes key components of a useful algorithmic policy: maintaining a system state variable, and processing each packet according to its attributes and the current state. Specifically, line 1 of L2-Route declares its state variable macTable: a key-value map data structure that associates each known L2 endpoint to its attachment switch. Given a fixed packet, L2-Route performs a lookup, using the macTable state variable, of the source and destination switches for the packet, and then computes a route between the two switches through the network.
Result of Current Tool:
The only current work that handles general algorithmic policies is Maple (A. Voellmy, J. Wang, Y. R. Yang, B. Ford, and P. Hudak. Maple: Simplifying SDN Programming Using Algorithmic Policies. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, SIGCOMM 2013, pages 87-98. ACM, 2013), which uses a trace tree approach. A policy is repeatedly invoked within a tracing runtime system that records the sequence of packet attributes read by each invocation, and the recorded execution traces form a trace tree. A trace tree can be compiled to a single flow table, where each leaf of the tree corresponds to a rule in the flow table. FIG. 3 shows the resulting trace tree and the flow table required for L2-Route to support n hosts with MAC addresses 1 . . . n communicating with each other. For example, the bottom left result pi is the execution trace of a packet with macSrc 1 and macDst 1.
Despite its simplicity, this example illustrates well the issues of the trace tree approach. First, assume the program sees packets between each pair of end hosts stored in macTable. Then the trace tree has n^2 leaves, generating a flow table with n^2 rules. This, however, as we show below, is much larger than necessary. Second, and even worse, assume a setting where packets with a source or destination MAC address not stored in macTable can appear (e.g., due to attacks). Then the trace tree approach will still generate flow table rules for such packets. In the worst case, where a large number of such packets appear, the trace tree approach may generate well above n^2 rules; in the limit, the trace tree can generate 2^96 rules, so the resulting system is not resilient.
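The growth described above can be checked with a minimal Python sketch. It does not implement Maple's trace tree; it only assumes, as the text states, that a single exact-match table ends up with one rule per observed (macSrc, macDst) pair. The helper name single_table_rules and the toy route function are hypothetical. The final line relies on the fact that Ethernet MAC addresses are 48 bits wide, so two MAC fields admit 2^(48+48) distinct pairs.

```python
from itertools import product

def single_table_rules(mac_table, route_alg):
    """Hypothetical helper: one exact-match rule per (macSrc, macDst) pair,
    mirroring one trace-tree leaf per pair of known hosts."""
    rules = {}
    for src, dst in product(mac_table, repeat=2):
        rules[(src, dst)] = route_alg(mac_table[src], mac_table[dst])
    return rules

n = 50
# Toy macTable (assumption): host h attaches to switch h % 5.
mac_table = {host: host % 5 for host in range(1, n + 1)}
rules = single_table_rules(mac_table, lambda src_sw, dst_sw: (src_sw, dst_sw))
print(len(rules) == n * n)  # rule count grows quadratically in the host count

MAC_BITS = 48  # width of an Ethernet MAC address
worst_case = 2 ** (2 * MAC_BITS)  # every possible (macSrc, macDst) pair
print(worst_case)
```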
Suboptimal Manual Table Design by Experts:
Since there are no existing tools to automatically generate multi-tables, we asked several experienced network professionals with significant SDN knowledge to design tables for L2-Route. We allowed the experts to take advantage of datapath registers (also known as metadata fields), which can be used to store state across tables and which are available in several dataplane models, including OpenFlow and P4. We use the notation reg_x to denote a register holding values for program variable x. For the present discussion, we assume that data values can be written to and read from registers by suitably encoding them into bit array representations.
We found that most experts chose a two-table design, as shown in FIG. 4, reasoning that the program performs two classifications, one on macSrc and the other on macDst, and hence two table lookups suffice. The first table matches on macSrc to write an appropriate srcSw value into reg_srcSw. The second table matches on macDst and also on the outcome of the first table (held in reg_srcSw), since this attribute also affects the desired outcome. If n is the number of hosts in the network and k is the number of switches output by the macTable mapping, then the two-table design requires n + kn rules. Hence, this design successfully avoids the n^2 cross product problem, since the number of reg_srcSw values is typically much lower than the number of host interfaces in the network.
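As a rough illustration of the two-table design, the following Python sketch models each flow table as a dictionary and the register holding srcSw as a local variable. The function names, the "drop" sentinel, and the toy route function are assumptions made for this sketch and are not taken from FIG. 4.

```python
def build_two_table(mac_table, route_alg):
    """Hypothetical builder for the two-table design.
    Table 1: macSrc -> value written to the srcSw register.
    Table 2: (srcSw register, macDst) -> output."""
    table1 = dict(mac_table)
    switches = set(mac_table.values())
    table2 = {
        (src_sw, mac): route_alg(src_sw, dst_sw)
        for src_sw in switches
        for mac, dst_sw in mac_table.items()
    }
    return table1, table2

def process(packet, table1, table2):
    """Run one packet through both tables; a miss in either table drops."""
    reg_src_sw = table1.get(packet["macSrc"])  # register written by Table 1
    if reg_src_sw is None:
        return "drop"
    return table2.get((reg_src_sw, packet["macDst"]), "drop")

# Toy macTable (assumption): 12 hosts spread over 3 switches.
mac_table = {host: host % 3 for host in range(1, 13)}
table1, table2 = build_two_table(mac_table, lambda a, b: (a, b))
n, k = len(mac_table), len(set(mac_table.values()))
print(len(table1) + len(table2) == n + k * n)
print(process({"macSrc": 1, "macDst": 2}, table1, table2))
```

Because Table 2 is keyed on the register value rather than on macSrc, its size depends on k rather than on n for the source side, which is where the n + kn count comes from.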
While this design improves over the single-table design generated by trace trees, it is suboptimal for most networks, which have many more hosts than switches. In particular, the three-table design shown in FIG. 5, which has a final table that matches on combinations of switches, typically requires far fewer rules. The three-table design requires 2n + k^2 rules, which compares favorably to the previously described two-table design. For a network with 4,000 hosts and 100 switches, the two-table design requires 404K rules while the three-table design requires 18K rules, a 22× difference.
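The figures quoted for 4,000 hosts and 100 switches follow directly from the two rule-count formulas. The short Python check below evaluates them; the function names are chosen here for illustration only.

```python
def two_table_rules(n, k):
    """Two-table design: n rules on macSrc plus k*n rules on (srcSw, macDst)."""
    return n + k * n

def three_table_rules(n, k):
    """Three-table design: n rules on macSrc, n on macDst, k*k on switch pairs."""
    return 2 * n + k * k

n, k = 4000, 100
two = two_table_rules(n, k)      # 404000
three = three_table_rules(n, k)  # 18000
ratio = two / three
print(two, three, round(ratio, 1))
```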
The preceding discussion demonstrates that selecting good pipeline designs requires considering details such as the flow of data values through the given program (in order to determine the sizes of tables), which are difficult and tedious for humans to consider and easily overlooked.
Burden of Populating Tables with Rules:
In addition to designing a pipeline, a human expert is required to define how tables are populated with rules at runtime, which can be a complex task. Consider, for example, how to generate new rules for the two-table design when a single new entry (a′, s′) is inserted into macTable. If a′ is a new key and s′ is a value not previously occurring in the table, then Table 1 requires a new entry macSrc: a′ → reg_srcSw: s′, and Table 2 requires new entries of the form reg_srcSw: s′, macDst: a → output: o_{a,s′} for every key a of macTable. This illustrates that a single change to a high-level state may require changes in multiple flow tables.
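The update described above can be sketched in Python as follows. The function name, the rule encoding as (match, action) pairs, and the toy route function are hypothetical. The first group of Table 2 rules is the one named in the text; the second group, covering packets from existing source switches to the new destination a′, is an inference made for this sketch so that the table sizes stay consistent with the n + kn count, and is not stated in the text.

```python
def rules_for_insert(mac_table, new_mac, new_sw, route_alg):
    """Hypothetical delta computation for the two-table design when
    (new_mac, new_sw) is inserted and both key and value are new."""
    assert new_mac not in mac_table
    assert new_sw not in mac_table.values()
    updated = dict(mac_table)
    updated[new_mac] = new_sw

    # Table 1: classify the new source MAC into the srcSw register.
    table1 = [({"macSrc": new_mac}, {"reg_srcSw": new_sw})]

    # Table 2, as described in the text: new source switch to every key a.
    table2 = [
        ({"reg_srcSw": new_sw, "macDst": a}, route_alg(new_sw, dst_sw))
        for a, dst_sw in updated.items()
    ]
    # Table 2, inferred here: every existing source switch to the new host.
    table2 += [
        ({"reg_srcSw": s, "macDst": new_mac}, route_alg(s, new_sw))
        for s in sorted(set(mac_table.values()))
    ]
    return table1, table2

mac_table = {1: "s1", 2: "s1", 3: "s2"}  # toy state: n = 3 hosts, k = 2 switches
t1_new, t2_new = rules_for_insert(mac_table, 4, "s3", lambda a, b: (a, b))
print(len(t1_new), len(t2_new))
```

Even in this toy case, one insertion into macTable touches both tables, and the number of Table 2 rules to add grows with the number of hosts and switches already known.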
Moreover, if L2-Route is modified in a minor way, the situation becomes more challenging:
2. onPacket(p):
3.   s = p.macSrc
4.   srcSw = macTable[s]
4a.  if srcSw member [1,2,3,4]:
4b.    egress = drop; return
5.   d = p.macDst
6.   dstSw = macTable[d]
In this version of L2-Route, the program drops packets whose source switch is one of switches 1 through 4, so it is unnecessary to continue processing such packets. In the two-table design, Table 2 therefore need not match on values 1-4 for the reg_srcSw field, which could lead to a substantial saving of space when the number of hosts is large. Taking advantage of this when populating entries for Table 2 requires reasoning about the flow of possible values to Table 2, which is a burden for programmers.
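The space saving can be illustrated with a short Python sketch that builds Table 2 only for source-switch values that can actually reach it. The function name and the toy macTable are hypothetical, and the sketch assumes that Table 1 drops packets from the excluded switches directly, which is one possible encoding rather than one stated in the text.

```python
DROPPED_SWITCHES = frozenset({1, 2, 3, 4})  # switches tested on line 4a

def build_table2(mac_table, route_alg, dropped=frozenset()):
    """Hypothetical builder: Table 2 rules keyed on (srcSw register, macDst),
    restricted to source switches that are not dropped earlier."""
    live_switches = set(mac_table.values()) - set(dropped)
    return {
        (src_sw, mac): route_alg(src_sw, dst_sw)
        for src_sw in live_switches
        for mac, dst_sw in mac_table.items()
    }

# Toy macTable (assumption): 20 hosts spread evenly over switches 1..10.
mac_table = {host: (host % 10) + 1 for host in range(20)}
route = lambda a, b: (a, b)
full = build_table2(mac_table, route)
pruned = build_table2(mac_table, route, DROPPED_SWITCHES)
print(len(full), len(pruned))
```

With n hosts and k switches, excluding four switch values removes 4n rules from Table 2, so the saving scales with the number of hosts.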
Burden of Target-Specific Programming:
In addition to the conceptual design of the forwarding pipeline and the runtime processes to populate the pipeline's rules, a programmer is faced with the substantial burden of encoding these designs into target-specific forwarding models. For example, when targeting Open vSwitch (openvswitch.org), a programmer may use the Nicira-extension registers to implement the datapath registers and populate entries using an OpenFlow protocol. On the other hand, when implementing the design with P4, the programmer would need to declare metadata structures and fields, and would need to use a runtime protocol specific to the target forwarding element to populate rules in the P4 forwarding element. Since there is no existing portability layer that spans various OpenFlow and P4 switches, the high-level design and runtime algorithms need to be coded separately for each supported target, leading to duplicated effort and an increased likelihood of bugs. We use the term “southbound protocol” to refer to any protocol used by a controller or other software to interact with a forwarding element to control or observe the forwarding element's behavior.