Many of today's software packages that generate graphics, such as Xfig and PowerPoint, share a common method for creating rectilinear patterns. Starting with a white rectangular canvas, the user repeatedly applies a “rectangle tool” that sweeps out a rectangular area and fills its interior with a specified color, overwriting any previous contents of that area.
For example, the final pattern of FIG. 1 is created by applying a sequence of three rectangle operations 101, 102, 103 to an underlying canvas that is a 4×4 grid. The sequence depicted in the figure is not particularly efficient since two operations would have sufficed.
Given the manual work involved in producing such figures, it is natural to want to determine efficient plans of action in advance. Let a pair (R, c), where R is a rectangle and c is a color, be called a rectangle rule, and a sequence of such rules (R1, c1), (R2, c2), . . . , (Rn, cn) be called a rectangle rule list or RRL. For an RRL L, let P(L) denote the pattern it produces. Given a target pattern P, the goal is to find a minimum-length RRL L such that P(L)=P. Not only can finding such an RRL save labor, it also provides a scheme for picture compression that is potentially more effective than the classic scheme in which a black rectilinear figure is represented by a collection of black rectangles that covers it. Moreover, a restricted version of the problem has an important application to Internet management, and specifically to access control lists.
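The definition of an RRL can be made concrete with a minimal Python sketch. The rectangle encoding (x0, y0, x1, y1) with inclusive cell coordinates, the function name apply_rrl, and the two example lists are assumptions made for illustration only; the pattern shown is a hypothetical one and is not the pattern of FIG. 1.

```python
from typing import List, Tuple

# Hypothetical encoding: a rectangle is (x0, y0, x1, y1), inclusive cell coordinates.
Rect = Tuple[int, int, int, int]
Rule = Tuple[Rect, str]


def apply_rrl(rules: List[Rule], width: int, height: int) -> List[List[str]]:
    """Paint the rules in order on a white canvas; later rules overwrite earlier ones."""
    canvas = [["white"] * width for _ in range(height)]
    for (x0, y0, x1, y1), color in rules:
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                canvas[y][x] = color
    return canvas


# Two RRLs of different lengths that produce the same pattern on a 4x4 canvas.
long_rrl = [((0, 0, 1, 3), "black"), ((2, 0, 3, 3), "black"), ((1, 1, 2, 2), "white")]
short_rrl = [((0, 0, 3, 3), "black"), ((1, 1, 2, 2), "white")]

same_pattern = apply_rrl(long_rrl, 4, 4) == apply_rrl(short_rrl, 4, 4)
```

Here both lists yield a black frame around a white 2×2 center, so the shorter list would be preferred under the minimum-length criterion.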
Access control lists (ACLs) are used in network router line cards to determine which arriving packets should be forwarded to their destination and which should be dropped. For instance, an Internet Service Provider (ISP) might want its access routers only to forward packets that came from or are destined to customers who have officially been assigned to that router. ACLs can also be used to implement firewalls and to provide a range of levels of quality of service.
An ACL is made up of a sequence of rules. In an extended ACL, a rule can be viewed as having five components:
Source Range | Protocol | Destination Range | Port(s) | Action

The source and destination ranges are specified by binary strings s of length w or less (where w is the length of an IP address, currently 32 but expanding to 128 in IPv6). The string s matches all IP addresses that have it as a prefix; an empty string matches everything. Possible protocols include IP, TCP, UDP, and ICMP, with IP matching all protocols and the others matching only packets labeled by that particular protocol. Ports can either be an individual port number, a range of such numbers, or “any,” which matches everything. The action can either be permit, which allows the packet through, or deny, which causes the packet to be dropped. A basic ACL omits the protocol and port fields.
An ACL operates as follows: When a packet arrives, the router determines the first rule in the list that it matches and performs the action specified by that rule. If there is no match, the packet is dropped. In high-speed routers the classification is performed by special hardware called “ternary CAMs” (TCAMs) that evaluate all the rules in parallel and output the lowest-indexed match. TCAMs are expensive and impose limits on the size of ACLs, as do the overall memory constraints in a line card. Thus, a natural optimization criterion for ACLs is to minimize their length while preserving the results of their actions. Note that this will also reduce the maximum delay in routers that evaluate rules sequentially.
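The first-match behavior described above can be sketched for a basic ACL as follows. This is an illustrative model only: addresses and prefixes are represented as bit strings, the toy address length of 4 bits is chosen for readability, and the names classify and toy_acl are hypothetical.

```python
from typing import List, Tuple

# Hypothetical encoding of a basic ACL rule: (source prefix, destination prefix, action).
BasicRule = Tuple[str, str, str]


def classify(acl: List[BasicRule], src: str, dst: str) -> str:
    """Return the action of the first rule whose prefixes match; drop if none match."""
    for src_prefix, dst_prefix, action in acl:
        # A prefix matches every address that starts with it; "" matches everything.
        if src.startswith(src_prefix) and dst.startswith(dst_prefix):
            return action
    return "deny"  # no match: the packet is dropped


# Toy example with 4-bit addresses (w = 4) instead of 32-bit IP addresses.
toy_acl = [
    ("10", "", "deny"),    # sources 10** to any destination
    ("1", "0", "permit"),  # sources 1*** to destinations 0***
]
```

With this list, a packet from 1011 to 0001 is denied by the first rule even though the second rule also matches it, which is why rule order matters when an ACL is shortened.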
Other optimization criteria have also been studied. Data structures and algorithms have been studied for quickly determining which rule in an ACL is the first match or, when information about the data traffic is known in advance, trying to minimize the average time to find the applicable rule, either simply by reordering the ACL or by devising sophisticated decision tree classifiers. None of those approaches, however, have yet been implemented in real-world routers. Although such approaches can be of value when using software simulation to study router behavior, for now ACL minimization remains the most direct way to improve the access control performance of current real-world routers.
In modeling the ACL minimization problem, matters can be simplified by ignoring the protocol and port restrictions. Those fields are not present in the simpler basic ACLs that are still sometimes used, and even within extended ACLs, most rules (approximately 85% or more) are basic in that the entries for protocol and port match everything. The more restrictive rules, even when present, are likely to have priority over the basic ones or be independent of them, so that optimizing over the basic rules would at least be a component of an effective heuristic for the general problem. Thus in what follows it is assumed that the ACLs are restricted to basic rules.
The problem of ACL minimization can usefully be modeled in geometric terms. Consider a 2^w×2^w grid with a cell for each combination of a source and destination IP address (columns and rows indexed from 0 to 2^w−1). The action of an ACL can be viewed as coloring the cells of that grid, with the cell colored white if packets with that combination are denied, and black if they are permitted. Note that each rule applies to a rectangle in the grid whose x-coordinates include the IP addresses in the source range for the rule and whose y-coordinates include those in the destination range. Thus each rule in an ACL is once again a pair (R, c), where R is a rectangle and c is a color, just as in the RRL application.
There are two distinctions between ACLs and RRLs, one minor and one major. The minor distinction is that the rules in an ACL are in the reverse order from those in an RRL, with the final color of a grid cell depending on the first rectangle in the ACL that contains it, rather than the last, as was the case with RRLs. The pattern corresponding to an ACL is thus determined by reversing the order of its rules, and then treating that reversed list as an RRL.
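The equivalence between first-match evaluation and painting the reversed list can be checked on a small grid. The sketch below is an illustration under assumed conventions: rectangles are (x0, y0, x1, y1) with inclusive coordinates, white stands for deny and black for permit, and the function names and the example list are hypothetical.

```python
def paint(rrl, size):
    """RRL semantics: paint rules in order on a white grid; the last rule covering a cell wins."""
    grid = [["white"] * size for _ in range(size)]
    for (x0, y0, x1, y1), color in rrl:
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                grid[y][x] = color
    return grid


def first_match(acl, size):
    """ACL semantics: each cell takes the color of the first rule containing it, else white."""
    grid = [["white"] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            for (x0, y0, x1, y1), color in acl:
                if x0 <= x <= x1 and y0 <= y <= y1:
                    grid[y][x] = color
                    break
    return grid


# Deny one quadrant, then permit everything else, on a 4x4 grid (w = 2).
example_acl = [((0, 0, 1, 1), "white"), ((0, 0, 3, 3), "black")]
patterns_agree = first_match(example_acl, 4) == paint(list(reversed(example_acl)), 4)
```

The white default canvas plays the role of the implicit drop when no rule matches.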
The second and more meaningful distinction is that the set of possible rectangles is highly constrained in ACLs since their x- and y-coordinates are determined by IP address prefixes. The widths and heights of rectangles must all be of the form 2^k for 0≤k≤w, and if a rectangle's width (height) is 2^k, then it must start in a column (row) whose coordinate is congruent to 0 mod 2^k. Note that this means that the projections of the rectangles on the axes are laminar in that for any two rectangles the projections are either disjoint or one is contained in the other. As will be seen, that property can make an algorithmic difference.
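The prefix constraint and the resulting laminar property can be illustrated with a short sketch. The helper names prefix_to_interval and is_laminar_pair are hypothetical, and prefixes are again modeled as bit strings; this is a check on a toy address length, not a proof.

```python
from itertools import product


def prefix_to_interval(prefix: str, w: int):
    """Map a binary prefix to the inclusive range of w-bit addresses it matches."""
    k = w - len(prefix)                      # number of free low-order bits
    start = (int(prefix, 2) << k) if prefix else 0
    return start, start + (1 << k) - 1       # width is 2^k, start is a multiple of 2^k


def is_laminar_pair(a, b) -> bool:
    """True if the two intervals are disjoint or one contains the other."""
    disjoint = a[1] < b[0] or b[1] < a[0]
    a_in_b = b[0] <= a[0] and a[1] <= b[1]
    b_in_a = a[0] <= b[0] and b[1] <= a[1]
    return disjoint or a_in_b or b_in_a


# Enumerate every prefix for a toy address length w = 3 and check all pairs.
w = 3
prefixes = [""] + ["".join(bits) for n in range(1, w + 1) for bits in product("01", repeat=n)]
intervals = [prefix_to_interval(p, w) for p in prefixes]
all_laminar = all(is_laminar_pair(a, b) for a in intervals for b in intervals)
```

By contrast, the arbitrary intervals (0, 2) and (1, 3) overlap without nesting, which is allowed for RRL rectangles but cannot arise from prefixes.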
Several special cases of ACL minimization have previously been studied. One such case is the one-dimensional case where only the destination (or source) is restricted. This corresponds to the problem of minimizing routing tables, where under most routing protocols, the outgoing link on which a packet is sent depends only on its destination address. The rules in that case correspond simply to intervals, rather than rectangles. It is not difficult to see that the one-dimensional ACL problem can be solved in polynomial time by dynamic programming. That holds even when multiple colors are allowed, as might be the case, for instance, when one wants to specify different levels of service quality. For a K-colored pattern on a 2^w×2^w grid specified by an ACL of length n, one known algorithm produces an optimal equivalent ACL in time O(Knw).
The case of one-dimensional RRL minimization has been solved in polynomial time by dynamic programming. For the basic two-color case it is trivial: simply using a separate interval for each maximal black region is optimal. With more than two colors the RRL problem appears to be more difficult than the ACL problem. The best running time bound known to have been achieved is O(Kn^3), with the dynamic program exploiting the observation that even though arbitrary interval rules are allowed, it may be assumed without loss of generality that the intervals used in any specific RRL are themselves laminar.
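For the two-color one-dimensional case, the construction just described (one black interval per maximal black region on a white background) can be sketched as below. The function name black_runs and the list-of-strings encoding of a row are assumptions for illustration; the multi-color dynamic program is not shown.

```python
from typing import List, Tuple


def black_runs(row: List[str]) -> List[Tuple[int, int]]:
    """Return one inclusive interval (start, end) per maximal run of black cells."""
    runs = []
    start = None
    for i, color in enumerate(row):
        if color == "black" and start is None:
            start = i                      # a new black run begins
        elif color != "black" and start is not None:
            runs.append((start, i - 1))    # the run ended at the previous cell
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))  # run extends to the end of the row
    return runs


example_row = ["white", "black", "black", "white", "black"]
example_rules = black_runs(example_row)  # [(1, 2), (4, 4)]
```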
Another restriction on rules that has been studied is the case where only black rectangle rules are allowed. The problem of finding the minimal set of such rules needed to create a given pattern has been called rectilinear picture compression and has long been known to be NP-hard.
The above hardness and approximation results refer to the RRL problem. For the ACL problem, it has been shown that the optimal all-black ACL for an arbitrary pattern (holes allowed) can be found in polynomial time. A key lemma in that case is that, when the projections of the allowed rectangles on both axes are laminar, there is a unique non-redundant cover of the black cells of the pattern by maximal black rectangles.
Unfortunately, finding the optimal all-black RRL/ACL doesn't necessarily help much to solve the present problem. Examples such as the grid 200 of FIG. 2, showing a pattern with a linear factor performance penalty for restricting to “all-black” rules, suffice to show that this restriction can cause the length of the best ACL or RRL to increase by a factor of Θ(OPT(P)). In the general (2n+1)×(2n+1) version of that example, with rows and columns numbered from 1, the cells with both coordinates odd are black and the rest of the pattern is white. The optimal RRL/ACL includes one black rule covering the whole pattern, overwritten by n white column rules and n white row rules, for a total of 2n+1 rules, whereas the best all-black-rule list contains (n+1)^2 rules, one for each isolated black cell. Moreover, there doesn't seem to be any analog of the key lemma above to handle the case of both black and white rules. The inventors have proven that RRL minimization is NP-hard and have conjectured that the same is true for ACL minimization.
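The rule counts in the (2n+1)×(2n+1) example can be verified with a small sketch for the RRL setting. It assumes rows and columns are numbered from 1 and rectangles are encoded as (x0, y0, x1, y1) with inclusive coordinates; the function names are hypothetical. The black-cell count equals the all-black list length because no two black cells are adjacent, so no black rectangle can cover more than one of them without also covering a white cell.

```python
def target(n):
    """Pattern with black cells exactly where both 1-based coordinates are odd."""
    size = 2 * n + 1
    return [["black" if r % 2 == 1 and c % 2 == 1 else "white"
             for c in range(1, size + 1)] for r in range(1, size + 1)]


def mixed_rrl(n):
    """One black rule over the whole grid, then n white columns and n white rows."""
    size = 2 * n + 1
    rules = [((1, 1, size, size), "black")]
    rules += [((c, 1, c, size), "white") for c in range(2, size, 2)]  # even columns
    rules += [((1, r, size, r), "white") for r in range(2, size, 2)]  # even rows
    return rules


def paint(rules, size):
    """Apply rules in order on a white grid, using 1-based inclusive coordinates."""
    grid = [["white"] * size for _ in range(size)]
    for (x0, y0, x1, y1), color in rules:
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                grid[y - 1][x - 1] = color
    return grid


n = 3
rules = mixed_rrl(n)                                   # 2n + 1 = 7 rules
reproduces = paint(rules, 2 * n + 1) == target(n)
black_cells = sum(row.count("black") for row in target(n))  # (n + 1)^2 = 16
```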
That motivates the search for more realistic special cases to consider, and that approach is taken in what follows.
There therefore remains a need for a technique for optimizing a set of rules for creating a rectilinear pattern in a grid, and specifically for minimizing the number of rules in an RRL or ACL.