A known problem in the computerized evaluation of data to determine whether the data conforms to predefined rules is the need to avoid redundant rules in the rules set. This applies particularly when an item to be checked can legitimately conform to more than one rule in the rules set, since the item must then be checked against each rule in the rules set. Redundancies in the rules set thus waste computer resources: more memory is required to store the rules set and more processing time is needed to determine which rules the item conforms to. It is therefore desirable to remove redundancies from the rules set prior to checking.
One particular area where rule checking is required is the field of network classification tables. Suitable background is provided in U.S. Pat. No. 5,956,721 (Douceur et al.) issued Sep. 21, 1999 and entitled “Method and computer program product for classifying network communication packets processed in a network stack”. Data packets (referred to simply as “packets”) sent through a communication network are classified according to message type, for example. The message type is transmitted with the packet in a header thereof, which may contain other classification data. On receipt, packets are passed up the message stack, each element of which may remove a portion of the header information and make processing decisions based on the information in the packet or on any header information that has not previously been removed by lower-level drivers.
A packet is classified for certain processing in a given driver based on information about the packet that is contained in the headers or elsewhere in the body of the message itself. Usually a single best classification must be returned, and this requires that rules be implemented to cater for overlapping conditions. The rules are stored in a database, and U.S. Pat. No. 5,956,721 relates to a method for classifying packets for processing by multiple drivers in a network stack.
Classification of data packets in network communication is also described in WO 99/27684, which describes a method for classifying traffic according to a definable set of classification attributes, which may be hierarchical and define a policy or rule of assignment for flow of data traffic through the network. According to one embodiment, the classification process checks at each level if the flow being classified matches the attributes of a given class. If it does, then processing continues down to the links in the classification hierarchy. If it does not, then the class at the level that does match determines the policy for the flow being classified.
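By way of illustration only, the level-by-level matching described above may be sketched as follows. This is a minimal sketch and not the method of WO 99/27684 itself: the names `TrafficClass`, `classify`, the attribute keys and the policy strings are hypothetical, and it is assumed that a class matches a flow when every attribute of the class equals the corresponding attribute of the flow.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TrafficClass:
    """Hypothetical node in a classification hierarchy."""
    name: str
    policy: str
    attributes: Dict[str, object] = field(default_factory=dict)
    children: List["TrafficClass"] = field(default_factory=list)

    def matches(self, flow: Dict[str, object]) -> bool:
        # Assumption: a class with no attributes matches every flow.
        return all(flow.get(k) == v for k, v in self.attributes.items())


def classify(root: TrafficClass, flow: Dict[str, object]) -> Optional[TrafficClass]:
    """Descend while some child matches; the deepest matching class wins."""
    if not root.matches(flow):
        return None
    current = root
    while True:
        # Assumption: the first matching child is followed.
        nxt = next((c for c in current.children if c.matches(flow)), None)
        if nxt is None:
            return current  # no deeper match: this level sets the policy
        current = nxt


web = TrafficClass("web", "priority", {"dst_port": 80})
tcp = TrafficClass("tcp", "standard", {"proto": "tcp"}, [web])
root = TrafficClass("all", "best_effort", children=[tcp])
```

In this sketch a TCP flow to port 80 descends to the `web` class, whereas a TCP flow to any other port stops at `tcp`, whose policy then applies.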
It is not uncommon for a packet to conform to multiple rules, which may even contradict each other. This problem is resolved by partial ordering, whereby relative priorities are assigned between contradictory rules. It may also occur that the set of rules contains redundant rules which are never executed. This may happen when, for each packet satisfying such a rule, there is a higher priority rule which the packet also satisfies. As a result, the rule set is larger than necessary, thus increasing the time required to search for matching rules. Given that the rule sets are large and complex, it is not feasible to detect redundant rules manually.
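The notion of a rule that is never executed may be illustrated by the following minimal sketch. It is an illustration under stated assumptions and not a definitive implementation: the names `Rule`, `covers` and `shadowed_rules` are hypothetical, each rule is assumed to constrain packet fields by inclusive integer ranges (an absent field being a wildcard), and a larger `priority` value is assumed to denote a higher priority. The sketch detects only the simple case in which a single higher priority rule covers the redundant rule; it does not detect a rule that is covered only by several higher priority rules taken together.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive (low, high); assumed non-empty


@dataclass
class Rule:
    """Hypothetical rule: each listed field must fall within its range."""
    name: str
    priority: int  # assumption: larger value means higher priority
    fields: Dict[str, Range]


def covers(outer: Rule, inner: Rule) -> bool:
    """True if every packet satisfying `inner` also satisfies `outer`."""
    for name, (o_lo, o_hi) in outer.fields.items():
        if name not in inner.fields:
            return False  # inner accepts any value here, outer does not
        i_lo, i_hi = inner.fields[name]
        if i_lo < o_lo or i_hi > o_hi:
            return False  # inner range extends beyond outer range
    return True


def shadowed_rules(rules: List[Rule]) -> List[str]:
    """Names of rules fully covered by one higher priority rule."""
    return [
        r.name
        for r in rules
        if any(h.priority > r.priority and covers(h, r) for h in rules)
    ]


rules = [
    Rule("allow_low_ports", 10, {"dst_port": (0, 1023)}),
    Rule("allow_http", 5, {"dst_port": (80, 80), "proto": (6, 6)}),
    Rule("allow_high_ports", 5, {"dst_port": (1024, 65535)}),
]
print(shadowed_rules(rules))  # ['allow_http']
```

Here `allow_http` is redundant because every packet satisfying it also satisfies the higher priority `allow_low_ports`, so `allow_http` can never be the rule that is executed.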
U.S. Pat. No. 5,943,667 (Aggarwal et al.) issued Aug. 24, 1999 and entitled “Eliminating redundancy in generation of association rules for on-line mining” discloses a computer method of removing simple and strict redundant association rules generated from large collections of data. A compact set of rules, devoid of many redundancies in the discovery of data patterns, is presented to an end user. The method is directed primarily to on-line applications such as the Internet and Intranet. Given a number of large item sets as input, simple redundancies are removed by generating all maximal ancestors, the frontier set, for each large item set. The maximal ancestors share a hierarchical relationship with the large item set from which they were derived and further satisfy an inequality whereby the ratio of respective support values is less than the reciprocal of some user-defined confidence value. The resulting compact rule set is displayed to an end user at some specified level of support and confidence. The method is also able to generate the full set of rules from the compact set.