1. Technical Field
This application generally relates to machine learning techniques, and more particularly to techniques for producing classification rules.
2. Description of Related Art
Techniques may be used to classify data, such as data relating to objects or the occurrence of events, in an automated manner. Data to be classified may be represented as a set of data items. In one representation, each data item includes a value for each of one or more attributes. One classification technique uses a set of rules to classify the data items in accordance with their attribute values, placing each data item into a class. For example, a program executed in a computer system may apply a set of rules to unclassified input data and produce as output a classification of each data item included in the input data.
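By way of illustration only, the rule-based classification described above may be sketched as follows. This is a minimal sketch under stated assumptions, not a technique taken from this application: the function name `classify`, the attribute names `maker` and `doors`, the class labels, and the first-matching-rule policy are all hypothetical choices made for the example.

```python
# Hypothetical illustration: each rule is a (conditions, class_label) pair.
# A condition maps an attribute name to a predicate over that attribute's value.

def classify(item, rules, default="unknown"):
    """Return the class of the first rule whose conditions all hold for item."""
    for conditions, label in rules:
        if all(attr in item and test(item[attr]) for attr, test in conditions.items()):
            return label
    return default  # no rule matched (assumed fallback behavior)

# Assumed example rules over one categorical and one numeric attribute.
rules = [
    ({"maker": lambda v: v == "Acme", "doors": lambda v: v >= 4}, "sedan"),
    ({"doors": lambda v: v <= 2}, "coupe"),
]

# Unclassified input data: each data item holds attribute values.
items = [
    {"maker": "Acme", "doors": 4},
    {"maker": "Zenith", "doors": 2},
    {"maker": "Zenith", "doors": 5},
]

labels = [classify(item, rules) for item in items]
print(labels)  # ['sedan', 'coupe', 'unknown']
```

In this sketch the first matching rule determines the class; other policies, such as voting among all matching rules, may equally be used.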
Different techniques may be used to produce a set of rules. The rules may be produced manually. However, manual techniques may become too time-consuming as, for example, the complexity of the input data set and the associated classification increases. Additionally, manual rule production requires a user to have knowledge about the data items and the classifications.
An alternative class of techniques automates the production of the set of rules. For example, a rule generation program may be executed in a computer system to automate rule production. It may be desirable for the automated technique to be efficient in its use of computer resources. If the rule generation is performed interactively, it may be particularly desirable to utilize a technique that seeks to minimize execution time.
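As one simple illustration of automated rule production, a program may derive a rule for each observed value of a single attribute by predicting the class most frequently associated with that value. The sketch below uses this well-known single-attribute, majority-class approach purely as a stand-in; it is not the technique of this application, and the function name `generate_rules` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def generate_rules(labeled_items, attribute):
    """For each observed value of `attribute`, emit a rule predicting its majority class."""
    counts = defaultdict(Counter)
    for item, label in labeled_items:
        counts[item[attribute]][label] += 1
    # One rule per attribute value: value -> most frequent class for that value.
    return {value: tally.most_common(1)[0][0] for value, tally in counts.items()}

# Assumed, already-classified example data used to produce the rules.
labeled_items = [
    ({"maker": "Acme"}, "truck"),
    ({"maker": "Acme"}, "truck"),
    ({"maker": "Acme"}, "sedan"),
    ({"maker": "Zenith"}, "sedan"),
]

rules = generate_rules(labeled_items, "maker")
print(rules)  # {'Acme': 'truck', 'Zenith': 'sedan'}
```

This sketch makes a single pass over the data, so its execution time grows linearly with the number of data items, which is the kind of property that may matter when rules are generated interactively.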
Unclassified input data may include categorical data and numeric, or non-categorical, data. “Categorical data” may be characterized as data that cannot naturally be ordered by a metric, such as, for example, names of automobile producers, products offered by one or more manufacturers, and the like. It may be desirable to have an efficient automated technique for rule generation that may be used with both categorical and non-categorical features. It may also be desirable that the rule generation technique produce rules that classify, to a particular degree of correctness, the given input data and, more generally, any input data set. In other words, it may be desirable that the generated rules are not overly specific to any particular input set, but rather achieve a uniformly high degree of correct classification across all possible input data sets.
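The distinction between overly specific rules and more general rules may be illustrated by measuring classification accuracy on the given input data and on a separate data set. The following is a minimal sketch under stated assumptions: categorical conditions are modeled as sets of allowed values, numeric conditions as half-open ranges, and every name, data value, and rule shown is hypothetical.

```python
def matches(item, conditions):
    """Categorical conditions are sets of allowed values; numeric ones are (low, high) ranges."""
    for attr, cond in conditions.items():
        value = item.get(attr)
        if value is None:
            return False
        if isinstance(cond, (set, frozenset)):
            if value not in cond:           # categorical: membership test only
                return False
        else:
            low, high = cond
            if not (low <= value < high):   # numeric: ordered comparison
                return False
    return True

def classify(item, rules, default):
    for conditions, label in rules:
        if matches(item, conditions):
            return label
    return default

def accuracy(rules, labeled_items, default="economy"):
    correct = sum(classify(item, rules, default) == label for item, label in labeled_items)
    return correct / len(labeled_items)

given_input = [
    ({"maker": "Acme", "price": 42000}, "luxury"),
    ({"maker": "Zenith", "price": 18000}, "economy"),
    ({"maker": "Acme", "price": 21000}, "economy"),
    ({"maker": "Zenith", "price": 55000}, "luxury"),
]
other_input = [
    ({"maker": "Orion", "price": 61000}, "luxury"),
    ({"maker": "Acme", "price": 15000}, "economy"),
    ({"maker": "Orion", "price": 38000}, "luxury"),
    ({"maker": "Zenith", "price": 47000}, "luxury"),
]

# Overly specific: memorizes the exact maker and price pairs seen in the given input.
specific_rules = [
    ({"maker": {"Acme"}, "price": (42000, 42001)}, "luxury"),
    ({"maker": {"Zenith"}, "price": (55000, 55001)}, "luxury"),
]
# More general: a single numeric threshold on price.
general_rules = [({"price": (30000, float("inf"))}, "luxury")]

print(accuracy(specific_rules, given_input), accuracy(specific_rules, other_input))  # 1.0 0.25
print(accuracy(general_rules, given_input), accuracy(general_rules, other_input))    # 1.0 1.0
```

Both rule sets classify the given input correctly, but only the more general rule set retains its accuracy on the second data set in this contrived example.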