1. Field of the Invention
The present invention relates to a method for detecting a selectable number of groups of objects having at least one selectable characteristic from a population of objects specifiable by a plurality of attributes, each of said object groups having a quality which, by means of a selectable function, results from the number of objects of the object group and from an unusualness of the distribution of the characteristic within the object group, said unusualness being determined by a relation between the distribution of the characteristic in the respective object group and the distribution of the characteristic in a reference population.
2. Description of the Relevant Art
Data Mining, or Knowledge Discovery in Databases (KDD), as it is referred to in the research world, has recently been gaining widespread attention. In one popular definition, KDD is seen as the "automatic extraction of novel, useful, and valid knowledge from large sets of data" [FPSS96]. As this definition indicates, KDD offers a general body of techniques that are capable of finding different kinds of "knowledge" in different kinds of data. A data mining task can only be defined precisely when it is exactly specified what kind of knowledge is to be found in which form, and in which way the data for analysis are available in a storage file or database system.
Searching databases for data patterns having selectable characteristics, i.e., detecting objects having a selectable characteristic among a population of objects, is no longer a trivial task, particularly if the population comprises a very large number of objects. In the case of relatively small amounts of data, i.e., a relatively small population, the task of detecting specific objects can be accomplished by examining each object of the population separately. In the case of relatively large amounts of data, such an approach is no longer economical. A reasonable approach in this regard is to divide the objects of the population, which can be described by means of attributes, according to predetermined attributes into object groups whose objects share the corresponding attribute. In this manner, the population is hierarchically subdivided into object groups which in turn are further subdivided into object groups.
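By way of illustration only, the hierarchical subdivision described above may be sketched as follows. The sketch assumes that each object is represented as a Python dictionary mapping attribute names to values; the function names `split_by_attribute` and `subdivide`, the attribute names, and the sample population are hypothetical and are not taken from the present description.

```python
from collections import defaultdict


def split_by_attribute(objects, attribute):
    """Divide objects into groups that share the same value of one attribute."""
    groups = defaultdict(list)
    for obj in objects:
        groups[obj[attribute]].append(obj)
    return dict(groups)


def subdivide(objects, attributes):
    """Hierarchically subdivide a population by the given attributes, in order.

    Returns a dict mapping a path of (attribute, value) pairs to the object
    group described by that path. Every level of the hierarchy is included.
    """
    result = {}

    def recurse(group, remaining, path):
        if path:  # the undivided population itself is not listed as a group
            result[path] = group
        if not remaining:
            return
        first, rest = remaining[0], remaining[1:]
        for value, subgroup in split_by_attribute(group, first).items():
            recurse(subgroup, rest, path + ((first, value),))

    recurse(list(objects), list(attributes), ())
    return result


# Hypothetical sample population (illustrative data only).
population = [
    {"region": "north", "age": "young", "buyer": True},
    {"region": "north", "age": "old", "buyer": False},
    {"region": "south", "age": "young", "buyer": True},
    {"region": "south", "age": "young", "buyer": False},
]

groups = subdivide(population, ["region", "age"])
for path, members in groups.items():
    print(path, len(members))
```

In this sketch every object group is identified by the chain of attribute values that leads to it, so that a group and its further subdivisions can be examined with the same code.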
A frequently posed task is to search a database for object groups in which a predefined characteristic occurs unusually often among the objects of the group. This "unusual" statistical characteristic is to be seen in relation to the statistical occurrence, i.e., the frequency, of this characteristic in a reference population. Such a reference population can be either the (total) population of the objects or a subset of the objects of the total population, particularly an object group. Normally, however, apart from the "unusualness" of an object group to be detected, the size of the object group is also significant. Particularly, in a large number of applications, it is desirable to detect object groups of the largest possible size and the highest possible unusualness. Therefore, the unusualness is linked to the number of objects of an object group through a functional relationship so as to define the "quality" of a group. Thus, it is desired to examine large quantities of data for groups of data having specific minimum qualities. A case in point would be the situation wherein a company, intending to introduce a new product, plans a mail advertising campaign and, to reduce the effort involved, seeks to address only those groups of persons who are conceivable as potential buyers of the product. Similarly, if an opinion survey is to be conducted, only those persons should be surveyed who correspond to the "average" of the population or of the part of the population relevant for the evaluation of the survey.
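By way of illustration only, one conceivable way of linking the unusualness to the group size is sketched below. The description leaves the function selectable; the particular form used here (group size raised to an exponent, multiplied by the difference between the share of the characteristic in the group and its share in the reference population) is an assumption made for the purpose of the example. The names `share`, `quality`, and `size_exponent`, and the sample data, are likewise hypothetical.

```python
def share(objects, has_characteristic):
    """Fraction of objects exhibiting the characteristic (0.0 for an empty group)."""
    if not objects:
        return 0.0
    return sum(1 for obj in objects if has_characteristic(obj)) / len(objects)


def quality(group, reference, has_characteristic, size_exponent=0.5):
    """Illustrative quality: (group size ** size_exponent) * unusualness.

    Unusualness is taken here as the difference between the share of the
    characteristic in the group and its share in the reference population.
    This is an assumed example of a selectable function, not a prescribed one.
    """
    unusualness = share(group, has_characteristic) - share(reference, has_characteristic)
    return (len(group) ** size_exponent) * unusualness


def is_buyer(obj):
    """Hypothetical characteristic: the object is marked as a buyer."""
    return obj["buyer"]


# Hypothetical reference population: 8 persons, 4 of them buyers.
reference = [{"buyer": i < 4} for i in range(8)]
# Hypothetical object group: the 4 buyers.
group = [obj for obj in reference if obj["buyer"]]

group_quality = quality(group, reference, is_buyer)
```

With these sample data, the share in the group is 1.0 and the share in the reference population is 0.5, so that the quality evaluates to the square root of 4 multiplied by 0.5. A larger `size_exponent` favors large groups; a smaller one favors groups with a strongly deviating distribution.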