In the field of automatic data classification, the problem to be solved consists in sorting very rapidly the data of a database containing large quantities of data, of the order of a few hundred thousand or a few million, relating to n objects or individuals Oi. These data generally take the form of arrays (or matrices) of two types: rectangular matrices (objects×attributes measured on these objects) or square matrices (objects×objects) representing relations between the objects. The aim of clustering is to construct, on the basis of these matrices, groups of coherent objects having strong descriptive (description of the individuals) and/or behavioral similarities.
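By way of illustration only, the two matrix types can be sketched as follows. The toy data, the function name agreement_matrix and the choice of counting shared attribute values as the relation between two objects are assumptions made for this example; they are not taken from the procedures discussed here.

```python
# Rectangular matrix (objects x attributes): one row per object,
# one column per attribute. Hypothetical toy data.
rectangular = [
    ["red",  "small", "metal"],   # object O1
    ["red",  "large", "metal"],   # object O2
    ["blue", "large", "wood"],    # object O3
]


def agreement_matrix(rows):
    """Build a square (objects x objects) matrix from a rectangular one.

    Entry [i][j] counts the attributes on which objects i and j take the
    same value. This is one simple resemblance measure among many.
    """
    n = len(rows)
    return [
        [sum(a == b for a, b in zip(rows[i], rows[j])) for j in range(n)]
        for i in range(n)
    ]


square = agreement_matrix(rectangular)
for row in square:
    print(row)
```

The rectangular matrix grows with the number of objects times the number of attributes, whereas the square matrix grows with the square of the number of objects.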
When the data take the form of square matrices, they most often represent measurements of resemblance or proximity between the objects of the database. The aim is then to discover, automatically, an optimal decomposition of the population into a much smaller number of classes (or clusters) of objects that are similar or behave in the same way, and thereafter to define action strategies depending on the field concerned. For example, one possible action is to discover faults that make it possible to predict other faults in a computer network. Another is to discover a set of customers of a bank to whom certain products can be proposed and who have a high probability of responding positively. Another is to discover niches of customers of an insurance company for whom it is possible to create specific insurance policies that were not, a priori, evident to define. One of the main difficulties in discovering these classes is that, despite the progress achieved in the computational power of processors and the storage capacity of current computers, the stored data are ever more voluminous and occupy ever more memory space, so that it is very difficult to cluster the objects of a database within reasonable processing times. This is all the more true when the available data take the form of square matrices representing relations between objects.
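The difficulty specific to square matrices can be made concrete with a rough storage estimate. The sketch below assumes dense storage at 8 bytes per entry and a hypothetical count of 50 attributes for the rectangular case; neither figure comes from the text above.

```python
def dense_matrix_bytes(rows, cols, bytes_per_entry=8):
    """Bytes needed to store a dense rows x cols matrix (assumed layout)."""
    return rows * cols * bytes_per_entry


N_ATTRIBUTES = 50  # hypothetical number of attributes per object

for n in (100_000, 1_000_000):
    rect = dense_matrix_bytes(n, N_ATTRIBUTES)   # objects x attributes
    square = dense_matrix_bytes(n, n)            # objects x objects
    print(f"n={n:>9,}: rectangular {rect / 1e9:.2f} GB, "
          f"square {square / 1e9:,.0f} GB")
```

Under these assumptions, a square matrix over one million objects requires 8 × 10^12 bytes, which is why any procedure that must visit every pair of objects becomes impractical at this scale.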
Various automatic classification procedures are known from the prior art. Examples include the k-means procedure, hierarchical clustering, and relational analysis.
Patent application EP 1960916, filed by the Applicant, describes a clustering method where the initial data take the form of tables whose rows are the individuals to be clustered and whose columns are variables measured on these individuals.
Despite the good results offered by these prior art procedures, they exhibit notably the following weaknesses:
1) a problem with fixing the number of classes and the referents (centers) used to initialize the partition to be found. Procedures of k-means type, for example, require fixing, arbitrarily and a priori, the number of classes to be found in the data as well as a few initial individuals considered as the centers of the initial classes;
2) a problem with fixing, arbitrarily and a priori, the cutoff level of the dendrogram for hierarchical clustering procedures;
3) an impossibility of processing large volumes of data in a linear manner in reasonable times when the data take the form of relational data.
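Weaknesses 1) and 2) can be illustrated on a small one-dimensional example. The sketch below is a minimal textbook version of k-means (Lloyd iterations) and of a single-linkage dendrogram cut, written for illustration only; the data, the function names kmeans_1d and single_linkage_cut, and the restriction to one dimension are assumptions of this example.

```python
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]


def kmeans_1d(points, centers, max_iter=100):
    """Plain k-means on scalars; k is implied by len(centers)."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters


def single_linkage_cut(points, height):
    """Cut a 1-D single-linkage dendrogram at the given height.

    In one dimension this amounts to splitting the sorted points
    wherever the gap between neighbours exceeds the cutoff.
    """
    ordered = sorted(points)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev <= height:
            groups[-1].append(cur)
        else:
            groups.append([cur])
    return groups


# Same data, same k = 2, different initial centers: different partitions.
run_a = kmeans_1d(points, centers=[1.0, 2.0])
run_b = kmeans_1d(points, centers=[10.0, 22.0])
# A different k chosen a priori gives yet another partition.
run_c = kmeans_1d(points, centers=[1.0, 10.0, 20.0])

# Same dendrogram, different cutoff levels: different numbers of classes.
cut_low = single_linkage_cut(points, height=5.0)
cut_high = single_linkage_cut(points, height=7.5)

print(run_a, run_b, run_c, sep="\n")
print(len(cut_low), len(cut_high))
```

With the initial centers 1 and 2 the middle group is attached to the upper group, whereas with the initial centers 10 and 22 it is attached to the lower group. Likewise, raising the cutoff from 5 to 7.5 changes the result from three classes to two. In both cases the outcome depends on a parameter fixed before looking at the data.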
Thus, on the one hand, the usual clustering procedures do not make it possible to process data of graph or relational type in a linear manner and, on the other hand, they depend heavily on the fixing of parameters such as the number of classes to be found or the centers (objects chosen from among the population by random draws or in an arbitrary manner).