1. Field of the Invention
The present invention relates to a method and a device for detecting clusters amongst data, in particular amongst data which denote complex products or services arranged into orders. Such data are characterized by the fact that the orders into which they are arranged may contain some identical data. For example, identical components or parts of services may occur repeatedly if, in a business, the same product is sold on a plurality of occasions or identical components appear in different products. The same applies in the medical field, where orders are complicated to organize and individual procedures, such as patient-related investigations or treatments and likewise care procedures in a hospital, are included on a plurality of occasions. To achieve efficient organization, it then becomes necessary to detect correlated data amongst the data arranged in orders.
2. Description of the Related Art
The above problem occurs in particular in the computer-aided control of the running of a hospital, where, in the field of hospital and patient management, services such as the aforementioned investigations, general care procedures and special rehabilitation procedures have to be organized, planned and controlled. Comparable problems also generally occur in the field of computer-aided production and order-processing in businesses.
To find such correlated groups within data, a method is described on pages 1-71 of Independent Component Analysis by Aapo Hyvärinen, Juha Karhunen and Erkki Oja, published by Wiley-Interscience in 2001, in which a search is made for independent components of the observed data. These components can include, inter alia, observed data such as economic indicators. Since the method was originally derived from signal theory, it cannot simply be applied as it is to binary sequences of data. In the present case, however, the data searched for are correlated data that indicate whether certain components or services appear or fail to appear, and it is precisely these data that are represented in binary form. The Independent Component Analysis method is therefore not particularly useful in the present case.
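The binary form referred to above can be illustrated by the following minimal sketch. It assumes, purely for illustration, that each order is given as a set of item names; the item names and the variables `orders`, `items` and `matrix` are hypothetical and do not appear in the source.

```python
# Hypothetical orders: each order is the set of services it contains.
orders = [
    {"blood_test", "x_ray"},
    {"blood_test", "x_ray", "physiotherapy"},
    {"physiotherapy", "wound_care"},
]

# Fix a column order by sorting all item names that occur anywhere.
items = sorted(set().union(*orders))

# One row per order, one column per item: 1 if the item appears, else 0.
matrix = [[1 if item in order else 0 for item in items] for order in orders]

for order_row in matrix:
    print(order_row)
```

Each row records only whether an item appears or fails to appear in an order, which is the kind of binary sequence to which a method derived from signal theory cannot simply be applied as it is.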
Furthermore, the technique of hierarchical clustering is known, which produces disjoint clusters that do not have any components in common. In such a case, it is necessary to select a hierarchy level at which the clusters are defined.
The disadvantage of this is that the clusters being sought are precisely those that contain some identical data, so that the above method does not detect the desired clusters. In addition, the hierarchy level that is required is not easy to determine.
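The limitation of hierarchical clustering can be illustrated by the following minimal sketch. It assumes single-linkage agglomerative clustering with a Jaccard distance between the sets of orders in which each item occurs; this choice of linkage and distance, the toy data and all names (`jaccard_distance`, `single_linkage`, `occurrences`) are assumptions made for illustration and are not taken from the source.

```python
def jaccard_distance(a, b):
    """Distance between two sets of order indices (0 = identical)."""
    return 1.0 - len(a & b) / len(a | b)


def single_linkage(occurrences, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [{item} for item in sorted(occurrences)]

    def dist(c1, c2):
        # Single linkage: distance of the closest pair of members.
        return min(
            jaccard_distance(occurrences[i], occurrences[j])
            for i in c1
            for j in c2
        )

    while len(clusters) > n_clusters:
        pairs = [
            (dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(pairs)
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical data: two products that share the component "s".
orders = [{"p1", "p2", "s"}] * 3 + [{"q1", "q2", "s"}] * 2

# For every item, the indices of the orders in which it occurs.
occurrences = {
    item: {idx for idx, order in enumerate(orders) if item in order}
    for item in set().union(*orders)
}

# The hierarchy level is chosen here by fixing the number of clusters.
result = single_linkage(occurrences, n_clusters=2)
print(result)
```

In this toy example the shared component "s" is assigned to only one of the two clusters, although it belongs to both products, and the number of clusters has to be chosen by hand.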
Another known method is called “Frequent Item Sets”. It is problematic insofar as it generates too many highly correlated groups, especially when less frequently occurring groups are also considered.
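The growth in the number of groups can be illustrated by the following minimal sketch, which enumerates all item sets by brute force rather than using an optimized algorithm. The function `frequent_item_sets`, the parameter `min_support` (the minimum number of orders that must contain a set) and the toy orders are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def frequent_item_sets(orders, min_support):
    """Return every item set contained in at least min_support orders."""
    items = sorted(set().union(*orders))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Support: number of orders containing the whole candidate.
            support = sum(1 for order in orders if set(candidate) <= order)
            if support >= min_support:
                result[candidate] = support
    return result


# Hypothetical orders over five items.
orders = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"c", "d", "e"},
]

# Lowering the support threshold admits less frequent groups as well.
for min_support in (3, 2, 1):
    found = frequent_item_sets(orders, min_support)
    print(min_support, len(found))
```

Even with only four orders, the number of reported item sets rises quickly as the threshold is lowered, and many of the sets are subsets of one another and therefore highly correlated.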
In “Probabilistic Latent Semantic Analysis”, a search is made for statistically independent source data which create distributions across individual components. From these distributions a group can be defined via a threshold. However, the threshold is not easy to determine, and it is necessary to establish a fixed number of source data.