The sets of measuring values can, for example, be sets of image data which are each recorded with a camera and each characterise an image. The method can then serve to identify similar images, for example to find automatically the images belonging to one camera shot within a large quantity of shots, or to find separation points between successive shots.
It is known to process sets of measuring values by assigning each measuring value to one class from a finite number of classes, occasionally also termed bins, which can be denoted by indices. This assignment defines a frequency distribution which indicates, for each class, the frequency of the measuring values of the respective set assigned to this class. The classes typically correspond to sub-intervals of the interval of values which the measuring values can assume. A known representation of such frequency distributions is the histogram. The technical object of comparing two or more sets of measuring values with each other, for example in order to identify comparable events in an automated and rapid manner during an evaluation of a large number of measuring value sets, can then be reformulated as the object of comparing two histograms with each other and of measuring a similarity between these histograms.
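The assignment of measuring values to classes described above can be illustrated by a minimal sketch. It assumes equally wide sub-intervals of a closed value interval; the function name `frequency_distribution`, its parameters and the sample values are hypothetical and chosen only for illustration.

```python
def frequency_distribution(values, lower, upper, num_classes):
    """Count the measuring values falling into each of num_classes
    equally wide sub-intervals of the interval [lower, upper]."""
    if num_classes < 1 or upper <= lower:
        raise ValueError("need at least one class and upper > lower")
    width = (upper - lower) / num_classes
    counts = [0] * num_classes
    for value in values:
        if not lower <= value <= upper:
            raise ValueError(f"value {value} lies outside the interval")
        # The upper boundary itself is assigned to the last class.
        index = min(int((value - lower) / width), num_classes - 1)
        counts[index] += 1
    return counts


# Illustrative values only, not measured data.
measurements = [0.5, 1.2, 1.9, 2.1, 3.7]
print(frequency_distribution(measurements, 0.0, 4.0, 4))  # [1, 2, 1, 1]
```

The returned list is indexed by class, so each entry corresponds to the height of one bar of the histogram.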
Various methods are known for comparing histograms or distributions which can be represented by histograms. In the simplest of these methods, the frequencies assigned to the individual classes of the frequency distributions to be compared are compared with each other class by class (or bin by bin), for example by measuring an overlap of the histograms corresponding to the frequency distributions. These methods can indeed be implemented with very low computational complexity but entail the great disadvantage that similarities between adjacent classes are not taken into account. Measuring values which are situated closely together but which, because of the choice of classes, happen to be assigned to two different, e.g. adjacent, classes are then treated as completely different, and their proximity is not considered when the similarity of the corresponding histograms is evaluated. In many cases, these simple methods therefore lead to unsatisfactory and fairly uninformative results which in particular allow no reliable statement about the similarity of sets of measuring values or of corresponding events. In particular, the results obtained with such methods are disadvantageously dependent upon the size of the bins or classes and upon the precise position of the more or less arbitrarily chosen boundaries between adjacent bins.
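The disadvantage of a class-by-class comparison can be made concrete with a short sketch. It assumes one common overlap measure, namely the sum of the class-wise minima divided by the larger total frequency; the function name `histogram_overlap` and the example histograms are hypothetical.

```python
def histogram_overlap(hist_a, hist_b):
    """Class-by-class overlap of two histograms: sum of the class-wise
    minima, normalised by the larger of the two total frequencies."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of classes")
    total = max(sum(hist_a), sum(hist_b))
    if total == 0:
        return 1.0  # two empty histograms are treated as identical
    return sum(min(a, b) for a, b in zip(hist_a, hist_b)) / total


reference = [0, 10, 0, 0, 0]
shifted_by_one = [0, 0, 10, 0, 0]    # all values moved to the adjacent class
shifted_by_four = [0, 0, 0, 0, 10]   # wait: class index 4, three classes away

print(histogram_overlap(reference, reference))        # 1.0
print(histogram_overlap(reference, shifted_by_one))   # 0.0
print(histogram_overlap(reference, shifted_by_four))  # 0.0
```

A shift into the directly adjacent class and a shift across three classes both yield an overlap of zero, so the measure cannot distinguish a near miss from a large deviation.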
Other methods for comparing histograms take into account not only exact correspondences or overlaps but also the similarity of classes or bins which are adjacent or situated close together. An example of this is the distance measure for histograms published by Rubner et al. in the International Journal of Computer Vision, 40(2), pp. 99-121 and termed Earth Mover's Distance. These and comparable methods are indeed suitable for a substantially more meaningful assessment of the similarity of frequency distributions or histograms but entail the disadvantage of exceptionally high computational complexity. Thus, with the mentioned method, the computational complexity of calculating the Earth Mover's Distance grows with the number N of measuring value classes at a rate between O(N³) and exp(N).
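How a cross-bin measure takes the proximity of classes into account can be sketched for a restricted special case. The sketch below does not reproduce the general method of Rubner et al., which solves a transportation problem for arbitrary ground distances. It assumes one-dimensional histograms with equal total frequency and a ground distance equal to the difference of the class indices, in which case the Earth Mover's Distance reduces to the sum of the absolute differences of the cumulative frequencies. The function name `emd_1d` and the example histograms are hypothetical.

```python
from itertools import accumulate


def emd_1d(hist_a, hist_b):
    """Earth Mover's Distance for the one-dimensional special case:
    equal totals, ground distance |i - j| between classes i and j."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of classes")
    if sum(hist_a) != sum(hist_b):
        raise ValueError("this special case requires equal total frequency")
    cumulative_a = accumulate(hist_a)
    cumulative_b = accumulate(hist_b)
    # Each term is the amount that has to cross one class boundary.
    return sum(abs(a - b) for a, b in zip(cumulative_a, cumulative_b))


reference = [0, 10, 0, 0, 0]
shifted_by_one = [0, 0, 10, 0, 0]     # moved to the adjacent class
shifted_by_three = [0, 0, 0, 0, 10]   # moved three classes further

print(emd_1d(reference, reference))         # 0
print(emd_1d(reference, shifted_by_one))    # 10
print(emd_1d(reference, shifted_by_three))  # 30
```

Unlike a class-by-class overlap, the result grows with the distance over which the frequencies have to be moved, so a shift into the adjacent class is rated as more similar than a shift across three classes.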
In particular for applications in which frequency distributions over a large number of measuring value classes are to be compared, such a high computational complexity can be unacceptable even for the comparison of a single pair of measuring value sets.