Quickly determining the similarity or dissimilarity of data objects is a widespread problem in the fields of data processing and data mining, and it is relevant to a plurality of technical applications.
Depending on the use case scenario, it may be favorable to process highly similar data objects together or, alternatively, to process highly dissimilar data objects together. However, for huge collections of data objects that each comprise a plurality of ‘attribute values’ or ‘property values’ to be considered, approaches known in the art for determining the similarity or dissimilarity of data objects consume a considerable amount of time and processing power. This is because such approaches are typically based on an all-against-all comparison of the data objects, in which the property values of every pair of data objects have to be compared with each other.
In the realm of cloud computing, a common problem is that Virtual Machines or other program instances sharing the same set of hardware resources may make poor use of those resources if their requirements in terms of processing power or memory are too similar. For example, the consumed processing power may soon reach the capacity limit of the resources while plenty of memory remains unused. However, executing an all-against-all comparison of the properties of potentially thousands of large cloud computing environments to determine similar and dissimilar Virtual Machines is often not practically feasible, owing to the complexity and processing time such a comparison requires.
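To make the cost of the all-against-all approach concrete, here is a minimal Python sketch of such a naive pairwise comparison. It assumes, purely for illustration, that each data object is a tuple of numeric property values and that the L1 (Manhattan) distance serves as the dissimilarity measure; neither choice comes from the text above, and the function names are hypothetical. For n data objects with m property values each, the loop performs n(n-1)/2 · m property comparisons.

```python
from itertools import combinations


def all_against_all(objects):
    # Naive all-against-all comparison: every pair of data objects is
    # compared property by property. Objects are assumed (illustration
    # only) to be tuples of numeric property values; the L1 distance is
    # an assumed dissimilarity measure.
    distances = {}
    comparisons = 0
    for (i, a), (j, b) in combinations(enumerate(objects), 2):
        d = 0
        for x, y in zip(a, b):
            d += abs(x - y)  # one property-value comparison
            comparisons += 1
        distances[(i, j)] = d
    return distances, comparisons


def comparison_count(n, m):
    # Closed form for the loop above: n*(n-1)/2 pairs, m comparisons each.
    return n * (n - 1) // 2 * m


# Hypothetical Virtual Machines described by (CPU cores, memory in GB).
vms = [(4, 8), (4, 16), (1, 32)]
distances, comparisons = all_against_all(vms)
print(distances)                      # {(0, 1): 8, (0, 2): 27, (1, 2): 19}
print(comparisons)                    # 6
print(comparison_count(10_000, 20))   # 999900000
```

With a hypothetical 10,000 Virtual Machines described by 20 properties each, the same approach already requires 999,900,000 property comparisons. The quadratic growth in the number of data objects is what makes this approach impractical at cloud scale.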