This patent relates to the field of information systems and data mining, and more particularly to a method for aggregating data by considering both the input and output properties of the data.
There has been explosive growth in the amount of available data in the last decade. This growth has far outstripped the growth in the number of experts who are able to analyze the data. Hence, there is a growing demand for automated tools for data analysis. One way of analyzing data is to cluster the data. Clustering consolidates information in the data for purposes such as abstraction, compactness, and removal of redundant information. While there are hundreds of approaches to clustering available in textbooks and commercial solutions, most methods are only concerned with homogeneous data types (variables). The few methods that can cluster heterogeneous data types produce clusters with heterogeneous variables in the same cluster. However, some data processing applications, such as dimensionality reduction, are designed to work with data clusters containing homogeneous data.
It would be desirable to have a method of clustering heterogeneous data types in order to produce clusters such that within each cluster the data types are homogeneous.