The standard techniques currently being used to analyze large datasets are Cluster Analysis techniques and Self-Organizing Maps. These techniques, however, have many disadvantages. They do not allow for the fingerprinting and visualization of an entire dataset, and missing values are not easily accommodated. The computational requirements are high for these techniques, and the mapping time increases exponentially with the size of the dataset. The current data needs to be reanalyzed when new datasets are added to the analysis, and vastly different results can occur for each new dataset or group of datasets added. Analyzing large numbers of massive datasets is difficult.
To take optimum advantage of the information in multiple, large sets of data, we need new, innovative tools. There is a need for methods that enable easy identification and visualization of potentially significant similarities and differences between multiple large datasets in their entirety. There is also a need for methods to intelligently store and model large datasets.