Standard techniques for understanding outliers, and thus hidden relationships, within multivariate data sets are often incomplete or misapplied. Such technology typically includes some type of regression analysis, basic neural networks and/or data mining techniques for identifying observations within a data set. Traditional data mining, for example, often produces unsatisfactory results due to spurious correlations, misleading associations, and illusory relationships. This is due to techniques that often define “patterns” upon noise in the data rather than actual systematic relationships.
In most cases, standard multivariate analysis techniques for identifying influential observations and outliers utilize some variant of multivariate regression; however, regression techniques typically do not predict outliers within groups, but rather predict a conditional mean across data sets. As a result, current methods of analysis in industries ranging from banking to homeland security often overlook systematic local relationships (i.e., relationships that apply to one or more portions of a population or subpopulation), in models that explicitly or implicitly estimate systematic global relationships (i.e., relationships that apply to the population or subpopulation as a whole).
Statements of the vexing problem of outlier identification and unique outlier cells have persisted for decades. Included among the problems are ineffective methods for identifying underlying set of true relationships.
Recent problems confronting private enterprise have heightened awareness of the risks involved in relying on current data analysis tools, particularly in the area of portfolio risk analysis. At the core of this discussion is the problem of measuring systematic global and systematic local variation. State of the art technology typically fails to capture many systematic local relationships within data sets because it applies only global measures. The existing technology falls short of observing cells (or sub-cells) based on shared characteristics derived from the underlying multivariate statistical distributions within a data set. Moreover, such technology does not provide meaningful analysis of systematic local and global variation for each data point. Consequently, state of the art technology typically overlooks potentially critical relationships at the local level that influence the relationship(s) of interest.