The amount of data currently available overwhelms our capacity to perform analysis. Thus, good tools to sort through data and determine which variables are relevant, and what trends or patterns exist among those variables, become a paramount initial step in analyzing real-world data. In defense and security applications, the consequences of missed information can be dire.
Mixture model is the generic term given to a model that consists of a combination (usually a summation) of multiple, independent functions that contribute to the distribution of points within a set. For example, a mixture model might be applied to a financial market, with each component model describing a certain sector of the market, or each describing the behavior of the market under certain economic conditions. The underlying mechanism that creates the overall behavior of the system is often not directly observable, but may be inferred from measurements of the data. Also, a combination of models may be used simply for convenience and mathematical simplicity, without regard to whether it accurately reflects the underlying system behavior.
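The notion of observations arising as a weighted summation of independent component sources can be illustrated with a brief sketch. All parameter values here are hypothetical and chosen only for illustration; the latent component labels play the role of the unobservable underlying mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-component mixture: each observation is drawn from one of
# two Gaussian sources, chosen according to mixing weights 0.3 and 0.7.
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
stds = np.array([0.5, 1.0])

n = 10000
components = rng.choice(2, size=n, p=weights)   # latent (unobserved) source labels
samples = rng.normal(means[components], stds[components])

# The observed distribution is the weighted sum of the component densities;
# the component labels themselves are hidden from the analyst, who sees only
# `samples` and must infer the mixture from it.
```

The sample mean approaches the weighted mean of the components, 0.3(-2) + 0.7(3) = 1.5, even though no single component has that mean.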
General mixture models (GMM) have been successfully applied in a wide range of applications, from financial to scientific. However, exemplary embodiments include applications having militarily-relevant tasks such as tracking and prediction of military tactics. Implementation of such militarily-relevant tasks necessitates extending standard techniques to include: a wider range of basis models; a more flexible classification algorithm that enables multiple hypotheses to be pursued, along with a confidence metric for the inclusion of a data point into each candidate model in the mixture (a novel contribution in this work); the application of standard techniques for pushing the GMM solver out of a local optimum in the search for a global solution; and parallel implementations of these techniques to reduce the computation time necessary to arrive at these solutions. Further exemplary embodiments include domain expertise metrics and techniques for dimension reduction, which are necessary processes in dealing with real-world problems, such as searching for patterns in enemy tactics, including IED emplacements and suicide attacks.
Overlapping fields such as unsupervised machine learning, pattern classification, signal processing, and data mining also offer methods, including principal component analysis (PCA), independent component analysis (ICA), k-means clustering, and many others, all of which attempt to extract models that describe the data.
PCA transforms the basis of the domain so that each (ordered) dimension accounts for as much variability as possible. Solutions often use eigenvalue or singular value decomposition. Concerns include the assumption of a linear combination, computational expense, and sensitivity to noise and outliers.
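A minimal sketch of PCA via singular value decomposition follows, using hypothetical data that varies mostly along one direction. The centering step and the ordering of components by explained variance correspond to the transformation described above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 500 points in 3-D that vary mostly along one direction,
# plus a small amount of isotropic noise.
direction = np.array([3.0, 1.0, 0.5])
X = rng.normal(size=(500, 1)) * direction + 0.1 * rng.normal(size=(500, 3))

# Center the data, then apply singular value decomposition; the rows of Vt
# are the principal directions, ordered by the variance they account for.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_variance = S**2 / (len(X) - 1)

# Fraction of total variability captured by the first (ordered) dimension.
ratio = explained_variance[0] / explained_variance.sum()
```

In this contrived case the first component captures nearly all of the variability; with noisier or nonlinear data (the concerns noted above), the ordering is less decisive and outliers can distort the directions.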
ICA separates a multivariate signal into a summation of components, identifying them by maximizing their statistical independence, often via a measure of non-Gaussianity. It typically requires centering, whitening, and dimension reduction to decrease the complexity of the problem; the latter two are often accomplished using PCA. ICA has several limitations: it assumes mutual statistical independence of the source signals; it cannot, in general, identify the actual number of source signals; it does not work well in high dimensions; and its solutions can be non-unique in the ordering of components, so the scale (including sign) of the source signals may not be properly identified.
k-means finds a pre-specified number of clusters by minimizing the sum of variances within each cluster; the solution, as specified by cluster indicators, has an equivalence to PCA components. Concerns with k-means include having to know the number of clusters, the assumption of Gaussian-like clusters, and reliance on good seed points. FlexMix methods allow finite mixtures of linear regression models (Gaussian and exponential distributions) and an extendable infrastructure. To reduce the parameter space, FlexMix methods restrict some parameters from varying or restrict the variance. FlexMix methods assume the number of components is known, but allow component removal for vanishing probabilities, which reduces the problems caused by overfitting. They also provide two methods for unsupervised learning of Gaussian clusters: a "decorrelated k-means" algorithm that minimizes an objective function of error and decorrelation for a fixed number of clusters, and a "sum of parts" algorithm that uses expectation maximization to learn the parameters of a mixture of Gaussians and factor them.
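The k-means procedure described above (Lloyd's algorithm) can be sketched briefly; the cluster count, seed points, and data are all hypothetical, and the example deliberately exposes the reliance on seeds noted above by taking them directly from the data.

```python
import numpy as np

def kmeans(X, k, seeds, iters=50):
    """Plain Lloyd's algorithm: minimizes the sum of within-cluster variances
    for a pre-specified number of clusters k, starting from given seed points."""
    centers = np.array(seeds, dtype=float)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(2)
# Hypothetical data: two well-separated blobs around (0, 0) and (5, 5).
X = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(5, 0.3, (100, 2))])
centers, labels = kmeans(X, 2, seeds=X[[0, -1]])
```

With one seed in each blob the centers converge to the blob means; poorly chosen seeds (e.g., both from one blob) can leave the algorithm at an inferior local solution, which is the seed-point concern raised above.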
Learning models from unstructured data has applications in a wide range of fields, and thus constitutes a well-studied problem. The fundamental approach is to assume a situation in which an underlying mechanism, which may or may not be observable, generates data such that each observation belongs to one of some number of different sources or categories. More generally, such models may be applied indirectly to generate a model that fits, even if the underlying model is known to be different than the components used.
Gaussian mixture models are used for classifying points into clusters, enabling an analyst to extract the underlying model or models which produced a set of observations. For simple situations, this technique may be used to easily separate the data into clusters that belong to a particular model. Referring to FIG. 1A, FIG. 1B, and FIG. 1C, the image in FIG. 1A represents a contrived example of one such case, in which three models are easily identified. The image in FIG. 1B offers no clear component models and would be difficult to classify automatically. The image in FIG. 1C appears to depict separate component models, but still contains some outlying data points that do not fit at all and some points that could be in either of two clusters. The case in FIG. 1C is much more realistic than the other two.
It should be noted that while a Gaussian function is a common basis function, there is nothing in the theory that prevents linear, quadratic, transcendental, or other basis functions from being considered as possible components, and indeed such models are being used in learning of general mixture models (GMM). The difficulty that arises in such cases is that the combinatorial explosion of possibilities becomes computationally problematic.
There are several methods used to estimate the mixture in GMM. The most common is expectation maximization (EM), which iteratively computes the model parameters and their weights, as well as assesses the fit of the mixture model to a plurality of data. The first step at each iteration computes the "expected" classes of all data points, while the second step computes the maximum likelihood model parameters given the class member distributions of the plurality of data. The first step requires evaluation of the Gaussian or other basis function; the second is a traditional model-fitting operation. An advantage of EM is that convergence is guaranteed, but only to a local optimum, which means that the algorithm may not find the best solution; the convergence rate is linear.
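The two EM steps described above can be sketched for a one-dimensional Gaussian mixture. This is a minimal illustrative sketch, not the claimed method: the data, the two-component setup, and the quantile-based initialization are all assumptions made for the example.

```python
import numpy as np

def em_gmm_1d(x, k, iters=100):
    """Minimal EM for a 1-D mixture of k Gaussians."""
    mu = np.quantile(x, np.linspace(0, 1, k))   # spread initial means over the data
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: expected class memberships (responsibilities) for each point,
        # by evaluating each weighted Gaussian basis function.
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: maximum-likelihood parameters given the memberships
        # (a traditional weighted model-fitting operation).
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

rng = np.random.default_rng(3)
# Hypothetical data: 30% of points from N(-4, 1), 70% from N(4, 1).
x = np.concatenate([rng.normal(-4, 1, 300), rng.normal(4, 1, 700)])
w, mu, var = em_gmm_1d(x, 2)
```

On well-separated data like this, EM recovers the weights and means; a poor initialization or overlapping components can leave it at the local optimum noted above.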
Utilization of the EM approach highlights another problem with the general mixture model approach, regardless of the basis functions included: the methods inherently must assign each data point to a particular basis model, leaving no room for the uncertainty associated with this assignment, though such uncertainty inherently exists within the data. Also, EM is sensitive to errors in the class assignment, introducing the possibility of missing the introduction of a new model into the mixture when new data does not quite fit. Multiple hypotheses cannot both claim to draw upon a single data point, which means that one of the hypotheses must be eliminated from consideration early in the process.
Therefore, the need exists for a method of estimating the mixtures that eliminates the general mixture model approach's problem of forced data point assignment, which ignores the uncertainty of assignment that inherently exists within the data.
Also, the need exists for reducing sensitivity to errors in the class assignment, which introduces the possibility of missing the introduction of a new model into the mixture when new data does not quite fit.
The need exists for a more pro-active defense posture in the assessment of threats against U.S. forces and installations in battlefields and other high-risk environments.
Furthermore, the need exists for threat assessment applications including militarily-relevant tasks such as tracking and prediction of military tactics, having a wider range of basis models, a more flexible classification algorithm that enables multiple hypotheses to be pursued along with a confidence metric for the inclusion of a data point into each candidate model in the mixture, the application of standard techniques for pushing the GMM solver out of a local optimum in the search for a global solution, and parallel implementations of these techniques in order to reduce the computation time necessary to arrive at these solutions.
Further, the need exists for applying a sampling technique such as the random sample consensus (RANSAC) method to classification procedures. By testing every data point against each proposed model to compose the mixture, a measure of the uncertainty associated with a particular assignment can be obtained, derived from the residual error with respect to each model. A data point may thus be associated tentatively with any number of models until such time as the confidence in a particular component model becomes high enough to truly classify the data point into a particular pattern. In this way, a decision may be delayed until multiple hypotheses have had a chance to claim a data point.
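The residual-to-confidence idea above can be sketched as follows. Everything here is hypothetical: two fixed candidate line models stand in for models proposed by a RANSAC-style sampling loop, the noise scale is assumed, and the dominance threshold for committing a point is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical candidate models: two lines y = a*x + b proposed for the mixture
# (in a full RANSAC loop these would be fit from random minimal samples).
models = [(1.0, 0.0), (-1.0, 4.0)]
sigma = 0.2  # assumed measurement noise scale

# Synthetic data lying near one line or the other; the lines cross at x = 2.
x = rng.uniform(0, 4, 200)
y = np.where(x < 2, x, -x + 4) + rng.normal(0, sigma, 200)

# Test every data point against every candidate model, and convert each
# residual error into a confidence score, so one point may tentatively
# support several models at once.
residuals = np.array([np.abs(y - (a * x + b)) for a, b in models])  # (models, points)
confidence = np.exp(-0.5 * (residuals / sigma) ** 2)

# A point is hard-classified only once one hypothesis clearly dominates;
# points near the crossing remain ambiguous, and both hypotheses keep them.
committed = confidence.max(axis=0) > 5 * (confidence.min(axis=0) + 1e-12)
```

Points far from the intersection are claimed decisively by one line, while points near it stay tentatively shared, deferring the classification decision as described above.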
Additionally, the need exists for embodiments which include domain expertise metrics and techniques for dimension reduction, which are necessary processes in dealing with real-world problems, such as searching for patterns in enemy tactics, including IED emplacement and suicide attacks.
Still further, the need exists for methods of extracting models that describe the data, which reduce concerns including the assumption of a linear combination, computational expense, and sensitivity to noise and outliers.
Furthermore, the need exists for methods of overcoming the ICA limitations of being unable to identify the actual number of source signals, of working poorly in high dimensions, and of typically requiring centering, whitening, and dimension reduction to decrease the complexity of the problem (the latter two often accomplished using PCA).
Further, the need exists for methods of identifying the number of source signals.
Additionally, the need exists for methods of estimating mixtures in GMM which do not suffer from combinatorial explosion of possibilities that are computationally problematic.
In addition, the need exists for applying a standard method, simulated annealing, to deal with the possibility of getting stuck in a local minimum, which is a general problem in optimization methods. Simulated annealing probes the search space with a random jump to test whether the current neighborhood of the search is less promising than the tested location. If the probe sees a lower cost (goodness of fit, in this case), then the jump is accepted and the search continues. While there are no guarantees, with such a large search space of possible models and parameters of those models, this is an important element in any algorithm for delivering a GMM for the observed data in a tracking or event prediction model.
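The probe-and-jump behavior described above can be sketched on a toy cost function. The cost function, starting point, temperature schedule, and step size are all hypothetical; in the GMM setting the cost would be the goodness of fit of a candidate mixture.

```python
import math
import random

random.seed(5)

def cost(x):
    # Hypothetical multimodal fit cost: several local minima, with the
    # global minimum near x = 1.58.
    return math.sin(3 * x) + 0.1 * (x - 2) ** 2

x = -1.5          # start near an inferior region of the search space
best = x
T = 2.0           # initial temperature
for step in range(5000):
    candidate = x + random.gauss(0, 1.0)       # random probe of the search space
    delta = cost(candidate) - cost(x)
    # Accept improvements always; accept worse probes with probability
    # exp(-delta / T), which is what lets the search escape local minima.
    if delta < 0 or random.random() < math.exp(-delta / T):
        x = candidate
    if cost(x) < cost(best):
        best = x
    T *= 0.999    # gradual cooling schedule
```

Early on, the high temperature makes uphill jumps likely, so the search can leave the starting basin; as the temperature cools, the search settles into the deepest basin found, consistent with the no-guarantee caveat above.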
Further, the need exists for managing the computational load inherent in evaluating multiple models; thus, the parallel architectures of modern graphics processing units (GPUs) will be applied to the problems at issue. Such units have proven applicable to a wide range of repetitive computations, especially gridded problems or those that exist (as these models will) within well-defined domains like the search spaces in the problems described above. Evaluation of multiple points on a model in a single step is thus possible, by applying a single-program, multiple-data parallel computation approach. This reduces the cost of each additional model that is a candidate for the mixture to a constant factor, although the model may still require a complex program for individual evaluation. The number of data points, however, becomes less of a factor in the asymptotic evaluation of program efficiency.
Also, the need exists for risk-averse measures, such as confidence metrics, both in the fit to a particular model and through the multiple-hypothesis capability of classifying a data point into multiple models in the mixture. As with any optimization method, there is no way to guarantee 100 percent accuracy in the results, so such measures minimize the risk of this research.
Furthermore, the need exists for understanding the tactics used by enemy combatants, especially in the era of asymmetric warfare. The danger presented by IED emplacements and suicide attacks is extremely high. While no prediction algorithm can be expected to be 100 percent accurate in identifying dangers as they approach or excluding non-combatants from suspicion, there are patterns to this behavior. Thus detecting patterns that can be learned from existing data and applying them to situations in which the threat must be assessed is a critical problem for combat environments. Similar problems may be considered in maritime domain awareness, homeland security, and other safety and security applications.