1. Technical Field
The present invention relates generally to computer processing and, more specifically, to a system and method for discovery and selection of an optimal probability model.
2. Description of the Related Art
Probability models with discrete random variables are often used for probabilistic inference and decision support. A fundamental issue lies in the choice and the validity of the probability model.
In statistics, model selection based on information-theoretic criteria dates back to the early 1970s, when the Akaike Information Criterion (AIC) was introduced (Akaike, H., (1973), “Information Theory and an Extension of the Maximum Likelihood Principle,” in Proceedings of the 2nd International Symposium of Information Theory, eds. B. N. Petrov and E. Csaki, Budapest: Akademiai Kiado, pp. 267-281). Since then, various information criteria have been introduced for statistical analysis. For example, the Schwarz information criterion (SIC) (Schwarz, G., (1978), “Estimating the Dimension of a Model,” The Annals of Statistics, 6, pp. 461-464) was introduced to take into account the maximum likelihood estimate of the model, the number of free parameters in the model, and the sample size. SIC has been further studied by Chen and Gupta (Chen, J. and Gupta, A. K., “Testing and Locating Variance Change Points with Application to Stock Prices,” Journal of the American Statistical Association, V.92 (438), American Statistical Association, June 1997, pp. 739-747; Gupta, A. K. and Chen, J. (1996), “Detecting Changes of Mean in Multidimensional Normal Sequences with Applications to Literature and Geology,” Computational Statistics, 11:211-221, 1996, Physica-Verlag, Heidelberg) for testing and locating change points in the mean and variance of multivariate statistical models with independent random variables. Chen further extended SIC to the change-point problem for regular models. Potential applications of information criteria for model selection in fields such as environmental statistics and financial statistics are also discussed elsewhere.
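As an illustration of how these two criteria combine the quantities named above, the following minimal sketch computes AIC and SIC in their commonly stated forms, AIC = 2k - 2 ln L and SIC = k ln n - 2 ln L, where L is the maximized likelihood, k is the number of free parameters, and n is the sample size. The function and parameter names are assumptions introduced for illustration and do not appear in the cited works.

```python
import math


def aic(log_likelihood: float, num_params: int) -> float:
    """Akaike Information Criterion in its common form: 2k - 2 ln L."""
    return 2.0 * num_params - 2.0 * log_likelihood


def sic(log_likelihood: float, num_params: int, sample_size: int) -> float:
    """Schwarz information criterion in its common form: k ln n - 2 ln L."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return num_params * math.log(sample_size) - 2.0 * log_likelihood


# Hypothetical candidate models: (maximized log-likelihood, free parameters).
candidates = {"model_a": (-120.5, 3), "model_b": (-118.9, 6)}
n = 200  # hypothetical sample size

# Under either criterion, the model with the smaller value is preferred.
best_by_sic = min(candidates, key=lambda m: sic(*candidates[m], n))
```

Under both criteria a lower value indicates a preferred model; SIC penalizes additional parameters more heavily than AIC whenever ln n exceeds 2.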
To date, studies of information criteria for model selection have focused on statistical models with continuous random variables and, in many cases, on the assumption that the variables are i.i.d. (independent and identically distributed).
In decision science, the utility of a decision support model may be evaluated based on the amount of biased information it carries. Consider a set of simple financial decision models. Each model represents an oversimplified relationship among strategy, risk, and return as three interrelated binary-valued discrete random variables. The purpose of these models is to assist an investor in choosing a type of investment portfolio based on the individual's investment objective; e.g., a decision could be whether one should construct a portfolio in which resource allocation is diversified. Assume the investment objective is a moderate return with relatively low risk. If a model returns an equal preference for the strategies to diversify and not to diversify, it may not be very useful in helping an investor make a decision. On the other hand, a model that is biased towards one strategy over the other may be more informative in helping one make a decision, even if the decision is not necessarily the correct one. For example, a model may be biased towards a strategy based on a probability assessment of strategy conditioned on risk and return.
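The probability assessment of strategy conditioned on risk and return can be sketched as follows. This is a minimal illustration only: the joint probability values, the variable encoding, and the function name are hypothetical and are not taken from any model described here.

```python
# Hypothetical joint probabilities P(strategy, risk, ret); illustrative values only.
# strategy: 1 = diversify, 0 = do not diversify
# risk:     1 = high,      0 = low
# ret:      1 = high,      0 = moderate
joint = {
    (0, 0, 0): 0.05, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.20,
    (1, 0, 0): 0.25, (1, 0, 1): 0.10, (1, 1, 0): 0.15, (1, 1, 1): 0.10,
}


def strategy_given(joint, risk, ret):
    """Return P(strategy | risk, ret) as a dict keyed by strategy value."""
    evidence = sum(joint[(s, risk, ret)] for s in (0, 1))
    if evidence == 0.0:
        raise ValueError("conditioning event has zero probability")
    return {s: joint[(s, risk, ret)] / evidence for s in (0, 1)}


# Investment objective from the text: low risk (0) and moderate return (0).
posterior = strategy_given(joint, risk=0, ret=0)
```

With these assumed values the model is biased towards diversification for the stated objective, whereas a posterior of 0.5 for each strategy would give the investor no guidance.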
In the operations research community, techniques for solving various optimization problems have been discussed extensively. The Simplex and Karmarkar algorithms (Borgwardt K. H., 1987, “The Simplex Method, A Probabilistic Analysis,” Springer-Verlag, Berlin; Karmarkar, N., 1984, “A New Polynomial-time Algorithm for Linear Programming,” Combinatorica 4 (4), pp. 373-395) are two widely used methods that are robust for solving many linear optimization problems. Wright (Wright, S., 1997, “Primal-Dual Interior Point Methods,” SIAM, ISBN 0-89871-382-X) has written an excellent textbook on the primal-dual formulation of the interior point method, with different variants of search methods for solving non-linear optimization problems. Wright's book discusses that the primal-dual interior point method is robust in searching for optimal solutions to problems that satisfy the Karush-Kuhn-Tucker (KKT) conditions with a second-order objective function.
At first glance, it seems that existing optimization techniques can be readily applied to solve a probability model selection problem. Unfortunately, there are subtle difficulties that make probability model selection a more challenging optimization problem. First of all, each model parameter in the optimization problem is a joint probability term bounded between 0 and 1. This makes the polytope of the solution space much smaller than that of a non-probability-based optimization problem with an identical set of non-trivial constraints (i.e., constraints other than 1≧Pi≧0).
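The effect of the bounds on the solution space can be illustrated with a small feasibility check. The sketch below assumes that a candidate model is a list of joint probability terms Pi and that each non-trivial constraint is a linear equality over those terms; the function name, the tolerance, and the example constraint are hypothetical.

```python
def is_feasible(p, constraints, tol=1e-9):
    """Check a candidate model against the bounds and the linear constraints.

    p           -- list of joint probability terms Pi
    constraints -- list of (coeffs, rhs) pairs meaning sum(c_i * P_i) == rhs
    """
    # Trivial constraints: every Pi must lie in [0, 1].
    if any(x < -tol or x > 1.0 + tol for x in p):
        return False
    # Non-trivial constraints, each a linear equality over the Pi.
    for coeffs, rhs in constraints:
        if abs(sum(c * x for c, x in zip(coeffs, p)) - rhs) > tol:
            return False
    return True


# Two binary variables (A, B); terms ordered as (0,0), (0,1), (1,0), (1,1).
constraints = [
    ([1, 1, 1, 1], 1.0),  # normalization: all terms sum to 1
    ([0, 0, 1, 1], 0.6),  # hypothetical marginal: P(A = 1) = 0.6
]

inside = is_feasible([0.1, 0.3, 0.2, 0.4], constraints)
outside = is_feasible([0.7, -0.3, 0.2, 0.4], constraints)
```

The second candidate satisfies both linear equalities but violates the bounds, so it would be an admissible point in the unbounded problem and is excluded only by the requirement that each Pi be a probability.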
In addition, the choice of robust optimization methodologies is relatively limited for objective functions with a non-linear logarithmic form; e.g., an objective function based on the Shannon Information Criterion (Shannon, C. E., and Weaver, W., The Mathematical Theory of Communication. University of Illinois Press, Urbana, 1972). The primal-dual interior point method is one of the few promising techniques for the probability model selection problem. However, the primal-dual interior point method requires the existence of an initial solution, and an iterative process that solves an algebraic system to estimate incrementally revised errors between a current sub-optimal solution and the estimated global optimal solution. This raises two problems. First, the primal-dual formulation inherently enlarges the algebraic system to be solved, even if the augmented matrix happens to be sparse. Since the polytope of the solution space is “shrunk” by the trivial constraints 1≧Pi≧0, solving the augmented algebraic system in successive iterations to estimate incrementally revised errors is not always possible. A second, even more fundamental problem is that the convergence of the iterations in the primal-dual interior point method relies on the KKT conditions. Such conditions may not even hold in many practical model selection problems.
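To make the non-linear logarithmic form concrete, the following minimal sketch evaluates the Shannon entropy, H = -Σ Pi log2 Pi, of a set of joint probability terms. It is offered only as an example of such an objective under the stated assumptions, not as the criterion used by any particular method; the function name is hypothetical, and terms equal to zero are taken to contribute nothing, following the usual convention.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits of a list of probability terms Pi."""
    if any(x < 0.0 or x > 1.0 for x in p):
        raise ValueError("each Pi must lie in [0, 1]")
    # Terms with Pi == 0 are skipped (0 * log 0 is taken as 0 by convention).
    return -sum(x * math.log2(x) for x in p if x > 0.0)


# Three binary variables give eight joint probability terms.
uniform = [0.125] * 8          # no bias towards any outcome
skewed = [0.65] + [0.05] * 7   # hypothetical model biased towards one outcome

h_uniform = shannon_entropy(uniform)
h_skewed = shannon_entropy(skewed)
```

The uniform model attains the maximum entropy for eight outcomes, while the biased model has lower entropy. Because the objective contains the product of each Pi with its logarithm, it is neither linear nor quadratic in the model parameters.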
Accordingly, an efficient and accurate technique for selecting an optimal probability model, while avoiding the limitations and problems of existing optimization technologies, is highly desirable.