The present invention relates generally to the field of artificial intelligence and, more specifically, to a method of predicting future outcomes that successively approach an optimum through the use of machine learning.
The field of machine learning is generally considered to be a broad subfield of artificial intelligence. Machine learning deals generally with the development of algorithms, methods, and techniques that allow computers to “learn”. Research in the area of machine learning is generally focused on automating statistical methods for efficient use by software agents. Machine learning has a wide spectrum of applications including natural language processing, pattern recognition, search engines, medical diagnosis, bioinformatics, detecting credit card fraud, stock market analysis, classifying DNA sequences, speech and handwriting recognition, object recognition in computer vision, robotic locomotion, optimization, and game playing, to name a few.
More particularly, in the areas of optimization and game playing, an interesting set of problems relates to the so-called “experts problem”, which may be expressed in the following manner. A decision maker relies upon the advice of various “experts” in attempting to predict an event, such as whether or not it will rain the next day. The decision maker has access to the experts, each of which provides a prediction based on previous observations. The decision maker uses an algorithm that combines the advice of these experts. Such an algorithm employed by the decision maker is called an experts algorithm. Known experts algorithms usually guarantee that, after many iterations, the number of mistakes made by the algorithm is, approximately, no greater than the number made by the best expert in retrospect.
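By way of illustration only, the following is a minimal sketch of one well-known experts algorithm, the deterministic weighted majority scheme, applied to binary predictions such as rain or no rain. The text above does not name a particular algorithm, so the choice of weighted majority, the function name weighted_majority, and the penalty factor beta are all assumptions made for this example.

```python
def weighted_majority(expert_predictions, outcomes, beta=0.5):
    """Deterministic weighted majority over binary (0/1) predictions.

    expert_predictions: one list per round, holding each expert's 0/1 prediction.
    outcomes: the actual 0/1 outcome of each round.
    beta: hypothetical penalty factor in (0, 1) applied to a wrong expert's weight.
    Returns (mistakes made by the algorithm, mistakes made by each expert).
    """
    num_experts = len(expert_predictions[0])
    weights = [1.0] * num_experts          # every expert starts equally trusted
    algorithm_mistakes = 0
    expert_mistakes = [0] * num_experts

    for predictions, outcome in zip(expert_predictions, outcomes):
        # Total weight behind each possible prediction.
        weight_for_1 = sum(w for w, p in zip(weights, predictions) if p == 1)
        weight_for_0 = sum(w for w, p in zip(weights, predictions) if p == 0)
        guess = 1 if weight_for_1 >= weight_for_0 else 0
        if guess != outcome:
            algorithm_mistakes += 1

        # Shrink the weight of every expert that was wrong this round.
        for i, p in enumerate(predictions):
            if p != outcome:
                expert_mistakes[i] += 1
                weights[i] *= beta

    return algorithm_mistakes, expert_mistakes
```

In this sketch, an expert that errs repeatedly loses influence geometrically, so the combined prediction comes to track the experts that have been most accurate so far.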
An even more general framework, which encompasses the experts problem, is referred to as the online convex optimization (OCO) problem. However, the OCO problems that are described in the prior art do not utilize “state information”. In the example given above, suppose that the decision maker also has access to various measurements, such as, for example, temperature and cloud location. Intuitively, this information could potentially improve the performance of the decision maker, but it is not clear a priori how to model prior information in the online learning framework.
One approach is to attempt to learn the correlation, if any, between the given information and the observable data and to use such correlation to predict future behavior. However, the information, e.g., temperature, may or may not be correlated with the actual observations, e.g., whether or not it later rains. Moreover, it is conceivable that the state information is strongly correlated with the observations, but that this correlation is very hard to extract. For example, the prior knowledge could be encoded as a solution to a computationally hard problem or even an uncomputable one. Such approaches are not generally thought to be robust or practical.
Another approach is to associate different experts with different decision states and then use standard expert algorithms to predict future behavior. The problem with such an approach is that the number of states grows exponentially with the dimension of the state space. Therefore, this approach quickly becomes infeasible even for a modest amount of prior information. Other difficulties with this approach arise when the domain of the attributes is infinite.
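The growth described above can be illustrated with a rough count. The sketch below assumes that each attribute of the state has been discretized into a finite number of values; the function name count_states and the example figures are hypothetical and do not come from any cited work.

```python
import math


def count_states(values_per_attribute):
    """Number of distinct decision states when attribute i takes
    values_per_attribute[i] possible values (assumed discretization)."""
    return math.prod(values_per_attribute)


# Hypothetical example: 6 attributes, each discretized into 10 levels.
# A separate instance of an experts algorithm would be needed per state.
states_needed = count_states([10] * 6)
```

Under these assumptions, six attributes with ten levels each already call for one million separate instances, and each additional attribute multiplies that figure by ten.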
Still another approach is that used in the model for portfolio management with side information introduced by T. M. Cover and E. Ordentlich in their paper entitled “Universal Portfolios with Side Information”, IEEE Transactions on Information Theory, 42:348-363, 1996. Their approach handles discrete side information, and amounts to handling different side information values as separate problem instances. The measure of performance in their model is the standard “regret” metric. The concept of regret comes from the field of decision theory. The regret of the decision maker after T steps is defined as the difference between the total cost that the decision maker has actually incurred and the minimum total cost that the decision maker could have incurred by choosing a single fixed point at every step.
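The definition of regret given above can be sketched numerically as follows. For simplicity, the sketch assumes that the best fixed point is sought over a finite list of candidate points rather than over a continuous set; the function name regret, its parameters, and the quadratic costs in the example are hypothetical.

```python
def regret(cost_functions, decisions, candidates):
    """Regret after T steps: total cost actually incurred minus the total
    cost of the best single point chosen at every step.

    cost_functions: the T per-step cost functions, each mapping a point to a cost.
    decisions: the T points actually chosen by the decision maker.
    candidates: finite set of fixed points to compare against (an assumption).
    """
    incurred = sum(f(x) for f, x in zip(cost_functions, decisions))
    best_fixed = min(sum(f(x) for f in cost_functions) for x in candidates)
    return incurred - best_fixed


# Hypothetical example: three steps with quadratic costs centered at 0, 1, 2.
example_costs = [lambda x, c=c: (x - c) ** 2 for c in (0.0, 1.0, 2.0)]
example_regret = regret(example_costs, [2.0, 2.0, 2.0], [0.0, 1.0, 2.0])
```

In this example the decision maker incurs a total cost of 5 by always choosing 2.0, while the best fixed candidate, 1.0, would have incurred a total cost of 2, giving a regret of 3.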
It is apparent that a new approach to solving online convex optimization problems is needed, one that does not assume anything about the distribution of the data, the prior information, or the correlation between the two. The measure of performance should be comparative, i.e., based on an extension of the concept of regret. This approach should also take into account the geometric structure of the available information space.