The present invention relates to optimization problems, and more specifically, to techniques for minimum regret learning in online convex optimization.
In real-time systems, costs and other conditions change continually. The operator of such a system must make decisions repeatedly, and the utility of each decision depends not only on the decision itself but also on the conditions of the system or its environment at that time. For example, the operator's task may be to track a “moving target”: the target may jump from one point to another, and the operator has to aim knowing only where the target previously was, not where it currently is. This happens, for example, in inventory systems, where an optimal inventory level exists in hindsight, but the inventory level has to be chosen before the actual demand for the item is known. The “regret” of the operator is the difference between the cost actually incurred as a result of the operator's decision and the optimal cost that could have been incurred with another decision had the conditions been known. Methods are known in the prior art that limit the total regret so that it grows in proportion to the square root of the total elapsed time.
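The notion of regret in the inventory example can be illustrated with a minimal sketch. It is not taken from any particular prior-art method. It assumes a hypothetical squared-error cost between the chosen inventory level and the realized demand, hypothetical bounds `lo` and `hi` on the inventory level, and a step size decaying as 1/sqrt(t), a choice commonly associated with square-root regret growth in online gradient descent. The comparator is assumed to be the best single fixed level in hindsight. The function names are illustrative only.

```python
import math


def online_inventory_levels(demands, lo=0.0, hi=100.0):
    """Choose each inventory level before the corresponding demand is revealed.

    Assumption: cost of level x under demand d is (x - d) ** 2, and the level
    is updated by projected gradient descent with step size 0.5 / sqrt(t).
    """
    x = (lo + hi) / 2.0  # initial guess: midpoint of the allowed range
    decisions = []
    for t, d in enumerate(demands, start=1):
        decisions.append(x)            # decision is committed before d is seen
        grad = 2.0 * (x - d)           # gradient of (x - d)^2 at the decision
        x = x - (0.5 / math.sqrt(t)) * grad
        x = min(hi, max(lo, x))        # project back onto [lo, hi]
    return decisions


def total_regret(decisions, demands):
    """Incurred cost minus the cost of the best fixed level in hindsight."""
    incurred = sum((x - d) ** 2 for x, d in zip(decisions, demands))
    # For squared-error cost, the best fixed level is the mean demand.
    best_fixed = sum(demands) / len(demands)
    optimal = sum((best_fixed - d) ** 2 for d in demands)
    return incurred - optimal


if __name__ == "__main__":
    demands = [40.0, 60.0, 50.0, 55.0, 45.0]
    levels = online_inventory_levels(demands)
    print("levels:", levels)
    print("total regret:", total_regret(levels, demands))
```

Under these assumptions, each level depends only on demands observed in earlier periods, which mirrors the operator who knows only where the target previously was.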