Adaptive devices capable of learning input-output relationships have, for the most part, been restricted to the case of supervised training, in which the desired output is known for each input. In many situations where adaptive learning of an input-output relationship is required, the desired output is not known for each individual input. However, it is often possible to monitor available information in the operating environment and from this information derive a score or grade that measures the performance of an adaptive device over multiple sets of inputs. The adaptive device can then use this grade as the basis for improving its performance over a sequence of trial performances. In the past, adaptive devices capable of such reinforcement training have been restricted either to the learning of relatively simple classical conditioning relationships or to the adaptive development of lookup-table input-output relationships.
It is an objective of the present invention to overcome these limitations and provide a graded learning method and device that can learn an arbitrary input-output relationship using an arbitrary grade.