Traditional prediction systems rely on explicitly stated rules in an attempt to indirectly explain or describe the behavior of data and make useful predictions or other decisions. The explicit rules are applied to input data to generate output data, i.e., a prediction, a class assignment or another decision. However, the input data may have subtle and/or unknown relationships that are not recognized by a person/algorithm generating the rules, or that cannot be described by explicit rules. Furthermore, because input data are often noisy, distorted or incomplete, explicit rules may fail to operate correctly, even on patterns broadly similar to data sets from which the explicit rules were constructed. Additionally, some complex problems are non-linear, so that their solutions cannot be easily recognized by humans in the absence of machine intelligence.
Typical neural networks and many other machine learning (“ML”) networks do not rely on explicitly stated rules, but construct their own rules by processing input data to generate accurate outputs (i.e., within a predefined error bound). Thus, these networks are often capable of finding unknown relationships between input data and (predicted) outcomes, even when these relationships are highly complex and/or non-linear. Critically, use of ML algorithms requires only the existence of a sufficient and relevant set of prior experiential data (i.e., accurate training exemplars that include examples of input and associated output), and does not require the user to have any knowledge of the rules that govern the system's behavior. Thus, the nature of ML algorithms is such that they learn to identify unknown relationships, which in turn allows networks that utilize ML algorithms (i.e., ML networks) to generalize to broad patterns with incomplete or noisy input data and to handle complex non-linear problems. However, prior to use, the networks must be trained with known input and outcome data to provide predictions with an acceptable level of accuracy. Training ensures that neural and other ML networks are sufficiently accurate so that output data (e.g., predictions, classification decisions or other kinds of decisions) generated for input data with unknown outputs are relatively reliable.
Training the network by supervised learning thus involves sequentially generating outcome data from a known set of input data (where inputs and outputs are correctly matched). These generated outcome data are compared to known sets of outcomes that correspond to known input data. That is, it is expected that the network will generate and thereby predict the known outcomes when receiving the known set of input data. When the known outcomes are not returned, the network may be manipulated (usually automatically) so that further outcome data returned by the network are within the predefined error bound of known outcome data. The network thereby learns to generate known output data (or an approximate equivalent thereof) from known input data, and thereafter may be used for generating outputs from input data without known outputs. Thus, the networks are adaptive since they are reconfigured during training and during actual use to learn new rules or to find new patterns in new data. However, the training typically requires hundreds or thousands of iterations when the network is constructed and may require subsequent re-training during use to maintain the accuracy and reliability of the generated output data.
The power and accuracy of any ML algorithm prior to the invention described has thus been inherently closely tied to the strength of available training sets (i.e., exemplars, a series of inputs and known outcomes used to initially train the ML algorithm), and the closeness of the relationship between the training sets and the situations to be predicted. Furthermore, when training sets are entirely absent, very limited (e.g., small numbers of exemplars, sparsely populated, etc.), of poor quality (e.g., biased, imprecise, skewed, etc.), and/or of limited relevance to the eventual set to be analyzed in the future (e.g., one or more basic situations related to target outcomes have changed), network performance may approximate random behavior and accuracy of network predictions can be very poor.
Thus, the power of all existing approaches to use ML algorithms for classification and prediction has been, prior to the current invention, primarily limited by the strength of the training set. In the absence of accurate training exemplars, or in the presence of sparse or otherwise weak training data, ML algorithms have extremely limited, or no, utility.