Machine learning is a form of artificial intelligence that is employed to allow computers to evolve behaviors based on empirical data. Machine learning may take advantage of training examples to capture characteristics of interest of their unknown underlying probability distribution. Training data may be seen as examples that illustrate relations between observed variables. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data.
One example of machine learning is supervised learning (SL). The goal of SL is to learn an accurate mapping function g: X→Y from a set of labeled training instances T={(x1, y1), (x2, y2), . . . , (xn, yn)}; where xiεX are samples from an input space X and yiεY are labels from an output space Y (iε{1, 2, . . . , n}). The mapping function g is an element of possible mapping functions in the hypothesis space G. In conventional SL, all training instances are treated as equally relevant based on the assumption that all training instances should have the same impact on the mapping function g.
However, in real-world applications, not all training instances have the same relevance, and there can be variations in the relevance of both input xi and label yi in a training instance (xi, yi). For example, when using SL on weather forecasting, training data may consist of historical samples of weather data such as measurements on temperature, wind, humidity, etc. However, such measurements may have variations including variations according to time of day, location, equipment employed, etc. For example, if training data is collected from different sources, the training instance from one source (e.g., a source with superior measurement methods, superior equipment, etc.) may have a higher relevance than training instances from another source (e.g., a source with inferior measurement methods, inferior equipment, etc.). In this example, conventional SL will consider training instances from different sources as equally relevant. As a result, higher-relevance training instances and lower-relevance training instances will have the same impact during the SL and thus the SL may not be able to generate an accurate mapping function g from the training data.
In another example, a training set may contain some training instances that have unknown input values. If a training instance has a large number of unknown input values, it may be less reliable (for example, it may have a higher likelihood of being mislabeled) and thus have a lower relevance than a training instance with known input values. If a training set contains a significant number of training instances with unknown input values, a conventional SL algorithm may not be able to learn an accurate mapping function g because of potential negative effects of low-relevance instances.
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one example technology area where some embodiments described herein may be practiced.