1. Field of the Invention
The present invention relates to an information processing apparatus, a control method therefor, and a computer-readable storage medium, and in particular to an inference technique that learns from data whose attribute is known and infers the attribute of data whose attribute is unknown.
2. Description of the Related Art
As one data processing technique using a computer, an inference technique is known in which an unknown event is inferred based on knowledge extracted from known events. Many inference apparatuses for inferring unknown events acquire the knowledge used for inference through supervised learning. Supervised learning refers to a method of learning, from data sets whose attribute is known (known data), a correspondence relationship (knowledge) between an attribute and a characteristic value, where the characteristic value represents the characteristics of the data. Note that the characteristic value may be referred to as an “observation value”, and the attribute of data may be referred to as a “class” or “label”. An inference apparatus uses the knowledge acquired through supervised learning to infer the attribute of data whose attribute is not known (unknown data) based on a characteristic value of the unknown data. Accordingly, the quality of the knowledge acquired through supervised learning has a large influence on the inference accuracy of the inference apparatus using that knowledge.
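The relationship between known data, knowledge, and unknown data described above can be illustrated with a minimal sketch. It assumes a nearest-centroid classifier, chosen only for brevity and not taken from any particular inference apparatus; the function names `learn` and `infer`, the labels `class_a` and `class_b`, and the sample values are all hypothetical.

```python
from collections import defaultdict
from math import dist


def learn(known_data):
    """Supervised learning step (illustrative assumption: nearest centroid).

    known_data is a list of (characteristic_values, attribute) pairs.
    The returned "knowledge" maps each attribute to the mean characteristic
    vector of the known data having that attribute.
    """
    sums = {}
    counts = defaultdict(int)
    for values, attribute in known_data:
        if attribute not in sums:
            sums[attribute] = [0.0] * len(values)
        sums[attribute] = [s + v for s, v in zip(sums[attribute], values)]
        counts[attribute] += 1
    return {a: [s / counts[a] for s in sums[a]] for a in sums}


def infer(knowledge, values):
    """Infer the attribute of unknown data from its characteristic values
    by picking the attribute whose learned centroid is closest."""
    return min(knowledge, key=lambda a: dist(knowledge[a], values))


# Hypothetical known data: characteristic values paired with a known attribute.
known = [
    ([1.0, 1.2], "class_a"),
    ([0.8, 1.0], "class_a"),
    ([3.0, 3.1], "class_b"),
    ([3.2, 2.9], "class_b"),
]
knowledge = learn(known)

# Unknown data: only the characteristic values are available.
inferred = infer(knowledge, [2.9, 3.0])
```

Because `infer` depends entirely on the centroids produced by `learn`, a poor or unrepresentative set of known data directly degrades the inferred attribute, which is the dependence on knowledge quality noted above.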
With conventional inference apparatuses, supervised learning is performed on the assumption that known data and unknown data have the same distribution. Therefore, it has been considered that if the distribution of known data is learned accurately from a sufficient number of known data sets, the attribute of unknown data can be inferred accurately.
Also, Japanese Patent Laid-Open No. 7-281898 discloses a technique in which inference results obtained by a plurality of inference apparatuses that employ mutually different inference methods are weighted and integrated, making it possible to obtain more accurate inference results than with a single inference apparatus.
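As a rough illustration of weighting and integrating inference results, the following sketch combines per-attribute probabilities from several inference apparatuses by a normalized weighted average. This is a generic scheme assumed for illustration; the exact integration method of the cited publication is not described here. The function name `integrate`, the labels, the probabilities, and the weights are all hypothetical.

```python
def integrate(results, weights):
    """Combine inference results by a normalized weighted average.

    results: one dict per inference apparatus, mapping attribute -> probability.
    weights: one non-negative weight per apparatus (illustrative assumption).
    """
    if len(results) != len(weights):
        raise ValueError("one weight is required per inference result")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = {}
    for result, weight in zip(results, weights):
        for attribute, probability in result.items():
            combined[attribute] = (
                combined.get(attribute, 0.0) + weight * probability / total
            )
    return combined


# Hypothetical outputs of three apparatuses using different inference methods.
results = [
    {"class_a": 0.6, "class_b": 0.4},
    {"class_a": 0.3, "class_b": 0.7},
    {"class_a": 0.2, "class_b": 0.8},
]
weights = [1.0, 2.0, 1.0]  # hypothetical trust placed in each apparatus

combined = integrate(results, weights)
best = max(combined, key=combined.get)  # attribute with the highest combined score
```

Normalizing by the weight total keeps the combined values on the same scale as the inputs, so they can still be read as probabilities when each input sums to one.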
Recently, attempts have been made to perform diagnosis support using an inference apparatus in the medical field. For example, a technique is under study in which the attribute of a lesion site (diagnosis, etc.) is inferred from an input characteristic value of the lesion site.
However, when an inference apparatus for inferring an attribute of a lesion site is caused to perform learning, there are cases in which a sufficient number of known data sets cannot be obtained. Furthermore, the method for acquiring characteristic values of lesion sites may change due to improvements in medical equipment, and the characteristics or occurrence probabilities of a lesion may change with time or with environmental changes. For these reasons, the learning results of the inference apparatus are not always satisfactory, and there is no guarantee that the initial inference accuracy is maintained at a certain level. As a result, the user (doctor) does not know to what extent he/she can rely on the inference results of the inference apparatus, and some doctors even consider those inference results unreliable. Consequently, inference apparatuses have not been used effectively.
The above-described issues cannot be solved simply by improving the inference accuracy by increasing the number of learning data sets or using a plurality of inference apparatuses. In order to solve the above issues, it is necessary for an inference apparatus to present to users information for determining the reliability of the inference result.