The present invention relates to a reasoning and prediction method, and to a system having a built-in reasoning means, which use data involved in a natural phenomenon, a social phenomenon such as vital statistics, an economic phenomenon such as stock price fluctuation, or a chemical or physical phenomenon such as that in industrial plants.
As a technique for effectively utilizing accumulated data, data mining, an analysis technique oriented to massive amounts of data, has been actively studied. The form of the data to be extracted varies with the application, but one common purpose is reasoning about and prediction of new data.
Memory-based reasoning (MBR) is known as a technique for calculating a reasoning result on the basis of similarities from a large amount of accumulated data. The MBR is summarized in a paper entitled "TOWARD MEMORY-BASED REASONING", Communications of the ACM, Craig Stanfill, David Waltz, December 1986, Vol. 29, Number 12, pp. 1213-1228 (which will be referred to as the prior art 1, hereinafter).
In the prior art 1, case data given in the form of records are regarded as vectors, and a similarity between cases is judged on the basis of the magnitude of a distance between the vectors. Cases having large similarities are extracted as similar cases, and output field values of the similar cases are weighted by the distances between the vectors and averaged to calculate a reasoning result.
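The weighted averaging described above can be sketched as follows. This is a minimal illustration and not the procedure of the prior art 1 itself: the Euclidean distance, the inverse-distance weighting, the function name `mbr_predict`, and the parameters `k` and `eps` are all assumptions introduced here.

```python
import math

def mbr_predict(cases, query, k=3, eps=1e-9):
    """Distance-weighted average of the outputs of the k nearest cases.

    cases: list of (input_vector, output_value) pairs.
    query: input vector of the new case.
    k:     number of similar cases, supplied by the user (assumption).
    eps:   small constant that avoids division by zero (assumption).
    """
    # A smaller Euclidean distance is treated as a larger similarity.
    scored = sorted((math.dist(x, query), y) for x, y in cases)
    nearest = scored[:k]
    # Inverse-distance weights: closer cases contribute more to the result.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / total
```

Note that `k` must be supplied by the caller; nothing in this sketch chooses it from the data.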
Another technique associated with the MBR is described, e.g., in Japanese Patent No. 2,632,117, which will be called the prior art 2, hereinafter. In the prior art 2, an input space is divided so that an output error is smaller than a threshold value, whereby the space defined by the input variables is meshed. A case database for use in reasoning is created by embedding (quantizing) cases into the meshed space. A reasoning result of the reasoning error is obtained by using distances in the meshed space.
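The embedding of cases into a meshed space can be illustrated roughly as follows. This is only a sketch under stated assumptions: the mesh size is passed in directly rather than derived from an output-error threshold, each occupied cell is represented by the mean of its output values, and the names `build_mesh_db` and `mesh_predict` are hypothetical.

```python
import math
from collections import defaultdict

def build_mesh_db(cases, mesh_size):
    """Embed cases into a uniform mesh and keep one typical output per cell."""
    cells = defaultdict(list)
    for x, y in cases:
        # Cell index: each input coordinate is quantized by the mesh size.
        key = tuple(math.floor(v / mesh_size) for v in x)
        cells[key].append(y)
    # Each occupied cell is represented by the mean of its outputs (assumption).
    return {key: sum(ys) / len(ys) for key, ys in cells.items()}

def mesh_predict(db, query, mesh_size):
    """Return the output of the occupied cell nearest to the query's cell."""
    q = tuple(math.floor(v / mesh_size) for v in query)
    # Distances are measured between cell indices, i.e. in the meshed space.
    nearest = min(db, key=lambda key: math.dist(key, q))
    return db[nearest]
```

Because every cell has the same size, the resolution of the database is the same in dense and sparse regions of the input space.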
The prior art 1 provides no means for determining the number of similar cases to be employed for a new case, so that this number must be determined by a user. In general, the distribution of cases in the vector space is not always uniform. Accordingly, merely specifying the number of cases to be employed does not always collect suitable similar cases that take the inter-vector distances into consideration.
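The difficulty with a fixed number of similar cases can be seen in a small illustration. The data below are hypothetical one-dimensional inputs chosen only to make the non-uniform distribution visible, and the helper name `k_nearest_distances` is an assumption.

```python
import math

def k_nearest_distances(inputs, query, k):
    """Return the sorted distances from the query to its k nearest cases."""
    return sorted(math.dist(x, query) for x in inputs)[:k]

# Hypothetical case inputs: dense near 0, sparse near 10.
inputs = [(0.0,), (0.1,), (0.2,), (0.3,), (9.0,), (10.0,)]

# The same k is used for a new case in the dense region and in the sparse one.
dense = k_nearest_distances(inputs, (0.15,), k=3)
sparse = k_nearest_distances(inputs, (9.5,), k=3)
```

With the same `k`, all three cases selected in the dense region lie within about 0.15 of the new case, whereas in the sparse region the third selected case lies roughly 9.2 away and is hardly similar at all.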
The prior art 2 is directed to a technique for reasoning with typical cases obtained by decimating cases. However, when viewed against the entire set of cases before decimation, the inter-vector distance between a new case and its similar cases is determined by the mesh size regardless of the case distribution. Since the mesh size is uniform, suitable similar cases cannot always be collected for each mesh when the case distribution is not uniform. Further, in such a situation, it becomes difficult to suitably calculate a reasoning result of the reasoning error.
It is therefore an object of the present invention to provide a means for enabling suitable reasoning even when a case distribution is not uniform, a condition which tends to decrease reasoning accuracy in the prior art, and also to provide a means for calculating a confidence degree with which the validity of a reasoning result can be judged.
In accordance with the present invention, similar cases are determined by executing a step of calculating a similarity of each case and thereafter utilizing a distribution of predictor field values of the cases having high similarities. That is, since the similar cases are determined by taking the distribution of cases around the new case into consideration, suitable similar cases can be selected for reasoning even when the case distribution is not uniform.
In accordance with the present invention, further, a confidence degree is determined by utilizing a distribution of predictor field values of cases having high similarities. Thus, even when the case distribution is not uniform, a suitable confidence degree can be calculated.