1. Field of the Invention
The present invention relates to a method and an apparatus for analyzing manufacture data obtained from manufacture processes.
2. Description of the Related Art
When products are manufactured through a plurality of processes, physical and statistical analysis of which factors in the manufacture process influence product quality is necessary in order to achieve the required quality and to enhance production efficiency. Where production involves complicated processes, many factors influence product quality, and it therefore takes an immense amount of time and effort to physically analyze all of the factors and to extract the highly influential ones. Thus, in general, manufacture data is collected in each process, the factors are extracted and narrowed down by data analysis, and a physical analysis is then performed. For this purpose, automatic data analysis is used to perform the analysis efficiently, and data mining is used to search for correlations and patterns hidden in a large volume of data.
However, some values may be missing from actual manufacture data, and for that reason, it is difficult to analyze the manufacture data as it is.
As shown in FIG. 1A, in analyzing the causal correlation between an item A and an item Y, for example, Lot 01, in which data of both items A and Y are present, is a data analysis target; however, Lot 02 and Lot 03 are not analysis targets because one of the two data values is missing.
In analyzing a causal correlation between independent variable (explanatory variable) items A, B and C and a dependent variable (objective variable) item Y, as shown in FIG. 1B, in a sample Lot 04, the values of the items A, B and C and the dependent variable item Y are all obtained, so that the correlation between the items can be examined. However, the correlation between the items A, B and C and the item Y is not clear in samples Lot 01, Lot 02, and Lot 03, because data of the independent variable item C is missing from the sample Lot 01, data of the independent variable item B and the dependent variable item Y are missing from the sample Lot 02, and data of the items A and C are missing from the sample Lot 03.
FIG. 1C shows a case in which substitution values are inserted for the missing values in the samples of FIG. 1B: the character string “unknown” (a string representing a missing value) is substituted in a sample in which the item C, an apparatus name, is missing, and the value “3”, which is the average value of the item B over the samples, is substituted in a sample in which the numerical item B is missing.
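The substitution described for FIG. 1C can be sketched as follows. This is a minimal illustration only: the lot values, the apparatus names "EQ-1" and "EQ-2", and the helper name `substitute` are hypothetical and are not taken from the figures. The values are chosen merely so that the average of the item B comes out as 3.

```python
from statistics import mean

# Hypothetical lot records; None marks a missing value.
# The values are illustrative and are not those of FIG. 1B or FIG. 1C.
samples = {
    "Lot 01": {"A": 1.0, "B": 2.0, "C": None, "Y": 10.0},
    "Lot 02": {"A": 2.0, "B": None, "C": "EQ-1", "Y": None},
    "Lot 03": {"A": None, "B": 3.0, "C": None, "Y": 12.0},
    "Lot 04": {"A": 3.0, "B": 4.0, "C": "EQ-2", "Y": 15.0},
}

def substitute(samples, numeric_items, category_items, placeholder="unknown"):
    """Return a copy of the samples with missing values replaced.

    Numeric items receive the average of the observed values of that item;
    category items receive a placeholder string.
    """
    filled = {name: dict(rec) for name, rec in samples.items()}
    for item in numeric_items:
        observed = [rec[item] for rec in samples.values() if rec[item] is not None]
        average = mean(observed)  # raises StatisticsError if nothing was observed
        for rec in filled.values():
            if rec[item] is None:
                rec[item] = average
    for item in category_items:
        for rec in filled.values():
            if rec[item] is None:
                rec[item] = placeholder
    return filled

filled = substitute(samples, numeric_items=["B"], category_items=["C"])
print(filled["Lot 02"]["B"], filled["Lot 01"]["C"])
```

The original records are left unchanged, so the substituted and unsubstituted data can be compared side by side.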
FIG. 1D shows the values of each item and the analysis results when the item C, which has many missing values, is excluded from the samples in FIG. 1C. The sample Lot 02 is not analyzed because its dependent variable item Y is missing.
FIG. 2 shows the result of comparing the degrees of influence of the independent variables (items A, B and C) on the dependent variable (item Y), using the sample data of FIG. 1D.
The difference in the value of the item Y according to the value of the item A is “1”, the difference according to the value of the item B is “3.5”, and the difference according to the value of the item C is unknown.
In the past, when a plurality of manufacture data containing a missing value were analyzed, handling the missing value in the same way as a normal value required inserting a substitution value, excluding a sample with many missing values, or excluding an item with many missing values.
Patent Document 1 describes that when a characteristic value of sample data is missing, a Manhattan distance between the data with the missing value and each normal data is obtained, and the normal data at the minimum Manhattan distance is used as the substitution for the missing data.
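As a rough illustration of complementing by minimum Manhattan distance, the following sketch fills the missing characteristic value of one sample from the nearest complete sample. It assumes that the distance is computed only over the characteristic values present in the incomplete sample; that assumption, the feature names `x1` to `x3`, and the function names are hypothetical, and the sketch is not presented as the exact procedure of Patent Document 1.

```python
def manhattan(u, v, keys):
    """Manhattan (L1) distance between two records over the given keys."""
    return sum(abs(u[k] - v[k]) for k in keys)

def complement_from_nearest(incomplete, normal_samples):
    """Fill the missing values of `incomplete` from the nearest normal sample.

    Assumption: the distance uses only the values observed in `incomplete`.
    """
    observed = [k for k, v in incomplete.items() if v is not None]
    nearest = min(normal_samples, key=lambda s: manhattan(incomplete, s, observed))
    return {k: (nearest[k] if v is None else v) for k, v in incomplete.items()}

# Hypothetical normal (complete) data and one record with a missing value.
normal = [
    {"x1": 1.0, "x2": 5.0, "x3": 9.0},
    {"x1": 4.0, "x2": 2.0, "x3": 3.0},
]
target = {"x1": 3.5, "x2": 2.5, "x3": None}

complemented = complement_from_nearest(target, normal)
print(complemented)
```

Here the second normal record is nearer (distance 1.0 against 5.0), so its `x3` value is copied into the incomplete record.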
Patent Document 2 describes that when correlated partial condition data is extracted by combining a plurality of feature amounts and events, the presence or absence of the input event in a selected area is determined, and when the input event has a defect value, a complement value for the defect value is calculated based on the events in the selected area corresponding to the feature amounts other than the defect value.
As described above, in the past, when there is a missing value in analysis target data, a substitution value for the missing value is obtained by some method and the data is analyzed by using the substitution value, or the data is analyzed after excluding sample data with many missing values.
However, the analysis results vary depending on the value used as the substitution value for the missing value, and on the percentage set as the criterion for excluding samples that contain a high percentage of missing values. For this reason, there is a problem that the analysis accuracy varies. In addition, when samples without a missing value are used for each item, the number of samples differs from item to item, causing variations in the analysis accuracy between items.
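The difference in the number of usable samples per item can be seen in a small sketch. The lot values below are hypothetical and only reproduce a missing-value pattern of the kind described for FIG. 1B; `usable_lots` is an illustrative helper name.

```python
# Hypothetical lot records; None marks a missing value (illustrative values only).
samples = {
    "Lot 01": {"A": 1.0, "B": 2.0, "C": None, "Y": 10.0},
    "Lot 02": {"A": 2.0, "B": None, "C": "EQ-1", "Y": None},
    "Lot 03": {"A": None, "B": 3.0, "C": None, "Y": 12.0},
    "Lot 04": {"A": 3.0, "B": 4.0, "C": "EQ-2", "Y": 15.0},
}

def usable_lots(samples, item, target="Y"):
    """Lots in which both the independent item and the dependent item are present."""
    return [name for name, rec in samples.items()
            if rec[item] is not None and rec[target] is not None]

# Number of samples available for analyzing each independent item against Y.
counts = {item: len(usable_lots(samples, item)) for item in ("A", "B", "C")}
print(counts)
```

With this data, the items A, B and C are supported by 2, 3 and 1 samples respectively, so a comparison between the items rests on unequal amounts of evidence.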
For example, a regression tree analysis requires the data to be separated, according to a value of an independent variable item, into two sets, namely a set in which the values of the dependent variable item are large and a set in which they are small, in order to determine the intensity of the influence of the independent variable on the dependent variable. When a sample has a missing value, the analysis cannot be performed, and therefore a substitution value is used for the analysis as stated above. In that case, the dependent variable value falls into either the set with large values or the set with small values depending on the substitution value, causing an analysis error.
[Patent Document 1] Japanese Patent No. 3654193
[Patent Document 2] Japanese Published Patent Application No. 2001-184329
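The two-set separation in the regression tree example above, and its sensitivity to the substitution value, can be sketched as follows. The sketch assumes a single independent variable and a squared-error split criterion, which is a common choice for regression trees but is not specified in the text. The data values and function names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(xs, ys):
    """Threshold on x that minimizes the total squared error of y in both sets."""
    points = sorted(set(xs))
    best_t, best_cost = None, float("inf")
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2  # candidate threshold midway between neighbors
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

def side(x, t):
    """Which set a sample belongs to for threshold t."""
    return "small-Y set" if x <= t else "large-Y set"

# Hypothetical complete samples: item B (independent) and item Y (dependent).
b = [1.0, 2.0, 4.0, 5.0]
y = [10.0, 11.0, 20.0, 21.0]
t = best_split(b, y)

# A sample whose B is missing lands in a different set depending on
# the substitution value chosen for B.
for substitute_value in (2.0, sum(b) / len(b), 4.5):
    print(substitute_value, side(substitute_value, t))
```

In this data the small values of B go with small values of Y, so the set at or below the threshold is labeled as the small-Y set. A sample with a large Y value and a missing B would be placed in the small-Y set if the average of B were substituted, which is the kind of analysis error described above.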