1. Field of the Invention
The present invention relates to a technology for processing data to extract significant factors based on the relationships among the data.
2. Description of the Related Art
For example, in semiconductor manufacturing, factors that decrease yield are identified as quickly as possible from various measurement data, such as completion values at each manufacturing stage and device characteristic data, in order to improve yield. Extracting the factors that lower quality from the various measurement data is especially important in the development of new products and in the review of existing manufacturing processes, both to improve the efficiency of the data analysis and to make its results highly reliable.
Conventionally, a method of building a model that represents variations of SPICE parameters has been applied, in which a principal component analysis (multiple regression analysis) is conducted without taking circuit characteristic data into account, and a multiple regression equation is formed using the principal components thus obtained. This method builds a variation model of SPICE parameters regardless of their influence on circuit characteristics. Such a technology is disclosed in, for example, IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, vol. 7, no. 3, pp. 306-318, August, 1994, titled “Relating Statistical MOSFET Model Parameter Variabilities to IC Manufacturing Process Fluctuations Enabling Realistic Worst Case Design” by James A. Power, et al.; Technical Report of IEICE, November, 1996, SDM96-122, pp. 27-33, titled “Development of Worst Case MOSFET Model Parameter Determining Technique Using Multivariate Analysis” by Takeshi Yasuda, et al.; and Technical Report of IEICE, September, 1997, SDM97-128, pp. 63-70, titled “Statistical Analysis of MOSFET Sensitivity Using TCAD” by Naoyuki Shigyo, et al.
Data analysis techniques that enable the extraction of factors lowering quality from various measurement data include data mining, which is used in fields such as finance and distribution. Data mining is suitable for these fields because they handle large volumes of data. According to the statistical techniques of data mining, when a large number of explanatory variables associated with an objective variable are present, the explanatory variables that explain the objective variable can be selected by finding features and tendencies in a large volume of data based on the relationships among the data.
In particular, when the objective variable is quantitative data, a regression tree analysis is used as a statistical technique of data mining. In the conventional regression tree analysis, when the objective variable changes as an explanatory variable crosses a certain threshold, the effect of that explanatory variable can be easily found. Such a technology is disclosed in, for example, Japanese Patent Application Laid-Open Publication Nos. 2001-306999, 2002-324206, and 2003-142361.
Moreover, a method is publicly known for judging the cause of an abnormality in a plasma processing apparatus that applies plasma processing to a material to be processed in a processing room. This method includes an analysis-data acquiring step of acquiring analysis data including a plurality of parameters based on detection values acquired, in each processing of the material to be processed, by detectors arranged in the plasma processing apparatus; an abnormality judging step of judging whether the data represents an abnormality by analyzing the acquired analysis data; an effect calculating step of calculating a degree of effect on the abnormality for each parameter of data judged to be abnormal; and an abnormality-cause judging step of judging whether the data is still abnormal while removing the effects on the abnormality one after another in descending order of the parameters' degree of effect and, when the data is judged to be normal, judging the parameters whose effects on the abnormality have been removed to be the parameters causing the abnormality. Such a technology is disclosed in, for example, Japanese Patent Application Laid-Open Publication No. 2004-349419.
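The abnormality-cause judging step described above can be illustrated by the following minimal sketch. It is not taken from the cited publication: the function name judge_abnormality_cause, the use of a sum of squared standardized deviations as the abnormality score, the threshold value, and the parameter names are all assumptions made only for illustration.

```python
def judge_abnormality_cause(deviations, limit):
    """Return the parameters judged to cause an abnormality.

    deviations: hypothetical mapping of parameter name to its standardized
        deviation from normal operation (an assumption of this sketch).
    limit: hypothetical threshold; the data is judged abnormal when the
        sum of squared deviations exceeds it (also an assumption).
    """
    # Degree of effect of each parameter on the abnormality score.
    effects = {name: d * d for name, d in deviations.items()}
    score = sum(effects.values())
    if score <= limit:
        return []  # judged normal, so there is no cause to report

    causes = []
    # Remove effects one after another in descending order of degree of effect.
    for name in sorted(effects, key=effects.get, reverse=True):
        score -= effects[name]
        causes.append(name)
        if score <= limit:  # judged normal once these effects are removed
            break
    return causes


# Hypothetical detection-derived deviations for one processing run.
sample = {"rf_power": 4.0, "pressure": 0.5, "gas_flow": 3.0, "temperature": 0.2}
causes = judge_abnormality_cause(sample, limit=5.0)
print(causes)
```

In this sketch the loop stops as soon as the remaining score falls to or below the threshold, so only the parameters removed up to that point are reported as causes.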
However, the above conventional method of building a variation model of SPICE parameters, which performs the multiple regression analysis, requires that outliers be carefully examined and removed from the data for each SPICE parameter. If the removal is not performed appropriately, appropriate results cannot be obtained. Since outliers must be removed for all SPICE parameters, an enormous amount of time is required.
In the conventional regression tree analysis of data mining, when explanatory variables have continuous values, the relationship between the objective variable and an explanatory variable changes only gradually. If this change is small compared with that of other explanatory variables, the explanatory variables associated with the objective variable (quantitative data) cannot be appropriately obtained. Therefore, although effective for discrete values, the conventional regression tree analysis is not suitable for finding the factors that have a large degree of effect on continuous values such as SPICE parameters and circuit characteristic data.
For example, assume that, in a data group of 200 records, an objective variable Y and five explanatory variables A, B, C, D, and E have the relationship expressed by the following multiple regression equation.
Y = 1×A + 2×B + (−1)×C + 5×D + (0.1)×E
FIG. 1 illustrates a result of the regression tree analysis. As shown in FIG. 1, the explanatory variable D has a large effect on the objective variable: “Para. D” appears in nodes n1 and n2, into which the root node n0 is divided; in nodes n3 and n4, into which the node n1 is divided; and in nodes n5 and n6, into which the node n2 is divided. This regression tree diagram, however, does not indicate that the explanatory variable B has the second largest effect after the explanatory variable D.
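The kind of tree described above can be reproduced in outline with the following minimal sketch. It is an illustration under stated assumptions, not the analysis used to produce FIG. 1: the data are synthetic (each explanatory variable is drawn uniformly from [0, 1)), the last term of the regression equation is assumed to apply to the variable E, and the helper names sse, best_split, and grow are hypothetical. Nodes are numbered so that node n(i) is divided into nodes n(2i+1) and n(2i+2).

```python
import random

NAMES = ["A", "B", "C", "D", "E"]
# Assumed coefficients; the 0.1 term is taken to apply to E.
COEF = {"A": 1.0, "B": 2.0, "C": -1.0, "D": 5.0, "E": 0.1}


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows):
    """Return the (variable, threshold) pair that minimizes the sum of the
    two subgroups' sums of squares of the objective variable Y."""
    best = None
    for name in NAMES:
        # The largest value is skipped because it would leave one side empty.
        for t in sorted({r[name] for r in rows})[:-1]:
            low = [r["Y"] for r in rows if r[name] <= t]
            high = [r["Y"] for r in rows if r[name] > t]
            score = sse(low) + sse(high)
            if best is None or score < best[0]:
                best = (score, name, t)
    return best[1], best[2]


def grow(rows, depth, index=0, splits=None):
    """Divide nodes recursively and record the split used at each node."""
    splits = {} if splits is None else splits
    if depth == 0 or len(rows) < 2:
        return splits
    name, t = best_split(rows)
    splits[f"n{index}"] = (name, t)
    grow([r for r in rows if r[name] <= t], depth - 1, 2 * index + 1, splits)
    grow([r for r in rows if r[name] > t], depth - 1, 2 * index + 2, splits)
    return splits


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = []
for _ in range(200):
    record = {name: rng.random() for name in NAMES}
    record["Y"] = sum(COEF[name] * record[name] for name in NAMES)
    rows.append(record)

splits = grow(rows, depth=2)
for node, (name, t) in sorted(splits.items()):
    print(node, "is divided on", name, "at", round(t, 3))
```

With these assumed data the root node is divided on D. The variable chosen at deeper nodes depends on the particular sample and need not match FIG. 1.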
The reason is as follows. When a data group is divided into two subgroups, one in which an explanatory variable is below a certain threshold and the other in which it exceeds the threshold, the regression tree analysis focuses on the sum of the two subgroups' sums of squares of the objective variable. This sum is shown for the explanatory variables A, B, C, D, and E in FIGS. 2, 3, 4, 5, and 6, respectively. While the sum for the explanatory variable D fluctuates greatly, as shown in FIG. 5, the fluctuation is small for the explanatory variable B and the other variables. Therefore, the fluctuations of the other explanatory variables are hidden behind that of the explanatory variable D, and factors other than the explanatory variable D cannot be detected.
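The comparison of fluctuations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the computation behind FIGS. 2 to 6: the data are synthetic (each explanatory variable is drawn uniformly from [0, 1)), the last term of the regression equation is assumed to apply to the variable E, records equal to the threshold are placed in the lower subgroup, and the helper names sum_of_squares and split_curve are hypothetical.

```python
import random


def sum_of_squares(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_curve(xs, ys):
    """For each candidate threshold on xs, return the sum of the two
    subgroups' sums of squares of ys."""
    curve = []
    # The largest value is skipped because it would leave one side empty.
    for t in sorted(set(xs))[:-1]:
        low = [y for x, y in zip(xs, ys) if x <= t]
        high = [y for x, y in zip(xs, ys) if x > t]
        curve.append((t, sum_of_squares(low) + sum_of_squares(high)))
    return curve


names = ["A", "B", "C", "D", "E"]
# Assumed coefficients; the 0.1 term is taken to apply to E.
coef = {"A": 1.0, "B": 2.0, "C": -1.0, "D": 5.0, "E": 0.1}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = [{n: rng.random() for n in names} for _ in range(200)]
ys = [sum(coef[n] * r[n] for n in names) for r in rows]

# Fluctuation of each curve, measured here as its maximum minus its minimum.
fluctuation = {}
for n in names:
    sums = [s for _, s in split_curve([r[n] for r in rows], ys)]
    fluctuation[n] = max(sums) - min(sums)
    print(n, round(fluctuation[n], 2))
```

Because every division is chosen from the curve with the deepest minimum, the variable with the largest fluctuation is selected each time, and a variable with a smaller fluctuation, such as B in this sketch, is rarely selected even though it also affects the objective variable.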