This invention relates to the analysis of data sets characterized by attribute data and, more specifically, to the identification of possible sources of variation in a response variable defined on the data set.
There is a wide variety of situations where a data set (or a collection of data) may need to be analyzed to locate sources of variation in a response variable associated with the data. For example, such a data set can be based on a manufacturing process and derived from attributes such as specific machines, operators, components, etc. Each product produced by such a manufacturing process will have a plurality of attribute-derived data associated with it. Consider, e.g., the production of computer components, where each component is produced in a process in which many machines are controlled by different operators. Such data may define the "history" of each component, including a complete list of the data associated with its production, such as the operator(s) and machine(s) used to produce it.
In such a manufacturing process there will be some components produced that are defective. The process will therefore have some incidence of defects or fraction defective (which is defined as the number of defective products divided by the total number of products) associated with it. The fraction defective would be a response variable since it is a variable that is responsive to, or is dependent on, the attributes (i.e., the particular machines, operators etc.).
By analyzing such a data set, it may be possible to identify sources of variation in the response variable. For example, if a particular machine is malfunctioning, then products produced on that machine will tend to have a higher fraction defective than products produced on other machines. The malfunctioning machine will therefore cause a variation in the response variable (i.e., the fraction defective). If this variation can be isolated, the malfunctioning machine can be identified, thereby providing an opportunity to improve the manufacturing process and reduce the overall number of defective products produced.
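The comparison described above can be illustrated with a minimal Python sketch. The machine names, the counts, and the helper functions `fraction_defective` and `fraction_defective_by_machine` are hypothetical and chosen only for illustration; they are not taken from any actual process.

```python
from collections import defaultdict

# Hypothetical production history: one (machine, defective?) pair per product.
history = (
    [("M1", False)] * 48 + [("M1", True)] * 2
    + [("M2", False)] * 40 + [("M2", True)] * 10
    + [("M3", False)] * 47 + [("M3", True)] * 3
)

def fraction_defective(flags):
    """Number of defective products divided by the total number of products."""
    flags = list(flags)
    if not flags:
        raise ValueError("fraction defective is undefined for zero products")
    return sum(flags) / len(flags)  # True counts as 1, False as 0

def fraction_defective_by_machine(history):
    """Group the defect flags by machine and compute each group's fraction."""
    groups = defaultdict(list)
    for machine, defective in history:
        groups[machine].append(defective)
    return {machine: fraction_defective(flags) for machine, flags in groups.items()}

overall = fraction_defective(defective for _, defective in history)
per_machine = fraction_defective_by_machine(history)
worst = max(per_machine, key=per_machine.get)

print(f"overall: {overall:.3f}")
for machine in sorted(per_machine):
    print(f"{machine}: {per_machine[machine]:.3f}")
print(f"highest fraction defective: {worst}")
```

With this made-up data, machine M2 stands out against the overall rate. The comparison is descriptive only; it does not by itself establish that the difference is statistically significant.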
In addition to individual attributes, such as an individual machine or operator, interactions of attributes may also be the source of a variation in the response variable. For example, if a particular operator is unskilled in the use of one machine, then those products produced by that operator on that machine may have a high fraction defective. The individual attributes, i.e., the operator alone or the machine alone, will not have as high an incidence of defects as the interaction of the two. It should be apparent that in a complicated manufacturing process involving a large number of machines, operators, components, etc., there is an enormous number of possible combinations that could account for defective products. Identifying the individual attributes or attribute interactions that maximize the significance of variations in the response variable is essential if the quality of the products is to be improved.
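The operator and machine example can be made concrete with a short Python sketch. The data, the attribute names, and the helpers `make_rows` and `fraction_defective_by` are assumptions introduced for illustration. The sketch computes the fraction defective for each single attribute value and for each machine and operator pair, and also counts how many groupings would have to be examined across all subsets of the attributes.

```python
from collections import Counter
from itertools import combinations

def make_rows(machine, operator, total, defective):
    """Build hypothetical product records; the first `defective` are defects."""
    return [
        {"machine": machine, "operator": operator, "defective": i < defective}
        for i in range(total)
    ]

# Hypothetical data: only the combination of machine M2 and operator Bob is bad.
rows = (
    make_rows("M1", "Ann", 10, 1)
    + make_rows("M1", "Bob", 10, 1)
    + make_rows("M2", "Ann", 10, 1)
    + make_rows("M2", "Bob", 10, 6)
)

def fraction_defective_by(rows, keys):
    """Fraction defective for every observed combination of the given attributes."""
    totals, defects = Counter(), Counter()
    for row in rows:
        level = tuple(row[k] for k in keys)
        totals[level] += 1
        defects[level] += row["defective"]  # bool adds as 0 or 1
    return {level: defects[level] / totals[level] for level in totals}

by_machine = fraction_defective_by(rows, ["machine"])
by_operator = fraction_defective_by(rows, ["operator"])
by_pair = fraction_defective_by(rows, ["machine", "operator"])

# Count the groupings across every non-empty subset of the attributes.
attributes = ["machine", "operator"]
n_groupings = sum(
    len(fraction_defective_by(rows, list(subset)))
    for size in range(1, len(attributes) + 1)
    for subset in combinations(attributes, size)
)

print(by_machine[("M2",)], by_operator[("Bob",)], by_pair[("M2", "Bob")])
print("groupings examined:", n_groupings)
```

In this made-up data, machine M2 alone and operator Bob alone each show a fraction defective of 0.35, while the pair shows 0.6. With only two attributes of two values each there are already eight groupings to examine, and the count grows rapidly as attributes and values are added.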
Traditionally, quality control engineers isolate sources of variation by analyzing data derived from the particular manufacturing process. Bar graphs (e.g., Pareto charts) are typically constructed, plotting the absolute number of defects or the fraction defective (typically normalized) associated with many different attributes. By evaluating the fraction defective for each attribute, hypotheses are drawn as to the likely causes of defects (i.e., either an individual attribute or some interaction of attributes). The hypotheses may then be tested by conducting controlled experiments.
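A Pareto-style ranking of this kind can be sketched in a few lines of Python. The counts and the helpers `pareto_order` and `render_bars` are hypothetical, and scaling each bar to the largest fraction is only one assumed reading of "normalized".

```python
# Hypothetical counts per attribute value: (units produced, units defective).
counts = {
    "machine=M1": (200, 4),
    "machine=M2": (180, 27),
    "operator=Ann": (190, 9),
    "operator=Bob": (190, 22),
}

def pareto_order(counts):
    """Return (name, fraction defective) pairs sorted from highest to lowest."""
    fractions = {
        name: defective / produced
        for name, (produced, defective) in counts.items()
        if produced > 0  # skip attribute values with no production
    }
    return sorted(fractions.items(), key=lambda item: item[1], reverse=True)

def render_bars(ranked, width=40):
    """Render text bars scaled so the largest fraction spans `width` characters."""
    top = ranked[0][1] if ranked and ranked[0][1] > 0 else 1.0
    lines = []
    for name, fraction in ranked:
        bar = "#" * round(width * fraction / top)
        lines.append(f"{name:<14} {fraction:6.3f} {bar}")
    return lines

ranked = pareto_order(counts)
for line in render_bars(ranked):
    print(line)
```

The ranking only suggests where to look first; as noted above, the resulting hypotheses would still need to be tested by controlled experiments.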