Computerized inspection systems for the analysis of moving web materials have proven critical to modern manufacturing operations. For example, it is becoming increasingly common to deploy imaging-based inspection systems that can automatically classify the quality of a manufactured product based on digital images captured by optical inspection sensors (e.g., cameras). These inspection systems typically rely on complex technologies such as machine learning, pattern recognition, and computer vision.
Some inspection systems apply algorithms, often referred to as “classifiers,” that assign a rating to each captured digital image (i.e., “sample”). In the simplest case, the rating indicates whether the sample is acceptable or unacceptable; in more sophisticated cases, the rating is drawn from a set of labels corresponding to degrees of variation or levels of quality. These types of inspection systems usually proceed in two separate phases of processing.
The first phase, which is performed offline, is referred to as the “training phase.” During the training phase, a set of representative sample images is manually assigned ratings (also referred to herein as “labels”) by a set of experts. The experts may be, for example, process engineers who have significant experience in manually inspecting web products and identifying potential defects. Based on the labeled sample images, a classification model is developed from the training data for use by the computerized inspection system. In this way, the training phase can be thought of as the learning part of the inspection process.
Once the model has been developed from the training data, it can be applied to new samples captured from newly manufactured product, potentially in real time, during the “classification phase” of the processing. That is, the classification model can be used online by the computerized web inspection system to classify new sample images by assigning each of the samples a label.
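By way of illustration only, the two phases described above can be sketched in Python as follows. The sketch assumes that each sample image has already been reduced to a short numeric feature vector, and it uses a nearest-centroid rule as a stand-in for the classification model; the feature extraction, the choice of classifier, and the names `train` and `classify` are hypothetical and are not taken from any particular inspection system.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def train(samples, labels):
    """Training phase (offline): build one mean feature vector per expert label.

    `samples` is a list of equal-length feature vectors (assumed to be
    precomputed from the sample images); `labels` holds the expert ratings.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    # The "model" here is simply a mapping from label to centroid.
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def classify(model, features):
    """Classification phase (online): return the label of the closest centroid."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical training data: two features per sample, labeled by experts.
training_samples = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
training_labels = ["acceptable", "acceptable", "unacceptable", "unacceptable"]

model = train(training_samples, training_labels)
new_label = classify(model, [0.15, 0.25])  # label for a newly captured sample
```

In this sketch, `train` would be run once on the expert-labeled set, and `classify` would then be called for each new sample as it is captured. A practical system would typically use a more capable model, but the division of work between the two phases is the same.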
The ability of the computerized inspection system to correctly rate new sample images is directly related to the quality and accuracy of the initial training data used to train the system, i.e., the sample images and their corresponding labels assigned by the experts. For example, the samples in the training set should be representative of the entire distribution of data that is expected to be obtained from a given web application. As such, it is generally advantageous to have a large number of training samples, which can help to train a model more effectively and to reduce the effects of overfitting. Overfitting is characterized by a model that distinguishes noise or other insignificant differences between training samples and therefore has poor predictive performance outside of the training sample set. However, the task of manually labeling large numbers of samples can be extremely time consuming and tedious for the experts. Worse still, this labeling is subjective, and even an expert who is intimately familiar with the product may produce inconsistent labels due to the nature of the task. Furthermore, inconsistencies may arise between different expert raters. These difficulties are amplified as the size of the training set grows and are compounded by the fact that the same web material may be applied in different end-uses, which have different acceptance tolerances. As a result, a product that might be deemed unacceptable for one end-use could be acceptable for another end-use.
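As a purely illustrative aside, the inconsistency between two expert raters noted above can be quantified with a chance-corrected agreement statistic. The sketch below computes Cohen's kappa, a standard statistic that is introduced here as an assumption and is not named in the foregoing description; the function name `cohens_kappa` and the example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters on the same samples.

    Returns 1.0 for perfect agreement and roughly 0.0 when the raters
    agree no more often than would be expected by chance.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty sample set")
    n = len(labels_a)
    # Fraction of samples on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of the same four samples by two experts.
rater_1 = ["acceptable", "acceptable", "unacceptable", "unacceptable"]
rater_2 = ["acceptable", "unacceptable", "unacceptable", "unacceptable"]
kappa = cohens_kappa(rater_1, rater_2)
```

The same function can be applied to two labeling passes by a single expert over the same samples, which gives a rough measure of that expert's self-consistency.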