1. Technical Field of the Invention
The present invention relates to a method and apparatus for detecting and classifying patterns and, amongst other things, to a method and apparatus which utilize multi-dimensional wavelet neural networks to detect and classify patterns.
2. Background Art
Current trends in industrial and manufacturing automation have placed an increased emphasis on the need for quality and reliability, both in the process control and product characterization areas. As the technologies become more complicated, the production of virtually defect-free products by reliable processes is becoming vital. Automatic control systems are becoming more complex as they are called upon to regulate critical dynamic systems, and the associated control algorithms and control actuators entail a greater degree of sophistication. Consequently, there is a growing demand for fault tolerance, which can be achieved by improving the Fault Detection and Identification (FDI) concepts. FDI is of interest in a wide variety of applications such as control systems, image analysis, analysis of radar signals, smart sensors, texture analysis, medicine, and industry.
FDI algorithms generally consist of two portions, a detection portion and a classification portion. Detection is the process of deciding whether any one of a number of anticipated events, e.g. faults or defects, has occurred. Once the presence of an anticipated event has been established, classification distinguishes which particular anticipated event, e.g. defect, has occurred. There are a number of systems where traditional FDI techniques are not applicable due to the unavailability of analytic models. FDI becomes more difficult when there is a large variation in signal time constants. A high degree of system interdependencies, process and measurement noise, large-grain uncertainty and randomness make detection of anticipated events even more challenging.
Analysis of signals in either the time or frequency domain generally is not sufficient to capture faults that occur over a wide band of frequencies. Analysis of faults in pattern recognition applications should be localized in both the time and frequency domains for each input signal.
Over the last two decades, basic research in FDI has gained increased attention, mainly due to trends in automation, the need to address complex tasks, and the corresponding demand for higher availability and security of the control systems. However, a strong impetus has also come from the side of modern control theory, which has brought forth powerful techniques in mathematical modeling, state estimation and parameter identification.
In general, FDI schemes can be classified broadly as: (1) model-based FDI techniques; and (2) knowledge-based FDI techniques. Model-based (analytic) techniques generally use information about state variables from the model of the system to predict future values. A disparity between the actual values and the predicted values suggests a possible fault. This is a very robust approach to FDI for systems where accurate models are available. However, this approach has difficulty where accurate or complete models of the system are unavailable.
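The comparison of predicted and actual values described above can be sketched in a few lines of Python. This is only an illustrative sketch: the first-order model `x[k+1] = a * x[k]`, the function names, the sample data and the fixed threshold are all assumptions made for the example and are not taken from any particular FDI system.

```python
def simulate(a, x0, steps):
    """Predict a trajectory from a hypothetical first-order model x[k+1] = a * x[k]."""
    xs = [x0]
    for _ in range(steps - 1):
        xs.append(a * xs[-1])
    return xs


def detect_faults(measured, predicted, threshold):
    """Return the sample indices where |measured - predicted| exceeds the threshold."""
    return [
        k
        for k, (y, y_hat) in enumerate(zip(measured, predicted))
        if abs(y - y_hat) > threshold
    ]


# Model prediction: 8, 4, 2, 1, 0.5
predicted = simulate(a=0.5, x0=8.0, steps=5)
# Hypothetical sensor readings; the fourth sample departs from the model.
measured = [8.0, 4.1, 2.0, 3.0, 0.4]
flags = detect_faults(measured, predicted, threshold=0.5)
```

The quality of the result depends entirely on the model: if `a` were wrong, healthy samples would be flagged as well, which is the weakness of model-based schemes noted above.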
Model-based FDI techniques have been thoroughly tested and verified to perform satisfactorily in many applications. Based upon the methods of using the model, various approaches have been developed. For example, innovation-based techniques, such as Generalized Likelihood Ratio, are used for linear stochastic systems. This technique requires N+1 hypothesis testing: $H_i$ for the occurrence of fault $i$, $i = 1, \ldots, N$, and $H_0$ for no failure. The failure decision is based upon the maximum likelihood ratio of the conditional probabilities for $H_i$ and $H_0$. A technique known as the Failure Sensitive Filters technique employs a class of filters wherein the primary criterion for the choice of the filter is that the effects of certain faults are accentuated in the filter residue. However, it is not always possible to design a filter that is sensitive only to a particular fault. Furthermore, a performance trade-off is inherent in this method: as the sensitivity of the filter to new data is increased, by effectively increasing the bandwidth of the filter, the system becomes more sensitive to sensor noise and the performance of the detection algorithm in no-failure conditions degrades.
Another technique, known as the Multiple Hypothesis Filter Detectors technique, uses a bank of filters (one for each fault mode), and each filter is used to calculate the conditional probability that its failure mode has occurred. This technique generally is not very popular due to its level of complexity, which increases exponentially as the system expands. Because the required processing time grows with the complexity of the technique, the processing time also increases exponentially as the system expands.
The Parity Space Approach exploits the inconsistency of data (due to failure) coming from different sources of the system. The Direct Redundancy or Hardware Redundancy technique uses the instantaneous values of different sensors while the Temporal Redundancy technique uses a dynamic relationship between sensor outputs and actuator inputs over a period of time. The Hardware Redundancy technique is simple and easy to apply. However, it requires multiple sensors for each variable. Another drawback of this technique is that it works on the assumption that only one sensor fails at a time (in a three sensor arrangement). Analytic Redundancy uses data from sensors representing different parameters of the system that can be mathematically related by the model or part of the model.
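The three-sensor Hardware Redundancy arrangement mentioned above can be illustrated with a simple median voter. This is a minimal sketch under stated assumptions: the function name `vote_triplex`, the tolerance value and the readings are hypothetical, and median voting is only one common way to exploit direct redundancy.

```python
def vote_triplex(readings, tolerance):
    """Return (voted_value, faulty_indices) for three redundant sensor readings.

    The voted value is the median of the three readings. A sensor is flagged
    when it deviates from the median by more than the tolerance. This is only
    meaningful under the single-failure assumption: if two sensors fail in the
    same way, the median itself is wrong.
    """
    if len(readings) != 3:
        raise ValueError("expected exactly three readings")
    voted = sorted(readings)[1]  # median of three values
    faulty = [i for i, r in enumerate(readings) if abs(r - voted) > tolerance]
    return voted, faulty


# Hypothetical readings of the same variable; the third sensor has drifted.
voted, faulty = vote_triplex([10.1, 10.0, 14.7], tolerance=0.5)
```

The sketch also makes the cost of the technique visible: every monitored variable needs three physical sensors.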
With the availability of mathematical and computational tools, the trend in FDI research has shifted toward analytical (i.e., functional) rather than physical redundancy. This implies that the inherent redundancy contained in the dynamic relationships among the system inputs and measured outputs is exploited for FDI. In such approaches, one makes use of a mathematical model of the system or models describing certain modules of the overall system.
The known techniques described above utilize a model of the system (or part of the system) for fault analysis. These techniques work satisfactorily as long as the model characteristics approximate the actual system. However, their performance degrades rapidly if the model does not accurately represent the actual system. Unfortunately, accurate models are not available for most systems. There is a growing potential for using knowledge-based models and algorithms instead of analytic ones. This approach is, of course, the only one available in cases where analytic models are not available. A comparison of a model-based technique and a knowledge-based technique is shown in FIG. 1. It can be seen in FIG. 1 that the knowledge base replaces the model in the overall architecture. This knowledge-based approach has created a new dimension of possible fault diagnosis techniques for complex processes with incomplete process knowledge. Whereas the analytic methods use quantitative analytical models, the expert systems approach makes use of qualitative models based on the available knowledge of the system. Although the intelligent FDI techniques do not require an accurate analytic model, they are restricted to identification of only predetermined defects. This is, however, acceptable in many cases as the fault modes in many applications are already known.
From the perspective of product characterization, one aspect of quality is perceived as a defect-free final product. Product inspection and defect classification is one of the key issues in the manufacturing arena, where defect classification is a pattern recognition problem. Manual inspection and traditional signal processing have proven to be inadequate in many applications. This is due to the presence of a high degree of uncertainty and complexity in these systems. Intelligent processing tools like fuzzy logic, neural networks and intelligent optimization techniques are currently being used; these accommodate large-grain uncertainty while utilizing all the information about the system when the information from analytic models of the system is not adequate. This gives intelligent FDI schemes an advantage over conventional FDI techniques, which rely primarily on analytic models. However, heretofore intelligent FDI systems have analyzed signals in either the time or frequency domain exclusively. Due to the wide range of time constants, analysis in the frequency domain alone would mask the sudden bursts of high frequency signals. Further, unless the frequency domain resolution is very fine, slowly varying fault features can be masked in a signal akin to a DC bias. Likewise, analysis in the time domain would not reflect the periodicity of the features. Hence, analysis only in either the frequency or time domain generally is not sufficient to capture features that are spread over a wide band of frequencies.
Most of the intelligent techniques being used today employ a learning mechanism (on-line or off-line) which uses information obtained from an expert, historical data, extrinsic conditions, etc. The learning procedure, in most cases, is cast as an optimization problem which adjusts the parameters of the detection algorithm, modifies the knowledge-base, initiates mode switching, etc. For example, it is known to use learning to determine the optimum weights for aggregation of information from different sources for vibration monitoring. Neural-net based FDI techniques are known which use learning to adjust the weights of individual neurons. Fuzzy Associative Memories (FAMs) are known which employ learning to design an inferencing hypercube.
Fault identification is the classification of faults into different categories. It may be viewed as a mapping from a feature space to a decision space. One well-known fuzzy classification routine is the Fuzzy C-Means (FCM) algorithm, derived from its crisp version called ISODATA. Consider the partitioning of the set $X = \{x_1, x_2, \ldots, x_n\}$ into $c$ partitions, $c \in \mathbb{N}$. FCM assigns a degree of association $\mu_{ik}$ of the $k$th feature with the $i$th partition (fault mode in our case). For the cluster center $v_i$ of the $i$th cluster, FCM estimates $\mu_{ik}$ as follows:
$$\min z = \sum_{i=1}^{c} \sum_{k=1}^{n} (\mu_{ik})^{m} \lVert x_k - v_i \rVert \qquad \text{(Equation 1)}$$
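As an illustration of how the degrees of association might be computed for fixed cluster centers, the following Python sketch applies the standard textbook FCM membership update. Note the assumptions: the update shown is the one that minimizes the conventional FCM objective with squared Euclidean distances, which may differ from Equation 1 as printed here; the data are one-dimensional; and the function name, sample points and centers are hypothetical.

```python
def fcm_memberships(points, centers, m=2.0):
    """Degree of association mu[i][k] of point k with cluster i (1-D data).

    Standard FCM update for fixed centers (assumes fuzzifier m > 1):
        mu_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1))
    where d_ik is the distance from point k to center i.
    """
    if m <= 1.0:
        raise ValueError("the fuzzifier m must be greater than 1")
    exponent = 2.0 / (m - 1.0)
    mu = [[0.0] * len(points) for _ in centers]
    for k, x in enumerate(points):
        dists = [abs(x - v) for v in centers]
        if 0.0 in dists:
            # The point coincides with a center: assign it fully to that cluster.
            mu[dists.index(0.0)][k] = 1.0
            continue
        for i, d_i in enumerate(dists):
            mu[i][k] = 1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
    return mu


# Hypothetical feature values and two cluster centers (fault modes).
mu = fcm_memberships(points=[1.0, 5.0, 9.0], centers=[0.0, 10.0])
```

For each point the memberships sum to one across clusters; the point midway between the two centers receives equal membership in both, while the points near a center are associated almost entirely with it.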
These types of approaches work on the assumption that the fuzzy classes are fully understood by the user and that there exists sufficient knowledge of the associated features. They do not allow the classes to be self-generated or to evolve over time. Hence, they lack the element of learning that would enable the system to work independently without user assistance.
The defect detection problem is in fact a problem of classifying features of the signal representative of characteristics of the product into different categories. It may be viewed as a mapping from the feature space to a decision space where detection and classification can occur. Further, similarity measures combining vague features with known patterns have been used for classification. These approaches work on the assumption that the fuzzy classes are fully understood by the user and that there exists sufficient knowledge of the associated features. They do not allow the classes to be self-generated or to evolve over time. Hence, they lack the element of learning that would enable the system to work independently without user assistance.
A multi-level architecture for feature classification based on fuzzy logic has been utilized as one approach. Other popular methods for classification use a fuzzy rule-base, fuzzy decision hypercube, fuzzy relational matrix, and fuzzy associative memories (FAM). All these techniques rely upon the user to provide the expert knowledge for the inference engine, which is somewhat problematic, as the defects within a single class will themselves vary. Additionally, the generation of a fuzzy decision hypercube or FAM is not very simple for most industrial applications.
Many intelligent techniques employ a learning mechanism (unsupervised or supervised) which uses information from an expert, historical data, extrinsic conditions, etc. The learning procedure, in most cases, is cast as an optimization problem which adjusts the parameters of the detection algorithm, modifies the knowledge-base, initiates mode switching, etc. One approach uses learning to determine the optimum weights for aggregation of information from different sources for vibration monitoring. Neural net based FDI techniques use learning to adjust the weights of individual neurons while Fuzzy Associative Memories employ learning to design the inferencing hypercube.
Feature analysis is used for detection and classification of operating modes of the system under observation. Possible operating modes may include stable condition, subnormal operation, or failure modes. The task of a feature analysis algorithm is to differentiate between a system failure and a functional failure. A system failure is a degradation of performance of the hardware of the system, while a functional failure refers to a condition of the system state variables resulting in an unwanted operating mode such as instability. Many functional failures may eventually lead to a system failure.
Product characterization is another very important application area of feature analysis algorithms. This application domain includes product quality inspection, texture classification, signal and image classification, and similar applications.
Traditionally, model-based techniques have been used for feature extraction. These techniques rely solely on an accurate model of the system. Failure sensitive filters and multiple hypotheses filter detectors aim at classifying abnormal system behavior using system models. Model-based techniques perform satisfactorily as long as the model characteristics are close to the actual system. However, performance degrades rapidly if the model does not closely represent the actual system. Unfortunately, accurate models are not available for most systems. Another approach utilizes knowledge-based models instead of analytic ones. Knowledge-based feature extraction systems have the capability of including a wider range of information sources, such as input-output data, heuristics, and other iterative methodologies.
With the availability of powerful computing platforms, feature processing has become an important part of many applications utilizing intelligent processing tools like fuzzy logic and neural networks. The terms "failure", "fault" and "defect" are employed to designate an abnormal system state and are context dependent: the term "failure" suggests a generic condition, whereas "fault" and "defect" are used to signify an off-normal condition of a dynamic (sensor, actuator, etc.) and a static (product characterization) system state, respectively.
Another very important consideration in the industrial applicability of FDI systems is computational overhead or, put differently, processing speed. That is, the greater the processing overhead required, the slower the operation of the FDI system. In industrial processes, the speed of the process is the benchmark at which the FDI system must function. However, an increase in the computational speed of the FDI system should not come at the price of lost accuracy, which would defeat the purpose of installing the FDI system.
One of the more promising techniques for FDI systems is the utilization of wavelet neural networks. A neural network is composed of multiple layers of interconnected nodes with an activation function in each node and weights on the edges or arcs connecting the nodes of the network. The output of each node is a nonlinear function of all its inputs and the network represents an expansion of the unknown nonlinear relationship between inputs, x, and outputs, F (or y), into a space spanned by the functions represented by the activation functions of the network's nodes. Learning is viewed as synthesizing an approximation of a multidimensional function over a space spanned by the activation functions $\varphi_i(x)$, $i = 1, 2, \ldots, m$, i.e.
$$F(x) = \sum_{i=1}^{m} c_i \varphi_i(x) \qquad \text{(Equation 2)}$$
The approximation error is minimized by adjusting the activation function and network parameters using empirical (experimental) data. Two types of activation functions are commonly used: global and local. Global activation functions are active over a large range of input values and provide a global approximation to the empirical data. Local activation functions are active only in the immediate vicinity of the given input value. Typical global activation functions, the linear threshold and the sigmoid function, are shown in FIGS. 2a and 2b. The Gaussian for radial basis function networks, a typical example of a local activation function, is shown in FIG. 2c. The functions which can be computed by a Back Propagation Network (BPN) with one hidden layer having m nodes constitute the set $S_m$ defined by:
$$S_m \equiv \left\{ f(x) : f(x) = \sum_{i=1}^{m} c_i \varphi(x\,w_i + \theta_i),\ w_i \in \mathbb{R}^d,\ c_i, \theta_i \in \mathbb{R} \right\} \qquad \text{(Equation 3)}$$
where $\varphi(\cdot)$ is the sigmoid function and $w_i$, $c_i$, and $\theta_i$ are adjustable parameters. The activation function in Radial Basis Function Networks (RBFN) is local in character and given, in general, for the $i$th node by:
$$\varphi_i(x) = h(\lVert x - x_i \rVert) \qquad \text{(Equation 4)}$$
If h is Gaussian,
$$\varphi_i(x) = \exp\left(-\frac{\lVert x - x_i \rVert^2}{2\sigma_i^2}\right) \quad \text{if } x \in \mathbb{R} \qquad \text{(Equation 5)}$$
$$\varphi_i(x) = \frac{|W_i|}{\pi^{d/2}} \exp\left(-\frac{1}{2}(x - x_i)^T W_i^2 (x - x_i)\right) \quad \text{if } x \in \mathbb{R}^d \qquad \text{(Equation 6)}$$
where $\sigma_i$ is the standard deviation for the one-dimensional case and $W_i$ is the $d \times d$ weight matrix formed by reciprocals of the covariance for the d-dimensional case. Adaptation and learning with global approximations is a slow process, since each network node influences the output over a large range of input values and all activation functions overlap over a large range of input values, thus interacting with each other. Convergence of BPNs is not guaranteed due to the nonlinear nature of the optimization problem. Moreover, global approximation networks provide a value for the output over the whole range of input values independently of the availability or density of training data in given ranges of input values. Such a property could lead to large extrapolation errors without warning. RBFNs avoid large extrapolation errors, have fewer convergence problems than BPNs, are trained faster, and adapt easily to new data, since they require changes in only a small part of the net.
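The difference between global and local activation functions can be made concrete with a short Python sketch. It assumes the conventional one-dimensional Gaussian form $\exp(-(x - x_i)^2 / (2\sigma_i^2))$ for the radial basis function and the logistic function for the sigmoid; the function names and the sample inputs are chosen only for illustration.

```python
import math


def sigmoid(x):
    """Global activation (logistic sigmoid): saturates near 1 for all large inputs."""
    return 1.0 / (1.0 + math.exp(-x))


def gaussian_rbf(x, center, sigma):
    """Local activation for scalar x: significant only near the center."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


# Response close to the center versus far away from it.
near = gaussian_rbf(0.5, center=0.0, sigma=1.0)
far = gaussian_rbf(10.0, center=0.0, sigma=1.0)
# The sigmoid, by contrast, still responds strongly at the far input.
far_sigmoid = sigmoid(10.0)
```

Because the Gaussian node is effectively silent far from its center, new data in one region of the input space changes only the nodes centered there, which is why RBFNs adapt locally, whereas every sigmoid node contributes across the whole input range.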
It is well known that functions can be represented as a weighted sum of orthogonal basis functions. Such expansions can be easily represented as neural nets by having the selected basis functions as activation functions in each node, and the coefficients of the expansion as the weights on each output edge. Several classical orthogonal functions, such as sinusoids and Walsh functions, are global approximations and therefore suffer from the disadvantages of approximation using global functions, i.e. potentially large extrapolation errors. What is needed is a set of basis functions that are local and orthogonal. A special class of functions, known as wavelets, possess good localization properties while also being simple orthonormal bases. Thus, they may be employed as the activation functions of a neural network known as the Wavelet Neural Network (WNN). WNNs possess a unique attribute: in addition to forming an orthogonal or quasi-orthogonal basis, they are also capable of explicitly representing the behavior of a function at various resolutions of input variables.
Neural network design has been traditionally plagued by problems of arbitrariness, e.g. the number of nodes and hidden layers. The design of neural nets can be systematized and the arbitrariness may be removed by using activation functions that are naturally orthogonal and have local receptive fields. Thus, if the activation functions have these properties, the training of a neural network could be completely localized, while the number of hidden nodes would be directly determined by the added accuracy offered by a new node. This can be seen by considering a function F(x) which is assumed to be continuous in the range [0, 1]. Let $\varphi_i(x)$, $i = 1, 2, \ldots, \infty$, be an orthonormal set of continuous functions in [0, 1]. Then, F(x) possesses a unique $L^2$ approximation of the form:
$$F(C, x) = \sum_{k=1}^{N} c_k \varphi_k(x) \qquad \text{(Equation 7)}$$
where the elements of the vector of coefficients $C = [c_1, c_2, \ldots, c_N]^T$ are given by the projection of F(x) onto each basis function, that is
$$c_k = \int_0^1 F(x)\,\varphi_k(x)\,dx \qquad \text{(Equation 8)}$$
A reasonable performance (interpolation) metric is the mean-squared error, i.e.
$$e_K^2 = \int_0^1 \left[ F(x) - \sum_{k=1}^{K} c_k \varphi_k(x) \right]^2 dx = \sum_{k=K+1}^{\infty} c_k^2 \qquad \text{(Equation 9)}$$
As the number of terms K increases, the mean-squared error decreases and the approximation improves. Furthermore, the larger the value of the coefficient $c_k$, the greater the contribution of the corresponding basis function $\varphi_k(x)$ to the approximating function. This observation provides a formal criterion for picking the most important activation function in each hidden unit of a network.
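The projection of a function onto an orthonormal basis and the resulting decrease in error can be checked numerically. The Python sketch below is illustrative only: it assumes a cosine basis on [0, 1] (a global basis chosen for simplicity, not a wavelet basis), approximates the integrals with the midpoint rule, and uses the hypothetical target function F(x) = x.

```python
import math


def phi(k, x):
    """k-th function (k = 1, 2, ...) of an orthonormal cosine basis on [0, 1]."""
    if k == 1:
        return 1.0
    return math.sqrt(2.0) * math.cos((k - 1) * math.pi * x)


def integrate(g, n=2000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    h = 1.0 / n
    return h * sum(g((j + 0.5) * h) for j in range(n))


def coefficient(F, k):
    """Projection of F onto the k-th basis function."""
    return integrate(lambda x: F(x) * phi(k, x))


def squared_error(F, K):
    """Integrated squared error of the K-term expansion of F."""
    c = [coefficient(F, k) for k in range(1, K + 1)]

    def residual_sq(x):
        approx = sum(c_k * phi(k, x) for k, c_k in enumerate(c, start=1))
        return (F(x) - approx) ** 2

    return integrate(residual_sq)


def ramp(x):
    """Hypothetical target function F(x) = x."""
    return x


errors = [squared_error(ramp, K) for K in (1, 2, 4)]
```

With one term the expansion is just the mean of F, so the error equals the variance of x on [0, 1], namely 1/12. Each additional term with a non-zero coefficient lowers the error by the square of that coefficient, so ranking coefficients by magnitude indicates which nodes are worth adding.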
In addition to "good" neural net design approaches, another important ingredient in the approximation problem is the multiresolution property. Consider, for example, the case of training data that are not uniformly distributed in the input space, i.e., data are sparse in some regions and dense in others. Approximating such data at a single coarse resolution may not bring out the fine details. A single fine resolution brings out the details, but no general picture may emerge. This tradeoff between the ability to capture fine detail and good generalization may be solved by learning at multiple resolutions. A higher resolution of the input space may be used where data are dense and a lower resolution where they are sparse.
A function F(x) may be expressed by its multiresolution components at L scales by

$$F_L(x) = \sum_{m=1}^{L} f_m(x) \qquad \text{(Equation 10)}$$
where the component at the m-th scale, $f_m(x)$, is given by

$$f_m(x) = \sum_{k=1}^{K} c_{mk}\, \varphi_{mk}(x) \qquad \text{(Equation 11)}$$
The basis functions $\varphi_{mk}(x)$ are all defined at scale m. If m=0 defines the lowest scale (finest resolution of input data) and m=L the highest, the neural network is first trained to learn the mapping between inputs and output at the coarsest resolution; the network is then trained to learn the added detail as one moves from a coarser to a finer level of resolution. The error in the approximation at each resolution is given by

$$e_m^2 = \int_0^1 \left[ f_m(x) - \sum_{k=1}^{K} c_{mk}\, \varphi_{mk}(x) \right]^2 dx \qquad \text{(Equation 12)}$$
Orthogonal wavelets generate such a multiresolution representation.
A family of wavelets is derived from the translations and dilations of a single function. If $\psi(x)$ is the starting (mother) function, to be called a wavelet, the members of the family are given by

$$\frac{1}{\sqrt{s}}\, \psi\!\left( \frac{x-u}{s} \right) \quad \text{for } (s,u) \in \mathbb{R}^2 \qquad \text{(Equation 13)}$$
that is, they are indexed by two labels (parameters) s and u, with s indicating the dilation and u the translation of the base wavelet, $\psi(x)$. The translation and dilation of the Battle-Lemarie wavelet are shown in FIGS. 3 and 4.
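The construction of a wavelet family by dilation and translation can be sketched in a few lines. This sketch makes several assumptions: it substitutes the Mexican hat function for the Battle-Lemarie wavelet because the former has a simple closed form, it restricts the dilation s to positive values, and the names `mexican_hat` and `wavelet_member` are hypothetical.

```python
import numpy as np

def mexican_hat(x):
    """Assumed mother wavelet with unit L2 norm (stand-in for Battle-Lemarie)."""
    norm = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return norm * (1.0 - x ** 2) * np.exp(-0.5 * x ** 2)

def wavelet_member(x, s, u, mother=mexican_hat):
    """Family member (1 / sqrt(s)) * psi((x - u) / s) for dilation s, translation u."""
    if s <= 0:
        raise ValueError("this sketch assumes a positive dilation s")
    return mother((x - u) / s) / np.sqrt(s)

# Sample a few family members on a common grid.
x = np.linspace(-40.0, 40.0, 160001)
dx = x[1] - x[0]
family = {
    (s, u): wavelet_member(x, s, u)
    for s in (0.5, 1.0, 2.0)
    for u in (-3.0, 0.0, 3.0)
}

# The 1 / sqrt(s) factor keeps the energy of every member equal to that of the mother.
energies = {key: float(np.sum(w ** 2) * dx) for key, w in family.items()}
```

Each member is centered at its translation u and widens as s grows, while the normalization factor keeps the energy of every member the same.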
An important factor in the formulation and design of neural networks with wavelets as basis functions is the multiresolution representation of functions using wavelets. It provides the essential framework for the completely localized and hierarchical training afforded by Wavelet Neural Networks. Consider a continuous, square-integrable function, $F(x)$, with $F_m(x) \equiv A_m F(x)$ denoting the approximation of $F(x)$ at the resolution m, where $2^m$ is the sampling interval, that is, the interval between two consecutive sampled values used in the approximation. Then, $2^{-m}$ is the number of sampled values per unit length of input space. Consequently, as m increases, the number of samples per unit length decreases and the approximation $F_m(x)$ becomes coarser. It has been shown that there exists a unique function, $\varphi(x)$, called a scaling function, such that for all $m \in \mathbb{Z}$, the family of functions resulting from the dilation and translation of $\varphi(x)$, that is:
$$\varphi_{mk}(x) = \sqrt{2^{-m}}\; \varphi\!\left( 2^{-m} x - k \right), \quad (m,k) \in \mathbb{Z}^2 \qquad \text{(Equation 14)}$$
constitutes an unconditional orthonormal basis. With this basis, $F_m(x)$ is given by

$$F_m(x) \equiv A_m F(x) = \sum_{k=-\infty}^{+\infty} a_{mk}\, \varphi_{mk}(x) \qquad \text{(Equation 15)}$$
and the coefficients $a_{mk}$ are projections of $F(x)$ onto the orthonormal basis functions, that is,

$$a_{mk} = \int_{-\infty}^{\infty} F(x)\, \varphi_{mk}(x)\, dx \qquad \text{(Equation 16)}$$
At various resolutions, any $F(x) \in L^2(\mathbb{R})$ can be expanded into a set of orthonormal wavelets, that is,

$$F(x) = \sum_{m=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} d_{mk}\, \psi_{mk}(x) \qquad \text{(Equation 17)}$$
The above equation is known as the wavelet decomposition of a square-integrable function, and provides the theoretical framework for the design of Wavelet Neural Networks. The coefficients $d_{mk}$ are the projections of $F(x)$ onto the basis functions $\psi_{mk}(x)$. It can be shown that the approximation of $F(x)$ at scale (m-1) is equal to

$$F_{m-1}(x) = F_m(x) + \sum_{k=-\infty}^{\infty} d_{mk}\, \psi_{mk}(x) \qquad \text{(Equation 18)}$$
This last equation summarizes the hierarchical, multiresolution representation of functions offered by the wavelet decomposition.
From a practical perspective, given a sequence of discrete samples of $F(x)$, resulting from physical measurements,

$$F_0(x) = \sum_{k=-\infty}^{\infty} a_{0k}\, \varphi_{0k}(x) \qquad \text{(Equation 19)}$$
the recursive decomposition of the discrete sequence of samples is characterized by

$$A_{m-1} F(x) = \sum_{k \in \mathbb{Z}} a_{mk}\, \varphi_{mk}(x) + \sum_{k \in \mathbb{Z}} d_{mk}\, \psi_{mk}(x) \qquad \text{(Equation 20)}$$
with the coefficients of the decomposition given by
$$a_m = H a_{m-1}, \qquad d_m = G a_{m-1} \qquad \text{(Equation 21)}$$
Filters H and G are defined in such a way that the impulse responses are given by

$$h_k = \int_{-\infty}^{\infty} \varphi_{0k}(x)\, \varphi_{1k}(x)\, dx, \qquad g_k = \int_{-\infty}^{\infty} \varphi_{0k}(x)\, \psi_{1k}(x)\, dx \qquad \text{(Equation 22)}$$
The developments above are based on infinite length sequences of sampled values. Finite sequences result in "end effects" which may be addressed by considering a mirror image of the trend beyond its end points or by defining appropriate H and G filters.
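The recursion of Equation 21 and the mirror-image treatment of end effects can be sketched for a finite sequence. The sketch assumes Haar filters for H and G (the simplest orthonormal pair, not necessarily the filters contemplated here) and mirrors the last sample whenever a sequence has odd length. The function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def analysis_step(a_prev):
    """One step of a_m = H a_{m-1}, d_m = G a_{m-1} with assumed Haar filters."""
    a_prev = np.asarray(a_prev, dtype=float)
    if a_prev.size % 2:
        # End effect: extend an odd-length sequence by mirroring its last sample.
        a_prev = np.pad(a_prev, (0, 1), mode="symmetric")
    even, odd = a_prev[0::2], a_prev[1::2]
    return (even + odd) / SQRT2, (even - odd) / SQRT2

def synthesis_step(a_m, d_m):
    """Invert one analysis step (discrete counterpart of Equation 18)."""
    out = np.empty(2 * a_m.size)
    out[0::2] = (a_m + d_m) / SQRT2
    out[1::2] = (a_m - d_m) / SQRT2
    return out

def decompose(samples, levels):
    """Recursively split samples into a coarse trend and per-level details."""
    a = np.asarray(samples, dtype=float)
    details = []
    for _ in range(levels):
        a, d = analysis_step(a)
        details.append(d)
    return a, details

def reconstruct(a, details, length):
    """Rebuild the original samples, dropping mirrored padding at each level."""
    for d in reversed(details):
        a = synthesis_step(a[: d.size], d)
    return a[:length]

# Odd-length example so that the mirror extension is exercised.
signal = np.sin(np.linspace(0.0, 3.0, 11))
coarse, details = decompose(signal, levels=3)
restored = reconstruct(coarse, details, length=signal.size)
```

With this extension the padded pair contributes a zero detail coefficient, so the original samples are recovered exactly after the padding is trimmed.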
The principal benefit from the wavelet decomposition is the localized characterization of a continuous or discrete function in the input space and in wave number (or frequency, or scale). The input-frequency localization of wavelets at various translations and dilations is shown in FIG. 5. Each rectangle indicates the input space and scale space localization of the corresponding wavelet. The size of each rectangle is determined by the standard deviation of the wavelet and its Fourier transform. The area of each rectangle is constant, indicating that as the frequency range increases, the input range decreases, as governed by the uncertainty principle. The information contained in the input and frequency range covered by each wavelet or scaling function is captured by the coefficients $d_{mk}$ and $a_k$, respectively. Consider coefficient $d_{2,33}$ in the grid of FIG. 6. The value of $d_{2,33}$ measures the content of the original signal in terms of the wavelet at the second dilation, when the input takes on values in the range [33-q, 33+q]. In other words, it measures the content of the original signal in the frequency range corresponding to the frequencies allowed at scale 2, and in the input range [33-q, 33+q]. This range is indicated by the encircled points in the figure. Here q is assumed to be 2 units.
A major challenge for wavelet theorists has been to extend the success they have had on one-dimensional signals to more dimensions. This is especially important for real world defect identification or pattern recognition problems, because the features of the image, or of signals created from the image, that are indicative of a defect or pattern are numerous, and no single feature is generally sufficient to be relied upon to signify the existence of a defect.
In accordance with the present invention, a method and apparatus is provided which analyzes an image of an object to detect and identify defects in the object. The method and apparatus generate a signal representing at least part of the object. Certain features of the signal are extracted and then provided to a multi-dimensional neural network for classification.
In one embodiment the present invention comprises an apparatus for analyzing a 2-D representation of an object. The apparatus comprises at least one sensor disposed to capture a 2-D representation, a memory that stores at least a portion of the 2-D representation; and a processor that derives a signal from the 2-D representation, that generates a plurality of feature values and that provides the feature values to a multi-dimensional wavelet neural network which provides a classification output indicative of whether the representation comprises a predetermined pattern.
In another embodiment the present invention comprises a method for pattern recognition, comprising generating a 2-D digital representation of at least part of an object, extracting feature values from the 2-D digital representation, providing the feature values to a multi-dimensional wavelet neural network; and providing a classification output indicative of a predetermined pattern if the feature values are indicative of a predetermined pattern.
In a further embodiment the invention comprises a computer readable medium containing instructions for a computer comprising means for instructing the computer to read at least a portion of a 2-D digital image, means for instructing the computer to generate a feature vector, means for instructing the computer to provide the feature vector to a multi-dimensional wavelet neural network; and means for instructing the computer to provide a classification output indicative of a predetermined pattern from the multi-dimensional neural network if the feature values are indicative of a predetermined pattern.
In an additional embodiment the present invention comprises an apparatus for pattern recognition. The apparatus comprises an input that receives a 2-D representation of at least part of an object, a memory that stores at least a portion of the 2-D representation; and a processor that generates a plurality of feature values representing features of said at least one signal and that provides the feature values to a perceptron neural network comprising a plurality of neurons each defined by the function $\psi_{a,b}(x) = \sqrt{|\mathrm{diag}(a)|}\; \psi(\mathrm{diag}(a)(x-b))$ where x is a vector comprising said feature values, a is a squashing matrix for the neuron and b is the translation vector for that neuron. The perceptron neural network provides a classification output indicative of whether the representation contains a predetermined pattern.
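The neuron function above can be sketched as follows. Several points are assumptions made only for illustration: $|\mathrm{diag}(a)|$ is read as the absolute value of the determinant of the diagonal matrix (the product of the entries of a), the mother wavelet is taken to be a radial Mexican hat without normalization, and the names `wavelet_neuron` and `wavelet_layer` are hypothetical.

```python
import numpy as np

def radial_mexican_hat(z):
    """Assumed mother wavelet: (n - ||z||^2) * exp(-||z||^2 / 2) over the last axis."""
    n = z.shape[-1]
    r2 = np.sum(z ** 2, axis=-1)
    return (n - r2) * np.exp(-0.5 * r2)

def wavelet_neuron(x, a, b, mother=radial_mexican_hat):
    """Evaluate sqrt(|diag(a)|) * psi(diag(a) (x - b)) for a batch of feature vectors."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Assumption: |diag(a)| is the determinant of the diagonal matrix, i.e. prod(a).
    scale = np.sqrt(np.abs(np.prod(a)))
    # Multiplying by diag(a) is an elementwise product with the vector a.
    return scale * mother((x - b) * a)

def wavelet_layer(x, A, B):
    """Stack the responses of several neurons; one row of A and B per neuron."""
    return np.stack([wavelet_neuron(x, a, b) for a, b in zip(A, B)], axis=-1)

# Example: 5 feature vectors of dimension 3 passed through 4 neurons.
features = np.zeros((5, 3))
activations = wavelet_layer(features, np.ones((4, 3)), np.zeros((4, 3)))
```

A classification output would then be formed from `activations`, for example by a weighted sum followed by a decision rule; those later stages are not shown here.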
Accordingly, it is an object of the present invention to provide a robust fault detection and identification system.
It is another object of the present invention to provide a fault detection and identification system which is computationally efficient.
It is yet another object of the present invention to provide a fault detection and identification system which can be incorporated as part of a manufacturing line for real time detection and identification of defects occurring in an object being manufactured and for controlling the manufacturing process to improve the production quality of the object being manufactured.
It is yet another object of the present invention to provide an intelligent fault detection and identification system which can be incorporated into a textile fabric manufacturing process for detecting defects in fabric being manufactured and for controlling the manufacturing process to eliminate or minimize defects in the fabric.
It is yet another object of the present invention to provide a robust fault detection and identification system which is economical.
These and other objects of the present invention are depicted and described in the following description, drawings and claims.