1. Field
This disclosure relates to object recognition and search. More particularly, the present disclosure describes feature extraction techniques that are used to facilitate object recognition and search. The present disclosure also describes architectures and systems that are particularly adapted for the recognition and search of moving objects.
2. Description of Related Art
While humans are well-adapted for object recognition, machine or computer-based object recognition is a challenging problem. Several different approaches have been the subject of much investigation. These approaches include: bio-related methods, shape and color feature based approaches, and image retrieval based shape and color feature methods. Bio-related methods have been described by Thomas Serre, Lior Wolf, Stanley Bileschi, Maximilian Riesenhuber, and Tomaso Poggio, in “Robust Object Recognition with Cortex-Like Mechanisms”, IEEE Transactions On Pattern Analysis And Machine Intelligence, Vol. 29, No. 3, 411-426, March 2007; by Fukushima, K., in “Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,” Biol. Cybern., 36: 193-202 (1980); by Riesenhuber, M. and Poggio, T., in “Hierarchical models of object recognition in cortex,” Nat. Neurosci., 2: 1019-1025 (1999); and by Perrett, D. I. and Oram, M., in “Neurophysiology of shape processing,” Image Vis. Comput., 11: 317-333 (1993). Shape and color feature based approaches have been described by B. W. Mel, in “SEEMORE: Combining Color, Shape, and Texture Histogramming in a Neurally Inspired Approach to Visual Object Recognition,” Neural Computation, 9, pp. 777-804, 1997 and by Eitan Sharon, Meirav Galun, Dahlia Sharon, Ronen Basri & Achi Brandt, in Nature, Vol 442.17, 810-813, August 2006. Image retrieval based shape and color feature methods have been described by Theo Gevers and Arnold W. M. Smeulders, in “PicToSeek: Combining Color and Shape Invariant Features for Image Retrieval,” IEEE Transactions On Image Processing, Vol. 9, No. 1, pp. 102-119, January 2000, and by A. K. Jain and A. Vailaya, in “Image retrieval using color and shape,” Pattern Recognition, vol. 29, pp. 1233-1244, 1996.
Because the shape and color features of a given object can uniquely define its characteristics, and because these features are also consistent with biological models, these approaches are strongly related.
From a physical perspective, an electronic image of an object is a collection of photon reflections from the surface of that object with respect to a fixed camera position. If the object moves and the motion or rotation of its surface is sufficiently small, the new collection of photon reflections may not be correlated with the previous reflections from a pixel-wise viewpoint; however, the two collections still maintain a strong correlation in a global view of the object. To obtain a global view of an object, the shape feature is the logical building block, and the color feature, if available, provides additional information. Moreover, under different lighting conditions, and because the various materials in the object have different light absorption properties, the object reflects and absorbs light differently, so that the object color provides a unique response. When the object moves in an evolving light environment, the color response will change locally with respect to its previous response.
Shape Feature Extraction
For problems with rich data sources such as image recognition and computer vision, the dimension of the input vector is typically larger than the number of input samples, which leads to overfitting of the data if no care is taken to achieve useful generalization. Furthermore, computing power rapidly becomes insufficient to process the data within a reasonable time. To overcome these two obstacles, a preprocessing step may be effective in the following two ways: (1) non-useful and redundant data are eliminated, thus enhancing the operational processing step; and, (2) the salient feature can be selected, thus improving the recognition capability.
Support vector machines, learning by hints, Principal Component Analysis (PCA) and Cellular Neural Networks (CNN) all reduce the dimension of the input vector set with little or no significant loss of information. Such data extraction methods also have the advantage of eliminating some irrelevant data, such as small amplitude noise, and speeding up the classification step. Sequential PCA is a tool that may require less computation and be more hardware friendly to enable real time feature extraction for real time adaptive capability.
Principal Component Analysis (PCA) is a second order statistical approach, which can be used to extract the features of a data set (see, for example, A. K. Jain, R. P. W. Duin, and J. Mao “Statistical Pattern Recognition: A Review”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 1, January 2000) or to perform data compression (see, for example, T. A. Duong “Real Time On-Chip Sequential Adaptive Principal Component Analysis for Data Feature Extraction and Image Compression”, GOMAC Tech-03, Vol. I, pp. 460-464, Tampa, Fla., 31 March-3 April, 2003; T. A. Duong and V. A. Duong, “Sequential Principal Component Analysis - An Optimal and Hardware-Implementable Transform for Image Compression”, 3rd IEEE Space Mission Challenges for Information Technology in Pasadena, Calif., 19-23 July, 2009; and S. Bannour and M. R. Azimi-Sadjadi, “Principal Component Extraction Using Recursive Least Squares Learning,” IEEE Trans. On Neural Networks, Vol. 6, No. 2, March 1995). In particular, when the data set is Gaussian, redundant, and overwhelmingly large, PCA is a very effective preprocessing step to extract data features for classification and/or to cluster data in the most compact energy vectors for data compression. Unfortunately, PCA requires that the basis vectors be orthogonal, which is typically an artificial assumption.
The PCA procedure is complicated and computationally intensive (O(N^3), where N is the dimension of the input vector), thereby making it difficult to use for rich data sources. To overcome the hurdles of the traditional PCA technique, simple sequential PCA techniques have been developed (see, for example, E. Oja and J. Karhunen, “On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix,” J. Math. Anal. Appl., vol. 106, pp. 69-84, 1985). These techniques are based on a learning approach that sequentially obtains principal component vectors. Some works in PCA are reported using Hebbian or anti-Hebbian learning (see, for example, S. Haykin, Neural Networks: A Comprehensive Foundation. New York: Macmillan, 1994, and P. Baldi and K. Hornik, “Learning in linear neural networks: A survey,” IEEE Trans. Neural Networks, Vol. 6, pp. 837-857, 1995) and gradient-based learning (see, for example, S. Bannour and M. R. Azimi-Sadjadi, “Principal Component Extraction Using Recursive Least Squares Learning,” IEEE Trans. On Neural Networks, Vol. 6, No. 2, March 1995, and L. Xu, “Least mean square error reconstruction principle for self-organizing neural-nets,” Neural Networks, 6, pp. 627-648, 1993) or even the more elegant technique of natural gradient descent (see, for example, S. I. Amari, “Natural gradient works efficiently in learning,” Neural Computation, 1998).
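For comparison with the sequential techniques, the traditional batch procedure can be sketched as a full eigendecomposition of the sample covariance matrix. This is only an illustrative baseline, not a method taken from the cited references: the function name `batch_pca`, the row-per-sample data layout, and the assumption of zero-mean data are choices made for this example. The call to `numpy.linalg.eigh` is the step whose cost grows roughly as the cube of the input dimension N.

```python
import numpy as np

def batch_pca(X, m):
    """Return the m largest eigenvalues and their eigenvectors.

    X is a (k, N) array holding k zero-mean measurement vectors of
    dimension N, one per row (layout assumed for this sketch).
    """
    k = X.shape[0]
    cov = X.T @ X / k                        # N x N sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # full symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1][:m]    # indices of the m largest eigenvalues
    return eigvals[order], eigvecs[:, order]

# Four zero-mean samples whose variance is largest along the first axis.
X_demo = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
top_vals, top_vecs = batch_pca(X_demo, m=1)
```

In this small example the covariance matrix is diagonal, so the leading principal vector lies along the first coordinate axis (up to sign).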
For sequential PCA, the gradient descent (GED) technique may be a more attractive approach for hardware implementation because it is straightforward compared to other techniques, e.g., steepest descent, conjugate gradient, or Newton's second order method. However, it exhibits some difficulties in learning convergence for the principal component vectors that correspond to smaller eigenvalues. In addition, this technique still requires some complicated hardware.
A Dominant-Element-Based Gradient Descent and Dynamic Initial Learning Rate technique for sequential PCA has been developed. This technique serves two purposes: 1) simplified hardware implementation, especially in VLSI as a System-on-a-Chip approach; and 2) fast and reliable convergence as compared with the counterpart gradient descent technique. This technique requires much less computation, and its optimized architecture is more suitable for implementation as a real time adaptive learning system in hardware.
The objective function for the Dominant-Element-Based Gradient Descent and Dynamic Initial Learning Rate technique is defined as shown in Eq. 1 below:
$$J(w) = \sum_{i=1}^{m} J_i(w_i) = \sum_{i=1}^{m} \sum_{t=1}^{k} \left\| x_t - w_i w_i^T x_t \right\|^2 \qquad \text{(Eq. 1)}$$
where $m$ is the number of principal components, $k$ is the number of measurement vectors $x_t$, each measured at time $t$, and $w_i$ is the $i$th principal vector (or eigenvector).
From Eq. 1, the additional definitions shown in Eq. 2 and Eq. 3 below can be made:
$$J_i(w_i) = \sum_{t=1}^{k} \left\| y_i^t - w_i w_i^T y_i^t \right\|^2 \qquad \text{(Eq. 2)}$$
$$y_i^t = x_t - \sum_{j=1}^{i-1} w_j w_j^T x_t \qquad \text{(Eq. 3)}$$
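As a minimal sketch of Eq. 2 and Eq. 3, the deflated vectors and the per-component objective can be computed as follows. The function names `deflate` and `objective_i`, and the convention of storing one measurement vector per row, are assumptions made for this illustration only.

```python
import numpy as np

def deflate(X, W_prev):
    """Eq. 3: subtract the projections onto previously extracted vectors.

    X is (k, N) with one measurement vector x_t per row; W_prev is
    (N, i-1) with one previously extracted principal vector w_j per column.
    """
    return X - X @ W_prev @ W_prev.T

def objective_i(Y, w):
    """Eq. 2: sum over t of ||y_i^t - w_i w_i^T y_i^t||^2."""
    residual = Y - np.outer(Y @ w, w)   # row t holds y_t - (w^T y_t) w
    return float(np.sum(residual ** 2))

X_demo = np.array([[3.0, 4.0], [1.0, 0.0]])
w1 = np.array([1.0, 0.0])
w2 = np.array([0.0, 1.0])

Y1 = deflate(X_demo, np.zeros((2, 0)))   # i = 1: nothing extracted yet, so Y1 equals X_demo
J1 = objective_i(Y1, w1)                 # error left after projecting onto w1
Y2 = deflate(X_demo, w1[:, None])        # i = 2: remove the w1 component
J2 = objective_i(Y2, w2)
```

Summing `objective_i` over the extracted vectors, each evaluated on its own deflated data, gives the sequential counterpart of the total objective in Eq. 1.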
From Eq. 2 and Eq. 3, the learning algorithm can be processed sequentially for each principal vector, based on gradient descent, as shown in Eq. 4 below:
$$\Delta w_{ij} = -\frac{\partial J_i}{\partial w_{ij}} = -\frac{\partial \left( \left\| y_i^t - w_i w_i^T y_i^t \right\|^2 \right)}{\partial w_{ij}} \qquad \text{(Eq. 4)}$$
When only the dominant element of Eq. 4 is used (see T. A. Duong “Real Time On-Chip Sequential Adaptive Principal Component Analysis for Data Feature Extraction and Image Compression”, GOMAC Tech-03, Vol. I, pp. 460-464, Tampa, Fla., 31 Mar.-3 Apr., 2003), the weight update can be obtained as shown in Eq. 5 below:
$$w_{ij}^{new} = w_{ij}^{old} + \zeta \, \Delta w_{ij} = w_{ij}^{old} + \zeta \, \varepsilon_{ij} \left( w_i^T y_i^t + w_{ij} \, y_{ij}^t \right) \qquad \text{(Eq. 5)}$$
where $\zeta = \dfrac{E_0}{E_{i-1}}$ and $\hat{y}_i^t = w_i w_i^T y_i^t$. $E_0$ is the initial energy when the network starts learning and $E_{i-1}$ is the energy of the $(i-1)$th extracted principal component.
The techniques described above may be used in the extraction of shape features for object recognition. Application of these techniques to embodiments of the present invention is described below in the Detailed Description section.
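As an illustration of the weight update in Eq. 5, a single update step for one deflated sample might be sketched as follows. The text does not define $\varepsilon_{ij}$; this sketch assumes it is the $j$th element of the reconstruction error $y_i^t - \hat{y}_i^t$, which is what results from differentiating Eq. 4 and keeping only the $j$th term. The function name `dominant_element_step` is hypothetical, and the scale factor $\zeta$ is passed in directly rather than computed from the energies $E_0$ and $E_{i-1}$.

```python
import numpy as np

def dominant_element_step(w, y, zeta):
    """Apply one Eq. 5 update to principal vector w for one sample y.

    w and y are length-N arrays; zeta is the learning-rate scale factor.
    """
    proj = w @ y                    # scalar w_i^T y_i^t
    y_hat = proj * w                # reconstruction w_i w_i^T y_i^t
    eps = y - y_hat                 # ASSUMED meaning of epsilon_ij (not defined in the text)
    delta = eps * (proj + w * y)    # element j: eps_ij * (w_i^T y_i^t + w_ij * y_ij^t)
    return w + zeta * delta

w_old = np.array([1.0, 0.0])
y_sample = np.array([2.0, 1.0])
w_new = dominant_element_step(w_old, y_sample, zeta=0.1)
```

In practice such a step would be repeated over all measurement vectors until the current principal vector settles, after which the data would be deflated as in Eq. 3 before the next vector is learned.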
Color Feature Extraction
When the object color feature is used to aid object recognition in a friendly and correlated environment, color segmentation is a suitable approach for narrowing down the search space. Several color segmentation algorithms have been proposed in the literature (see, for example, M. Celenk. “A Color Clustering Technique for Image Segmentation.” Computer Vision Graphics Image Process. Graphical Models Image Process. 52. pp. 145-170, 1990; J. Lui and Y. H. Yang, “Multiresolution Color Image Segmentation,” IEEE Trans. Patt. Anal. Mach. Intel. 16, 689-700, 1994; E. Littman and H. Ritter. “Adaptive Color Segmentation - A Comparison of Neural and Statistical Methods.” IEEE Trans. Neural Net. Vol. 8, No. 1, pp. 175-185, 1997; F. Perez and C. Kock, “Toward Color Image Segmentation in Analog VLSI: Algorithm and Hardware,” Int. J. Comp. Vision 12:1, 17-24, 1994; G. Healy. “Segmenting Images Using Normalized Color.” IEEE Trans. Syst. Man. Cyber. 22. 1. pp. 64-73, 1992; H. Okii, et al. “Automatic color segmentation method using a neural network model for stained images,” IEICE Trans. Inf. Syst. (Japan) Vol. E770D No. 3, pp. 343-350, March 1994; T. Nakamura and T. Ogasawara, “On-Line Visual Learning Method for Color Image Segmentation and Object Tracking,” Proc. of The 1999 IEEE/RSJ Intelligent Robots and Systems, pp. 222-228, 1999; P. H. Batavia and S. Singh, “Obstacle Detection Using Adaptive Color Segmentation and Color Stereo Homography,” Proc. Of the 2001 IEEE International Conference on Robotic and Automation, Seoul, Korea, May 21-26, 2001; E. Fiesler, S. Campbell, L. Kempen, and T. Duong. “Color Sensor and Neural Processor on One Chip.” International Symposium on Optical Science, Engineering, and Instrumentation, Proc. of the SPIE, vol. 3455 ‘Applications and Science of Neural Networks, Fuzzy Systems, and Evolutionary Computation’, pp. 214-221, 1998; T. A. Duong, “Real Time Adaptive Color Segmentation for Mars Landing Site Identification,” Journal of Advanced Computational Intelligence and Intelligent Informatics (Japan), pp. 289-293, Vol. 7 No. 3, 2003).
The majority of existing color segmentation techniques are based on Red-Green-Blue (RGB) classification in combination with complex data processing. This processing includes spatial clustering to separate targets from the background, multi-histogram analysis (see, for example, M. Celenk. “A Color Clustering Technique for Image Segmentation.” Computer Vision Graphics Image Process. Graphical Models Image Process. 52. pp. 145-170, 1990), Bayesian methods (see, for example, J. Lui and Y. H. Yang, “Multiresolution Color Image Segmentation,” IEEE Trans. Patt. Anal. Mach. Intel. 16, 689-700, 1994), and various neural network approaches (see, for example, E. Littman and H. Ritter. “Adaptive Color Segmentation—A Comparison of Neural and Statistical Methods.” IEEE Trans. Neural Net. Vol. 8, No. 1, pp. 175-185, 1997). Algorithms have been proposed based on edge detection in color space (see, for example, F. Perez and C. Kock, “Toward Color Image Segmentation in Analog VLSI: Algorithm and Hardware,” Int. J. Comp. Vision 12:1, 17-24, 1994) and normalized color space (see, for example, H. Okii, et al. “Automatic color segmentation method using a neural network model for stained images,” IEICE Trans. Inf. Syst. (Japan) Vol. E770D No. 3, pp. 343-350, March 1994).
In real time applications, algorithms with fast learning and adaptive capabilities are preferred. As described in T. A. Duong and Allen R. Stubberud, “Convergence Analysis Of Cascade Error Projection—An Efficient Learning Algorithm For Hardware Implementation”, International Journal of Neural System, Vol. 10, No. 3, pp. 199-210, June 2000, the Cascade Error Projection (CEP) algorithm provides an excellent tool for fast and simple learning.
The CEP neural network architecture is illustrated in FIG. 1. Shaded squares 102 and shaded circles 103 indicate weight sets that have been learned or calculated and then frozen. A non-shaded circle 113 indicates a weight set obtained by perceptron learning, and a non-shaded square 112 indicates a weight set that is deterministically calculated.
In the CEP algorithm, the energy function is defined as shown in Eq. 6 below:
$$E(n+1) = \sum_{p=1}^{P} \left\{ f_h^p(n+1) - \frac{1}{m} \sum_{o=1}^{m} \left( t_o^p - o_o^p \right) \right\}^2 \qquad \text{(Eq. 6)}$$
The weight update between the inputs (including previously added hidden units) and the newly added hidden unit is calculated as shown in Eq. 7 below:
$$\Delta w_{ih}^p(n+1) = -\eta \, \frac{\partial E(n+1)}{\partial w_{ih}^p(n+1)} \qquad \text{(Eq. 7)}$$
and the weight update between hidden unit $n+1$ and the output unit $o$ is as shown in Eq. 8 below:
$$w_{ho}(n+1) = \frac{\sum_{p=1}^{P} \varepsilon_o^p \, f_o^{\prime\,p} \, f_h^p(n+1)}{\sum_{p=1}^{P} \left[ f_o^{\prime\,p} \, f_h^p(n+1) \right]^2} \quad \text{with} \quad f(x) = \frac{1 - e^{-x}}{1 + e^{-x}} \qquad \text{(Eq. 8)}$$
where $m$ is the number of outputs and $P$ is the number of training patterns. The error is $\varepsilon_o^p = t_o^p - o_o^p(n)$, where $o_o^p(n)$ is the output element $o$ of the actual output $o(n)$ for training pattern $p$, and $t_o^p$ is the target element $o$ for training pattern $p$. $n$ indicates the number of previously added hidden units. $f_o^{\prime\,p}(n) = f_o^{\prime\,p}$ denotes the output transfer function derivative with respect to its input. $f_h^p(n+1)$ denotes the transfer function of hidden unit $n+1$.
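The closed-form calculation in Eq. 8 can be sketched for a single output unit as follows. The function names `transfer` and `output_weight` and the use of length-P arrays indexed by training pattern are assumptions made for this illustration; the numeric inputs are arbitrary example values, not data from the cited work.

```python
import numpy as np

def transfer(x):
    """Transfer function given with Eq. 8: (1 - e^-x) / (1 + e^-x)."""
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))

def output_weight(eps_o, fprime_o, f_h):
    """Eq. 8: weight between the new hidden unit n+1 and output unit o.

    eps_o    : errors t_o^p - o_o^p(n), one per training pattern
    fprime_o : output transfer function derivatives, one per pattern
    f_h      : outputs of hidden unit n+1, one per pattern
    """
    numerator = np.sum(eps_o * fprime_o * f_h)
    denominator = np.sum((fprime_o * f_h) ** 2)
    return numerator / denominator

eps_demo = np.array([0.5, -0.5])
fprime_demo = np.array([0.5, 0.5])
f_h_demo = np.array([1.0, -1.0])
w_ho = output_weight(eps_demo, fprime_demo, f_h_demo)
```

Because this weight is computed in one pass over the training patterns, no iterative learning is needed for this part of the network.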
The CEP algorithm is processed in two steps: (1) single perceptron learning, governed by Eq. 7, updates the weight vector $W_{ih}(n+1)$; and (2) when the single perceptron learning is completed, the weight set $W_{ho}(n+1)$ is obtained by the calculation governed by Eq. 8. Additional details of the CEP algorithm and the convergence analysis may be found in T. A. Duong and Allen R. Stubberud, “Convergence Analysis Of Cascade Error Projection—An Efficient Learning Algorithm For Hardware Implementation”, International Journal of Neural System, Vol. 10, No. 3, pp. 199-210, June 2000.
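The first of these two steps can be sketched as gradient descent on the energy function of Eq. 6. Several details here are assumptions for illustration and are not taken from the cited paper: the gradient is accumulated over all training patterns at once (Eq. 7 is written per pattern), the hidden unit is taken to apply the transfer function of Eq. 8 to a weighted sum of its inputs, and the names `energy` and `perceptron_step` are hypothetical.

```python
import numpy as np

def transfer(x):
    """tanh(x / 2), which is identical to (1 - e^-x) / (1 + e^-x)."""
    return np.tanh(x / 2.0)

def energy(w_ih, X, T, O):
    """Eq. 6 for a candidate hidden unit with input weights w_ih.

    X is (P, n_in): inputs to the new hidden unit, one row per pattern.
    T and O are (P, m): targets and current network outputs.
    """
    f_h = transfer(X @ w_ih)
    mean_err = np.mean(T - O, axis=1)   # (1/m) * sum over o of (t_o^p - o_o^p)
    return float(np.sum((f_h - mean_err) ** 2))

def perceptron_step(w_ih, X, T, O, eta):
    """One batch gradient-descent step on Eq. 6, in the spirit of Eq. 7."""
    f_h = transfer(X @ w_ih)
    mean_err = np.mean(T - O, axis=1)
    fprime = 0.5 * (1.0 - f_h ** 2)     # derivative of tanh(x / 2)
    grad = 2.0 * X.T @ ((f_h - mean_err) * fprime)
    return w_ih - eta * grad

# Small arbitrary example: 3 patterns, 2 inputs, 1 output.
X_demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
T_demo = np.array([[1.0], [-1.0], [0.0]])
O_demo = np.zeros((3, 1))
w_demo = perceptron_step(np.zeros(2), X_demo, T_demo, O_demo, eta=0.1)
```

Once repeated steps of this kind have converged, the resulting hidden-unit outputs would be used in the Eq. 8 calculation for the second step.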
While techniques are known in the art for object detection and recognition based on either shape feature extraction and detection or color feature extraction and detection, the usefulness of these techniques in a heterogeneous environment may be somewhat limited. Hence, there is a need in the art for techniques that will support effective recognition in more widely varying environments.