In recent years, real-time object recognition in a dynamic environment has become a challenging and important task that has been approached by many researchers in an attempt to provide solutions for practical real-world applications, e.g., NASA precision landing of spacecraft and identification applications for Homeland Security.
The term Artificial Intelligence is usually associated with expert systems, which use a set of logical rules to classify data. Unfortunately, such rule-based systems break down if two rules are contradictory or if a set of rules creates an infinite recursion that idles the system. To fix this problem, systems were developed that use soft rules by not only computing answers such as “one” for “yes” and “zero” for “no”, but also numbers between one and zero for “perhaps.” Popular soft classifiers are artificial neural networks, fuzzy systems, Bayesian networks, and support vector machines.
For problems with rich data sources such as image recognition, computer vision, and speech recognition, the dimension of the input vector is typically larger than the number of input samples, which leads to overfitting of the data if no care is taken to achieve useful generalization. Furthermore, computing power rapidly becomes insufficient to process the data within a reasonable time. To avoid these problems, the data is typically preprocessed to remove redundant information. Support vector machines [1], learning by hints [2], Principal Component Analysis (PCA) [3], and Cellular Neural Networks (CNN) [4] reduce the dimension of the input vector set. Such data extraction also has the advantage of eliminating some irrelevant data, such as small-amplitude noise, and of speeding up the classification step. In Section B.1, we will focus on the hardware-friendly algorithm that has been developed at JPL to compute PCA in real time using specialized parallel distributed hardware.
PCA has proven its preprocessing capabilities in various successful applications such as data compression, feature extraction for recognition, face recognition, and source separation.
Data extraction methods and classifiers do not address the problems that are typical for time series. Time series are particularly hard to analyze since events that are reflected in past data may cause effects in current data. Furthermore, an agent has to choose an action without having access to future data.
Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a second-order statistical approach that has been used to extract the features of a data set or to perform data compression. In particular, when the data set is Gaussian, redundant, and overwhelmingly large, PCA is a very effective preprocessing step to extract data features for classification and/or to cluster the data into the most compact energy vectors for data compression. Unfortunately, PCA requires that the basis vectors be orthogonal, which is typically an artificial assumption. To obtain the principal component vectors, traditionally the covariance or correlation matrix is calculated, its eigenvalues are obtained, and a component (eigen) vector is found corresponding to each eigenvalue. This procedure is complicated and computationally intensive (O(N^3), where N is the dimension of the input vector), making it impractical for rich data sources. Certain applications, such as classification, navigation, and tracking, only require a few principal component vectors, which offers a good compromise among computing speed, compactness, and low device power. In this case, the traditional technique for computing PCA becomes unnecessarily costly, and hardware implementations of the traditional PCA algorithm become even more challenging. To get over the hurdles of the traditional PCA technique, simple sequential PCA techniques have been introduced (see E. Oja and J. Karhunen, "On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix," J. Math. Anal. Appl., vol. 106, pp. 69-84, 1985). These techniques are based on a learning approach that obtains the principal component vectors sequentially. Some work on PCA has been reported using Hebbian or anti-Hebbian learning, gradient-based learning, or even the fancier technique of natural gradient descent. Most work on PCA uses software-based approaches in which the computing power resides mainly in a microprocessor, DSP, or FPGA.
These techniques are power-hungry, and it is hard to miniaturize the system at the chip level. For the VLSI hardware implementation approach, some PCA work has been reported by R. Konishi et al., "PCA-1: a fully asynchronous, self-reconfigurable LSI," Seventh International Symposium on Asynchronous Circuits and Systems (ASYNC 2001), pp. 54-61, 11-14 Mar. 2001, and H. Ito et al., "Dynamically reconfigurable logic LSI-PCA-1," Digest of Technical Papers of the 2001 Symposium on VLSI Circuits, pp. 103-106, 14-16, 2001. For sequential PCA, the gradient descent (GED) technique is a more attractive approach for hardware implementation, being straightforward compared with others, e.g., steepest descent, conjugate gradient, or Newton's second-order method; however, it exhibits difficulties in learning convergence when the principal component vectors correspond to smaller eigenvalues. In addition, this technique still requires somewhat complicated hardware.
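For reference, the traditional batch procedure described above (compute the covariance matrix, then perform a full O(N^3) eigendecomposition) can be sketched in a few lines of NumPy. This is an illustration only; the function name is ours:

```python
import numpy as np

def traditional_pca(X, m):
    """Classical batch PCA via full eigendecomposition.

    X : (k, N) array of k measurement vectors of dimension N.
    m : number of principal components to keep.
    Returns an (m, N) array whose rows are the top-m principal vectors.
    """
    Xc = X - X.mean(axis=0)          # remove the mean
    C = np.cov(Xc, rowvar=False)     # N x N covariance matrix
    vals, vecs = np.linalg.eigh(C)   # full eigendecomposition, O(N^3)
    order = np.argsort(vals)[::-1]   # sort eigenvalues in descending order
    return vecs[:, order[:m]].T      # top-m eigenvectors as rows

# Even when only m = 3 of N = 64 components are wanted, the full
# decomposition must still be computed -- the cost the text criticizes.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 64))
W = traditional_pca(X, 3)
print(W.shape)  # (3, 64)
```

Because `eigh` returns an orthonormal basis, the rows of `W` are mutually orthogonal unit vectors, matching the orthogonality assumption noted above.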
At the Jet Propulsion Laboratory (JPL) in Pasadena, Calif., an optimal PCA learning technique has been developed by the present inventors that serves two purposes: 1) it simplifies the required hardware implementation, especially in VLSI as a System-on-a-Chip approach; and 2) it provides fast and reliable convergence compared with the counterpart gradient descent technique (it is also a hardware-friendly approach). Furthermore, this new technique is theoretically proven to converge to the same attractor, qualitatively, as the gradient-based technique, and it requires much less computation and an optimized architecture, making it more suitable for hardware implementation as a real-time adaptive learning system.
We adopt the objective function below:
    J(w) = \sum_{i=1}^{m} J_i(w_i) = \sum_{i=1}^{m} \sum_{t=1}^{k} \left\| x_t - w_i w_i^T x_t \right\|^2    (1)
where m is the number of principal components and k is the number of measurement vectors; x_t is the measured vector at time t, and w_i is the i-th principal vector (or eigenvector).
With

    J_i(w_i) = \sum_{t=1}^{k} \left\| y_i^t - w_i w_i^T y_i^t \right\|^2   and   y_i^t = x_t - \sum_{j=1}^{i-1} w_j w_j^T x_t    (2)
a) PCA Learning Approach
From equation (2), the learning algorithm can be processed sequentially for each principal vector based on gradient descent as follows:
    \Delta w_{ij} = -\frac{\partial J_i}{\partial w_{ij}} = -\frac{\partial}{\partial w_{ij}} \left\| y_i^t - w_i w_i^T y_i^t \right\|^2    (3)
From equation (3), only the dominant element (see T. A. Duong and V. A. Duong, "A New Learning Technique of Sequential Adaptive Principal Component Analysis for Fast Convergence and Simplified Hardware Implementation," submitted to IEEE Trans. on Neural Networks, currently unpublished; a copy of same is enclosed as Appendix A) is used; the weight update can be obtained as follows:
    w_{ij}^{new} = w_{ij}^{old} + \zeta \Delta w_{ij} = w_{ij}^{old} + \zeta \varepsilon_{ij} \left( w_i^T y_i^t + w_{ij} y_{ij}^t \right),  where  \zeta = \frac{E_0}{E_{i-1}}  and  \hat{y}_i^t = w_i w_i^T y_i^t    (4)

where E_0 is the initial energy when the network starts learning and E_{i-1} is the energy of the (i-1)-th extracted principal component.
b) Cell Learning Architecture
From equation (4), the learning architecture is realized as in FIG. 1. In FIG. 1, the input data y_ij (i is the index of the component vector and j is the index of a value in component vector i) is defined in equation (2). The Σ box provides the inner product between vectors y and w_i. The result of the Σ box operation is, again, summed with the previous multiplication of y_ij^t and w_ij, and its output is multiplied by the learning rate ζ before updating w_ij as described in equation (4). This single unit can be cascaded into n units to obtain a PCA learning vector, and this learning vector can be cascaded to obtain as many parallel eigenvector extractors as needed for each application.
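The deflation-plus-learning scheme of equations (1)-(4) and FIG. 1 can be illustrated in software. The sketch below is a minimal stand-in, not the inventors' exact update: it learns each principal vector from data deflated by the previously extracted components (equation (2)), but uses the classical Oja rule in place of the update of equation (4) and a fixed learning rate in place of the energy-based rate ζ; all names and parameters are ours.

```python
import numpy as np

def sequential_pca(X, m, rate=0.002, epochs=60):
    """Extract m principal vectors one at a time by learning + deflation.

    Each vector is learned from deflated data
    y_t = x_t - sum_j w_j w_j^T x_t (cf. equation (2)).
    The per-vector update here is Oja's rule, standing in for the
    hardware-friendly update of equation (4).
    """
    k, n = X.shape
    Xc = X - X.mean(axis=0)
    W = np.zeros((m, n))
    for i in range(m):
        # deflate: remove the components already extracted
        Y = Xc - Xc @ W[:i].T @ W[:i]
        w = np.ones(n) / np.sqrt(n)          # deterministic initial guess
        for _ in range(epochs):
            for y in Y:
                a = w @ y                     # projection onto current w
                w += rate * a * (y - a * w)   # Oja's learning rule
            w /= np.linalg.norm(w)            # renormalize once per epoch
        W[i] = w
    return W
```

Run on data with a clear dominant direction, the learned rows align (up to sign) with the eigenvectors of the sample covariance matrix, mirroring the MATLAB comparison reported below.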
c) Applications
In a study at JPL (reported in T. A. Duong, "Real Time On-Chip Sequential Adaptive Principal Component Analysis for Data Feature Extraction and Image Compression," GOMAC Tech-03, Vol. I, pp. 460-464, Tampa, Fla., 31 Mar.-3 Apr. 2003 (hereinafter "GOMAC"), which is hereby incorporated herein by reference), we used two gray-scale images: a woman and a tank, as shown in FIGS. 3a and 4a of the aforementioned GOMAC publication. The purpose of this study was to evaluate how well our technique can extract the features of these images via principal components as compared with the MATLAB technique, and how the extracted features can be processed for image compression. The woman image of FIG. 3a of the GOMAC publication consists of 256×256 gray-scale pixels, each with 8-bit quantization. FIG. 4a, from GOMAC, is a 512×512-pixel image of a tank with an 8-bit gray scale per pixel.
We used row data as input vectors, with 64 pixels per vector, to construct the training vector set. When the training vector set is available, the algorithm shown in equation (4) is applied to extract the principal vectors. Our study has shown that at most 150 learning iterations are required, and the first 20 component vectors are extracted.
Feature Extraction
Feature extraction using PCA is a well-known approach and is based on the most expressive features (eigenvectors with the largest eigenvalues). The first 10 component vectors extracted from the Elaine (woman) image using our technique are projected onto the first 10 component vectors from MATLAB (inner product); the results are shown in FIG. 2a.
Because of the orthogonality of the principal vectors, if a learned component vector and the corresponding MATLAB component vector have the same sequential order and are identical (or close to identical), the expected inner product should be close to ±1; otherwise, it should be close to zero.
The first 10 component vectors extracted from the tank image using MATLAB and our approach, and the projections between principal vectors, are shown in FIG. 2b. As shown in FIGS. 2a and 2b, there are ten unit values (±1) and the rest of the values are close to zero, from which we conclude that our technique extracts feature vectors essentially identical to those of the MATLAB technique.
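The ±1-diagonal check described above can be written directly: project one set of principal vectors onto the other and inspect the matrix of inner products. The function name and tolerance are ours; the reference set stands in for the MATLAB vectors.

```python
import numpy as np

def projection_check(W_learned, W_reference, tol=0.05):
    """Inner products between two sets of principal vectors (rows).

    For matching orthonormal sets, the diagonal of the projection
    matrix should be close to +/-1 (sign is arbitrary) and the
    off-diagonal entries close to zero.
    """
    P = W_learned @ W_reference.T                       # inner products
    diag_ok = np.all(np.abs(np.abs(np.diag(P)) - 1.0) < tol)
    off = P - np.diag(np.diag(P))                       # off-diagonal part
    off_ok = np.all(np.abs(off) < tol)
    return P, bool(diag_ok and off_ok)
```

Two identical orthonormal sets pass even when some vectors differ by a sign flip, which is exactly the ±1 behavior reported for FIGS. 2a and 2b.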
Compression
In the study, we extracted the first 20 component vectors from the full set of 64 component vectors. The full images reconstructed from the first 20 principal component vectors extracted using the MATLAB technique are shown in FIGS. 3b and 4b (these two figures are from the GOMAC publication), and the reconstructions using the same components with our technique are shown in FIGS. 3c and 4c (again, these figures are from the GOMAC publication).
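The compress-then-reconstruct experiment can be mirrored in a few lines, using a batch eigendecomposition as a stand-in for the MATLAB reference; the function name is ours.

```python
import numpy as np

def pca_compress(X, m):
    """Project row vectors X (k x N) onto the first m principal
    components and reconstruct, mirroring the 20-of-64 experiment."""
    mu = X.mean(axis=0)
    Xc = X - mu
    C = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(C)
    W = vecs[:, np.argsort(vals)[::-1][:m]]   # N x m basis
    codes = Xc @ W                             # k x m compressed data
    return codes @ W.T + mu                    # reconstruction
```

Keeping 20 of 64 components stores roughly 31% of the coefficients (plus the basis); the reconstruction error shrinks as more components are kept and vanishes when all 64 are used.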
d) Discussion
The advantages of this approach are less computation, fast convergence, and a simplified hardware architecture compared with other sequential PCA approaches, and our approach may be the most suitable for hardware implementation. The orthogonality constraint in PCA may have advantages for reconstruction in data compression; however, it may not be an optimal technique for feature extraction or for mixed source data. Perhaps ICA may be more suitable for such applications.
Neural Networks
PCA is considered an unsupervised learning technique, which extracts a transformation based on the structure of the data itself. For example, the principal vector is obtained from the data set alone, without any influence from users or operators. In contrast with unsupervised learning, a supervised learning technique requires desired outputs, specified by the user or operator, to constrain the parameterized weight space as a transformation of the input data set. When done, the output estimated from the input data set via the parameterized transformation will theoretically be in the neighborhood of the desired output.
Most supervised learning neural networks focus on a software-based approach, in which the learning trajectory with respect to the weight space is often smooth when the weight components and weight updates are 32-bit or 64-bit floating point numbers. When the weight component and the weight update are 8-bit values (typical under hardware implementation constraints), learning convergence is difficult. In real-time applications, algorithms with fast learning and adaptive capabilities are needed. In a previously published report by inventor Duong (T. A. Duong and Allen R. Stubberud, "Convergence Analysis of Cascade Error Projection: An Efficient Learning Algorithm for Hardware Implementation," International Journal of Neural Systems, Vol. 10, No. 3, pp. 199-210, June 2000, the disclosure of which is hereby incorporated herein by reference), it was shown that the Cascade Error Projection (CEP) algorithm provides an excellent tool for hardware implementation. CEP is an improved learning algorithm for artificial neural networks that is reliable and well suited for implementation in VLSI circuitry. In comparison with other neural-network learning algorithms, CEP involves a self-evolving structure, requires fewer iterations to converge, and is more tolerant of low resolution in the quantization of synaptic weights; thus, CEP learns relatively quickly, and the circuitry needed to implement it is relatively simple, as taught by the aforementioned paper by inventor Duong.
CEP Algorithm Approach
The CEP neural network architecture is illustrated in FIG. 3. Shaded squares and circles indicate frozen weights; a square indicates calculated weights, and a circle indicates learned weights.
Mathematical Approach:
The energy function is defined as:
    E(n+1) = \sum_{p=1}^{P} \left\{ f_h^p(n+1) - \frac{1}{m} \sum_{o=1}^{m} \left( t_o^p - o_o^p \right) \right\}^2
The weight update between the inputs (including previously added hidden units) and the newly added hidden unit is calculated as follows:
    \Delta w_{ih}^p(n+1) = -\eta \frac{\partial E(n+1)}{\partial w_{ih}^p(n+1)}    (7)
and the weight update between hidden unit n+1 and the output unit o is
    w_{ho}(n+1) = \frac{\sum_{p=1}^{P} \varepsilon_o^p \, f_o'^p \, f_h^p(n+1)}{\sum_{p=1}^{P} \left[ f_o'^p \, f_h^p(n+1) \right]^2},   with   f(x) = \frac{1 - e^{-x}}{1 + e^{-x}}    (8)
The notations used are defined as follows:
m is the number of outputs, P is the number of training patterns.
The error is ε_o^p = t_o^p − o_o^p(n), where o_o^p(n) is output element o of the actual output o(n) for training pattern p, and t_o^p is target element o for training pattern p; n indicates the number of previously added hidden units.
f_o'^p(n) = f_o'^p denotes the derivative of the output transfer function with respect to its input.
f_h^p(n+1) denotes the transfer function of hidden unit n+1.
The CEP algorithm is processed in two steps:
(i) Single perceptron learning, which is governed by equation (7), to update the weight vector W_ih(n+1).
(ii) When the single perceptron learning is completed, the weight set W_ho(n+1) can be obtained by the calculation governed by equation (8).
Details of the CEP algorithm and convergence analysis can be found in the aforementioned paper by inventor Duong.
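The two-step iteration can be sketched in NumPy. This is a simplified illustration, not the exact CEP of the Duong papers: we use a single linear output unit, so f_o' = 1 and equation (8) reduces to a least-squares projection of the residual error onto the new hidden unit's output; all names and hyperparameters are ours.

```python
import numpy as np

def f(x):
    # bipolar sigmoid of equation (8); identical to tanh(x/2)
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))

def cep_train(X, t, n_hidden=5, eta=0.1, epochs=200, seed=0):
    """Simplified Cascade Error Projection sketch (single linear output).

    Step (i): gradient-train the new hidden unit so f_h(n+1) tracks
    the current residual error (gradient of the energy E(n+1)).
    Step (ii): compute its output weight in closed form per eq. (8),
    with f_o' = 1 for a linear output unit.
    """
    rng = np.random.default_rng(seed)
    P, N = X.shape
    Z = np.hstack([X, np.ones((P, 1))])   # inputs plus a bias column
    o = np.zeros(P)                        # current network output
    units = []                             # (input weights, output weight)
    for _ in range(n_hidden):
        eps = t - o                        # residual error, the unit's target
        w = rng.standard_normal(Z.shape[1]) * 0.1
        for _ in range(epochs):            # step (i): perceptron learning
            h = f(Z @ w)
            # dE/dw up to a factor of 2 (absorbed into eta);
            # f'(x) = 0.5 * (1 - f(x)**2)
            grad = ((h - eps) * 0.5 * (1.0 - h**2)) @ Z
            w -= eta * grad / P
        h = f(Z @ w)
        w_ho = (eps @ h) / (h @ h)         # step (ii): eq. (8) with f_o' = 1
        o = o + w_ho * h                   # cascade the new unit
        units.append((w, w_ho))
        Z = np.hstack([Z, h[:, None]])     # frozen unit feeds later units
    return o, units
```

Because step (ii) is a least-squares projection, the training error can only decrease as each hidden unit is cascaded in, which is the self-evolving behavior described above.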
Applications
For NASA applications, precise and safe landing is the primary goal in any spacecraft landing mission. To date, however, the landing of spacecraft has been accomplished "blind" through the pre-selection of a landing site, with no in-situ corrections to the landing sequence to account for possible changes in the terrain or for previously unrecognized fine granularity of the topography. This lack of intelligent autonomy in the landing sequence means that a landing site must be chosen a priori based on the statistical probability of not encountering dangerous terrain elements anywhere within a many-square-kilometer ballistic footprint. Current Entry, Descent, and Landing (EDL) software methods are only near real time, in that they are incapable of identifying the topographical granularity of the pre-selected landing site at the rate required for safe landing. This near-real-time approach excludes many scientifically interesting landing sites. It is therefore clear that some form of real-time color segmentation is required to adaptively correct the spacecraft descent. For this, knowledge of terrain color and color segmentation is first incorporated through learning from pictures obtained by the orbiter imaging system. During the descent stage, the knowledge of color segmentation is updated adaptively in real time to capture the contrast, light intensity, and changes in resolution that occur. This allows for the determination of a safe and productive landing site, and for guidance of the spacecraft to this site through appropriate corrective navigational feedback control.
This shortcoming points to adaptation as a primary requirement for real-time dynamic landing site determination. In this section, we discuss a real-time adaptive color segmentation approach using the hardware-friendly neural network CEP, an enabling technology developed at JPL.
Adaptive Architecture Approach
The most challenging aspect in color segmentation is when the light intensity and resolution are dynamically changing. It is easily recognized that the initial knowledge used to train the network will have very little effect at a new location and therefore will need to be updated through learning of newly extracted data.
Briefly, the adaptation process that can aid in spacecraft guidance may be described as follows: when the network that has acquired current knowledge at time t0 is used to test the subsequent image at time t0+Δt, the segmentation results from the image at t0+Δt are used to extract a training set that updates the previous knowledge of the network at time t0. This process of repeatedly segmenting and updating is performed until the lander reaches its destination.
While the process of segmenting and updating is a desired characteristic of an adaptive processor, the issue that needs addressing is how often such updates are necessary. The frequency of updates has a direct impact on power consumption: more power is consumed if updates are performed between each sequential image. The problem with infrequent updates, however, is that the network may not interpolate easily based upon new images, from which the newly segmented data may be insufficient for training. To find the optimal sampling rate, we note that Δt must be "sufficiently small" and will depend upon the landing velocity and other environmental changes. This issue will become significant in the actual design and development of a spacecraft landing system.
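The segment-then-update loop can be simulated on synthetic data. In this sketch a nearest-centroid classifier stands in for the CEP network, and a slow shift of the class means stands in for changing light intensity during descent; every name and parameter is ours, chosen only to show why the static knowledge degrades while the adaptive loop tracks the drift.

```python
import numpy as np

def drifting_batch(step, rng, n=200):
    """Two terrain classes whose appearance drifts over time,
    mimicking changing light intensity between frames."""
    shift = 0.5 * step                     # environment drift per frame
    a = rng.normal([0.0 + shift, 0.0], 0.3, (n, 2))
    b = rng.normal([2.0 + shift, 2.0], 0.3, (n, 2))
    return np.vstack([a, b]), np.array([0] * n + [1] * n)

def classify(X, centroids):
    d = np.linalg.norm(X[:, None, :] - centroids[None], axis=2)
    return d.argmin(axis=1)

rng = np.random.default_rng(4)
X0, y0 = drifting_batch(0, rng)
static = np.array([X0[y0 == c].mean(axis=0) for c in (0, 1)])
adaptive = static.copy()
acc_static, acc_adapt = [], []
for step in range(1, 6):                   # frames at t0 + k*dt
    X, y = drifting_batch(step, rng)
    acc_static.append(np.mean(classify(X, static) == y))
    pred = classify(X, adaptive)           # segment the new frame...
    acc_adapt.append(np.mean(pred == y))
    # ...then update the knowledge from the freshly segmented data
    adaptive = np.array([X[pred == c].mean(axis=0) for c in (0, 1)])
print(acc_static[-1], acc_adapt[-1])
```

By the last frame the static centroids misclassify most of one class, while the adaptive centroids remain accurate, illustrating why updates must come faster than the environmental drift.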
Training Data
To classify each pixel, we used the pixel to be classified and its immediate neighbors to form a 3×3 sub-window as the input training pattern (thus each input pattern has 27 elements, from a 3×3 window of RGB pixels). Based on a previous study, the 3×3 RGB input pattern was found to be the optimal size in comparison with a single RGB input, a 5×5 RGB sub-window, or a 7×7 RGB sub-window. In this study, our objective was to segment the image into three groups: "Rock1", "Rock2", and "Sand". The topology of our network is a 27×5×3 cascading-architecture neural network, having 27 inputs, 5 cascaded hidden units, and 3 output units.
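Building the 27-element training patterns from an RGB image is a simple sliding-window operation; the function name is ours, and boundary pixels are skipped for brevity.

```python
import numpy as np

def rgb_patches(image):
    """Turn an H x W x 3 RGB image into 27-element training patterns:
    one 3x3 neighborhood (x 3 channels) per interior pixel."""
    H, W, _ = image.shape
    patches = np.empty(((H - 2) * (W - 2), 27), dtype=image.dtype)
    idx = 0
    for r in range(1, H - 1):              # skip the 1-pixel border
        for c in range(1, W - 1):
            patches[idx] = image[r - 1:r + 2, c - 1:c + 2, :].reshape(27)
            idx += 1
    return patches

img = np.zeros((10, 12, 3), dtype=np.uint8)
print(rgb_patches(img).shape)  # (80, 27)
```

Each row of the result is one input pattern for the 27×5×3 network described above.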
In FIG. 4a herein, which depicts a 3:00 PM image, we sampled and collected 408 patterns for training data, 588 patterns for cross validation, and 1200 patterns for testing. With these sample sets, the learning is completed with 91% correct in training, 90% correct in validation, and 91% correct in testing.
After training was performed, the segmented output of the original image in FIG. 4a is shown in FIG. 4b herein.
With the knowledge acquired from the image of FIG. 4a herein, the network is tested with the image input shown in FIG. 5a, collected at 4:00 PM. The output result is shown in FIG. 5b (no intermediate adaptation step was performed). FIG. 5c is the output result with the network that acquired intermediate knowledge through adaptive learning.
In a similar manner, the original image shown in FIG. 6a was collected at 5 PM. FIG. 6b is the segmented image with the previous training set from 4 PM, and FIG. 6c is the segmented image with the intermediate adaptive step.
Based on the aforementioned results, we might conclude that the adaptive technique is needed to obtain better segmentation when the environment is changing rapidly.