Background modeling in machine vision processing has traditionally been a difficult problem. Typical background modeling algorithms utilize pixel-based approaches. For example, in (Culibrk, D., Marques, O., Socek, O., Kalva, H., & Furht, B. (2007). Neural Network Approach to Background Modeling for Video Object Segmentation. IEEE Transactions on Neural Networks, 18 (6), 1614-1627), a subnet is used on a per-pixel basis, such that most lighting conditions associated with a particular pixel are encapsulated in that subnet and learned for any such given pixel. This approach, like so many others utilizing an AI-based approach for background modeling, is insufficient and inefficient. Drastic changes in lighting conditions may severely impact such approaches. More importantly, a per-pixel subnet is extremely expensive and is difficult to implement in real-time.
In (Parzen, E. (1962). On the Estimation of a Probability Density Function and the Mode. Annals of Math. Stats., 33, 1065-1076), Parzen shows that if the data is consistent, then Equation 1 holds:
$$E\left|f_n(X) - f(X)\right|^2 \to 0 \quad \text{as } n \to \infty \qquad \text{(Equation 1)}$$
E represents the energy that is associated with a given function f. This result is useful for a special class of neural networks called deep belief nets, in which pairwise layer learning becomes very valuable, and a Gibbs sampling procedure may be used in a classification phase.
When using such systems, an expected classification error for each classification step gets smaller as the datasets employed in training and processing get larger. However, the inventors of the present invention have determined that the error associated with one or more of the classification steps typically reaches a global minimum beyond which improvements are not possible. More importantly, the inventors of the present invention have determined that in practice, such errors are not nearly as negligible as Parzen's work had theorized. The existence of these errors reduces the ability to properly recognize and categorize one or more image features.
Some advanced machine vision processing may employ one or more deep belief networks. Such deep belief networks typically employ restricted Boltzmann machines (RBMs). A restricted Boltzmann machine (RBM) is similar to a multilayer perceptron (MLP) in that it consists of binary neurons that communicate with other neurons via synaptic connections of differing weights. These neurons exist either in the visible layer, meaning that their desired state can be set and observed, or in a hidden layer, in which case their desired state is unknown. Also, an RBM differs from a normal Boltzmann machine in that visible-to-visible and hidden-to-hidden synaptic connections are disallowed. An RBM consists of exactly one visible and one hidden layer. These two layers can be further reproduced and stacked to form a deeper network.
The binary state of a typical RBM neuron, i, is represented by $s_i$, where $s_i \in \{0,1\}$. The weight of the synaptic connection between visible neuron i and hidden neuron j is represented by $w_{ij}$. Neurons can also have biases, represented by $b_i$ for neuron i. The following conditions are true for synaptic connections in an RBM: there are no synaptic connections between any two neurons in the same layer, there is no synaptic connection between a neuron and itself, and all synaptic connections are symmetrical. These rules are set forth in Equation 2.
$$w_{i(n)\,i(m)} = 0 \quad \text{(there is no synaptic connection between any two neurons in the same layer)}$$
$$w_{ii} = 0 \quad \text{(there is no synaptic connection between a neuron and itself)}$$
$$w_{ij} = w_{ji} \quad \text{(any synaptic connection between two neurons is symmetrical)} \qquad \text{(Equation 2)}$$
FIG. 8 depicts a two-layer RBM embodying this situation. As is shown in FIG. 8, such a two-layer restricted Boltzmann machine 1000 is formed of a visible layer 1010 and a hidden layer 1030. Visible layer 1010 is formed of a plurality of neurons 1020 while hidden layer 1030 is formed of a plurality of neurons 1040. Symmetrical synaptic connections 1050 between each neuron in one of the hidden and visible layers and all of the neurons in the other of the hidden and visible layers are shown. As noted above, there are no synaptic connections between any neuron and itself, or any other neurons in a same layer in which it resides.
In addition to being binary, the neurons in an RBM are also stochastic, with their probability of being active given by Equation 3:
$$p(s_i = 1) = \frac{1}{1 + \exp\left(-b_i - \sum_j s_j w_{ij}\right)} \qquad \text{(Equation 3)}$$
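As an illustration only, the activation probability of Equation 3 can be sketched in a few lines of Python. The function name `activation_probability` and the example bias, states, and weights are hypothetical and are not taken from the cited references.

```python
import math

def activation_probability(bias, neighbor_states, weights):
    """Probability that a neuron is active, following Equation 3.

    bias            -- b_i, the bias of neuron i
    neighbor_states -- binary states s_j of the connected neurons
    weights         -- weights w_ij of the corresponding connections
    """
    total_input = bias + sum(s * w for s, w in zip(neighbor_states, weights))
    return 1.0 / (1.0 + math.exp(-total_input))

# Hypothetical neuron connected to three neurons in the other layer.
# The active inputs cancel (0.5 - 0.5), so the total input is zero.
p = activation_probability(bias=0.0,
                           neighbor_states=[1, 0, 1],
                           weights=[0.5, -1.0, -0.5])
print(p)
```

With a total input of zero the neuron is equally likely to be on or off; a large positive input drives the probability toward one, and a large negative input drives it toward zero.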
Multiple layers of RBMs are often utilized, consisting of more than one hidden layer. Given initial data in the visible layer, sometimes comprised of input pixels in applications of computer vision, a greedy learning process that is similar to (Hinton, G., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527-1554) may be utilized for an unsupervised learning algorithm (discussed below). Once a first hidden layer is substantially trained (by defining various synaptic connections as described above), a second hidden layer may be trained in a similar manner, treating the first hidden layer as the new visible layer for training purposes. This process can be repeated until the desired number of hidden layers has been trained. Every additional hidden layer can increase the probability that the RBM's visible layer will match the original training data, improving the RBM's generative model. FIG. 9 shows a simple four-layer, 18-neuron RBM 1100, comprising visible layer 1110, first hidden layer 1120, second hidden layer 1130 and third hidden layer 1140.
Visible layer 1110 in FIG. 9 represents a sensory input to the RBM, while the three hidden layers represent feature detectors that can be trained using a greedy learning algorithm. The first hidden layer 1120 contains features of the visible layer, the second hidden layer 1130 contains features of the first, and the third hidden layer 1140 contains features of the second hidden layer. This concept can also be extended to more hidden layers. The more layers are trained, the more abstract the representation.
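The control flow of such greedy, layer-by-layer training can be sketched as follows. This is a minimal sketch under stated assumptions: `greedy_pretrain` and `dummy_train_layer` are hypothetical names, the stand-in trainer performs no learning, and the layer sizes 6, 5, 4 and 3 are merely one way of dividing 18 neurons among four layers, not the sizes shown in FIG. 9.

```python
def greedy_pretrain(data, hidden_sizes, train_layer):
    """Train a stack of layer pairs one pair at a time.

    train_layer(inputs, n_hidden) is a caller-supplied (hypothetical)
    routine that trains one visible/hidden pair and returns
    (params, hidden_activations).
    """
    layers = []
    inputs = data
    for n_hidden in hidden_sizes:
        params, hidden = train_layer(inputs, n_hidden)
        layers.append(params)
        # The trained hidden layer becomes the visible layer of the next pair.
        inputs = hidden
    return layers

def dummy_train_layer(inputs, n_hidden):
    """Stand-in trainer: records layer sizes only, performs no learning."""
    params = {"n_visible": len(inputs[0]), "n_hidden": n_hidden}
    hidden = [[0] * n_hidden for _ in inputs]
    return params, hidden

# One hypothetical 6-pixel training vector and three hidden layers.
stack = greedy_pretrain([[1, 0, 1, 1, 0, 0]], [5, 4, 3], dummy_train_layer)
for layer in stack:
    print(layer)
```

In practice the stand-in trainer would be replaced by a routine that actually learns the weights and biases of each layer pair.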
Once all layer-pairs of the RBM are pretrained and fine-tuned (via supervised backpropagation), the RBM theoretically should be able to accurately reconstruct a data vector in the visible layer based on the synaptic connection weights and neuron biases. However, because of the stochastic nature of the neurons in an RBM, some thought needs to be given to data sampling. The trained data vector in an RBM can be sampled through alternating Gibbs sampling. Given a random data vector, the neuron states are iteratively updated between the various layers until equilibrium is reached. Two steps are used for updating each layer. First, in order to update each of the hidden (feature detector) neurons, $s_j$, based on each of the visible neurons, $s_i$, each hidden neuron is switched on with a probability as shown in Equation 4.
$$p(s_j = 1) = \frac{1}{1 + \exp\left(-b_j - \sum_{i \in \text{visible}} s_i w_{ij}\right)} \qquad \text{(Equation 4)}$$
After the hidden neurons are updated, the visible neurons are then updated based on the new states of the hidden neurons. Each visible neuron, $s_i$, is switched on with a probability as shown in Equation 5.
$$p(s_i = 1) = \frac{1}{1 + \exp\left(-b_i - \sum_{j \in \text{hidden}} s_j w_{ij}\right)} \qquad \text{(Equation 5)}$$
Equations 4 and 5 above define probabilities. The weights and biases in an RBM only determine the likelihood of any particular neuron being activated. Alternating Gibbs sampling is therefore used in order to observe the RBM's trained data vector, instead of a single pass through the network as in an MLP. The two steps of Gibbs processing in two or more adjacent layers may alternate until the probability of finding the RBM in any particular state stays constant, even if the states of the individual neurons in either layer are changing. An RBM that satisfies this condition is said to have reached “thermal equilibrium”, see (Hinton, Osindero, & Teh, 2006).
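The two alternating update steps can be sketched for a single visible/hidden layer pair as follows. This is a sketch under stated assumptions rather than a definitive implementation: the function names are hypothetical, the chain runs for a fixed number of steps `n_steps` instead of testing for thermal equilibrium, and the example weights and biases are invented (the biases are deliberately extreme so that the outcome is nearly deterministic).

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(visible, weights, hidden_bias, rng):
    """Switch on each hidden neuron j with the probability of Equation 4."""
    hidden = []
    for j, b_j in enumerate(hidden_bias):
        total = b_j + sum(visible[i] * weights[i][j] for i in range(len(visible)))
        hidden.append(1 if rng.random() < sigmoid(total) else 0)
    return hidden

def sample_visible(hidden, weights, visible_bias, rng):
    """Switch on each visible neuron i with the probability of Equation 5."""
    visible = []
    for i, b_i in enumerate(visible_bias):
        total = b_i + sum(hidden[j] * weights[i][j] for j in range(len(hidden)))
        visible.append(1 if rng.random() < sigmoid(total) else 0)
    return visible

def gibbs_chain(visible, weights, visible_bias, hidden_bias, n_steps, seed=0):
    """Alternate hidden and visible updates for a fixed number of steps.

    weights[i][j] is the weight between visible neuron i and hidden neuron j.
    A fixed step count is an assumption made here for simplicity.
    """
    rng = random.Random(seed)
    hidden = sample_hidden(visible, weights, hidden_bias, rng)
    for _ in range(n_steps):
        visible = sample_visible(hidden, weights, visible_bias, rng)
        hidden = sample_hidden(visible, weights, hidden_bias, rng)
    return visible, hidden

# Hypothetical 2-visible, 3-hidden RBM with zero weights and extreme biases.
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
final_visible, final_hidden = gibbs_chain(
    [0, 1], weights, [50.0, -50.0], [50.0, -50.0, 50.0], n_steps=10)
print(final_visible, final_hidden)
```

Only the neuron states change during sampling; the weights and biases stay fixed and merely set the switching probabilities.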
Many learning approaches have been suggested for training DBNs. Some approaches focus on discovering structure from input, if the intended purpose involves the classification of 2D and 3D objects. For instance, in (Hinton, Osindero, & Teh, 2006), a DBN is used to discover features and an overall structure in the input. A DBN approaches learning structure and extracting features through a series of layers, in which every two layers are trained independently, in a manner as described above. This allows for an unsupervised learning step that progressively extracts more abstract features, until the penultimate layer of a network (layer before last). A smaller set of preclassified data may then be used under undirected training conditions to assign labels to the training sets and train the network on the classification step. So, the approach is comprised of two fundamental steps (Hinton, Osindero, & Teh, 2006):
1. Learn new features, and more abstract representations of such features, in an unsupervised manner.
2. Learn classification associated with such features in a supervised manner, or rather, a semi-supervised manner.
Such an approach does not, however, classify the data without another discriminative learning model used to train the RBM with a (possibly smaller) set of pre-classified data.
For instance, RBMs are used in Hinton's unsupervised learning algorithm digit example (Hinton, Osindero, & Teh, 2006) by taking pixel data as the visible layer and feature detectors as the hidden layer. Every feature detector neuron, j, is connected to every pixel, i, with a certain weight, $w_{ij}$. Each weight is initially zero but is repeatedly updated based on Equation 6.
$$\Delta w_{ij} = \varepsilon\left(\langle s_i s_j \rangle_{\text{data}} - \langle s_i s_j \rangle_{\text{reconstruction}}\right) \qquad \text{(Equation 6)}$$
In Equation 6, ε is the learning rate constant. The $\langle s_i s_j \rangle_{\text{data}}$ term is how often pixel i and feature detector j are both on in a batch of 100 (for example) training images when the states of the feature detectors have been updated based on training data (pixel states) in the visible layer. Similarly, $\langle s_i s_j \rangle_{\text{reconstruction}}$ is how often pixel i and feature detector j are both on in such an exemplary batch of 100 training images when the pixels in the visible layer have been updated based on the states of the feature detectors in the hidden layer. A similar approach can be used to update the biases $b_i$ of visible neurons i as shown in Equation 7.
$$\Delta b_i = \varepsilon\left(p(s_{i,\text{data}} = 1) - p(s_{i,\text{reconstruction}} = 1)\right) \qquad \text{(Equation 7)}$$
Note that the learning rate constant, ε, need not be the same as the corresponding constant in Equation 6. The $p(s_{i,\text{data}} = 1)$ term is the probability of the pixel i being “ON”, or activated, according to the training data, while the $p(s_{i,\text{reconstruction}} = 1)$ term is the probability of the same pixel being on according to the RBM's reconstruction of the image. The biases, like the weights, are also updated every 100 training images, for example. A similar equation may be used for the biases of each hidden unit. Other sets of training images may also be employed.
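One batch update following Equations 6 and 7 can be sketched as shown here. The function names, the tiny two-image batch, and the learning rates are hypothetical. As a further assumption, the probabilities in Equation 7 are estimated as the fraction of images in the batch in which the pixel is on.

```python
def pair_frequency(visible_batch, hidden_batch):
    """Fraction of the batch in which pixel i and detector j are both on."""
    n = len(visible_batch)
    n_visible, n_hidden = len(visible_batch[0]), len(hidden_batch[0])
    return [[sum(v[i] * h[j] for v, h in zip(visible_batch, hidden_batch)) / n
             for j in range(n_hidden)]
            for i in range(n_visible)]

def on_frequency(batch):
    """Fraction of the batch in which each neuron is on."""
    n = len(batch)
    return [sum(states[i] for states in batch) / n for i in range(len(batch[0]))]

def update_step(weights, visible_bias, v_data, h_data, v_recon, h_recon,
                weight_rate, bias_rate):
    """Apply Equation 6 to the weights and Equation 7 to the visible biases."""
    data_freq = pair_frequency(v_data, h_data)
    recon_freq = pair_frequency(v_recon, h_recon)
    new_weights = [[w + weight_rate * (d - r)
                    for w, d, r in zip(w_row, d_row, r_row)]
                   for w_row, d_row, r_row in zip(weights, data_freq, recon_freq)]
    p_data, p_recon = on_frequency(v_data), on_frequency(v_recon)
    new_bias = [b + bias_rate * (pd - pr)
                for b, pd, pr in zip(visible_bias, p_data, p_recon)]
    return new_weights, new_bias

# Hypothetical batch of two images, two pixels, one feature detector.
v_data, h_data = [[1, 0], [1, 1]], [[1], [1]]      # states driven by the data
v_recon, h_recon = [[1, 0], [0, 0]], [[1], [0]]    # states after reconstruction
new_weights, new_bias = update_step(
    [[0.0], [0.0]], [0.0, 0.0], v_data, h_data, v_recon, h_recon,
    weight_rate=0.1, bias_rate=0.1)
print(new_weights, new_bias)
```

Separate `weight_rate` and `bias_rate` parameters are used because the two learning rate constants need not be equal.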
There are many problems that are associated with the current RBMs. Although such a class of AI algorithms performs very well, and the deep nature of the network can, at times, outperform other implementations, they still lack in some fundamental areas:
Lack of tractability. RBMs are intractable as a solution, explaining the inability of an RBM to completely represent a dataset no matter how clean such a set is in the feature space.
Incapability of learning more complex structures. Although deep topologies have been successfully used, such topologies lack the ability to glean complex relationships that shallower topologies can already have. In the end, RBMs fundamentally lack the complex neuronal model that is associated with biological neural networks.
RBMs don't offer a means for improving the quality of recognition autonomously. RBMs are feature detectors.
No “eureka moment” for RBMs. RBMs don't hit a magical plateau, beyond which the error rate suddenly falls exponentially. Such an ability is innately useful to AI applications. Such a process doesn't exist because of the RBM's incapability of acquiring and defining new feature classes on its own.
RBMs are not monitored in real-time. RBMs don't evolve. The premise of utilizing them is that first an AI is trained, and then it is used.
RBMs do not allow flexibility in training. There is only one mode of training, based on greedy learning.
It would therefore be beneficial to present a method and apparatus for overcoming the drawbacks of the prior art through modification of the RBM topology and architecture to address the aforementioned and other drawbacks.