In practice, a classifier receives a set of measurements or features as input and assigns a category, or class, to that set. Thus, the classifier determines the mapping between an input set and a class. There are several types of classifiers, including those that learn, such as neural classifiers, and those that do not, such as rule-based expert systems. Classifier learning may be supervised or unsupervised. In supervised learning, the classes are predetermined, and the classifier is trained on a set of input data with known classes. In unsupervised learning, the classes are not predetermined but emerge from the properties of the distribution of the inputs.
Neural classifiers are based on analogies to the function of biological neurons. Each neuron has three key structural components: a dendritic tree, a cell body, and an axon. The dendritic tree receives connections from other neurons at junctions called synapses. The activity of the neurons connected to the dendritic tree can cause changes in the electrical properties of the dendrite. The effect that any one connection has on the electrical properties of the dendrite depends on the properties of its synapse. These changes in electrical properties propagate to the cell body, where they are integrated. If the integrated voltage exceeds a threshold, the neuron propagates an output signal along its axon, which makes connections to the dendrites of other neurons. Thus, each neuron receives multiple inputs, which affect the integrated voltage in the cell body to varying degrees according to the “strength” of their synapses.
To mimic the characteristics of neurons, most neural classifiers treat an input as a real-valued vector, where each dimension of the vector corresponds to a single connection from another neural classifier. For each of these inputs there is a corresponding “weight,” which is analogous to the strength of the synapse. This set of weights can be represented as a weight vector. To mimic the integrative properties of the cell body, a neural classifier computes a weighted sum of its inputs, which is equivalent to the dot product of the input and weight vectors. To mimic the threshold controlling the output of the neuron, the weighted sum is passed through a non-linear activation function, such as a sigmoid. The resulting value is the output of the neural classifier.
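As a rough illustration of the computation just described, the following Python sketch computes a single neural classifier's output as a dot product passed through a logistic sigmoid. The function names and the example numbers are hypothetical; the sigmoid is written in a numerically stable form, which is an implementation choice rather than part of the description.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic activation, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neural_classifier_output(inputs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the inputs (a dot product) mapped through a sigmoid."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same dimension")
    # Integration step: analogous to the cell body summing synaptic effects.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Non-linear activation: a smooth stand-in for the firing threshold.
    return sigmoid(weighted_sum)


print(neural_classifier_output([1.0, 0.5, -1.0], [0.8, 0.4, 0.3]))  # ~0.668
```

Here the weighted sum is 0.8 + 0.2 - 0.3 = 0.7, and the sigmoid maps it into the interval (0, 1).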
The weights used in the computation of the neural classifier can be either fixed or modifiable. Modifiable weights are updated in accordance with a learning rule, which specifies a general method for the modification. A commonly used learning rule is Hebb's rule, which strengthens a weight when the source and destination neural classifiers are concurrently active.
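The text does not give a formula for Hebb's rule; one common form, assumed here, increments each weight by a learning rate times the product of the source (pre-synaptic) and destination (post-synaptic) activities, so a weight grows only when both are active. The function and the `rate` parameter below are hypothetical names for illustration.

```python
from typing import List, Sequence


def hebbian_update(weights: Sequence[float], pre: Sequence[float],
                   post: float, rate: float = 0.1) -> List[float]:
    """One Hebbian step (assumed product form): w_i += rate * pre_i * post."""
    return [w + rate * x * post for w, x in zip(weights, pre)]


w = [0.0, 0.0, 0.0]
# Sources 0 and 2 are active together with the destination; source 1 is silent.
w = hebbian_update(w, pre=[1.0, 0.0, 1.0], post=1.0)
print(w)  # [0.1, 0.0, 0.1]
```

Only the weights whose source fired concurrently with the destination are strengthened; if the destination is inactive (`post=0.0`), no weight changes.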
The model of a single neural classifier described above is also known as a perceptron. The input to a perceptron consists of an N-dimensional vector of real values, which can be considered a point in the N-dimensional input space. The perceptron computes the dot product of the input with its own N-dimensional weight vector. The weight vector defines an (N-1)-dimensional hyperplane, normal to the weight vector, that divides the input space into two regions. The perceptron generates a positive response if the point corresponding to the input vector lies on one side of the hyperplane and a negative response if it lies on the other. A positive response indicates that the input belongs to a first class, and a negative response indicates that the input belongs to a second class. Consequently, a perceptron is useful for classification problems that are linearly separable, that is, where the regions of the input space corresponding to two separate classes can be separated by a hyperplane. An input dimension is considered to contribute to the output if its weighted value has the same sign as the response. The significance of a contributing input dimension increases with the magnitude of its weighted value. Typically, perceptrons are not used in isolation but are connected together in networks.
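A minimal sketch of the perceptron as described: it reports the sign of the dot product and lists the contributing input dimensions, ordered by the magnitude of their weighted values. Like the description, it has no bias term, so the hyperplane passes through the origin. Assigning a sum of exactly zero to the second class is an assumption, and all names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def perceptron(x: Sequence[float], w: Sequence[float]) -> Tuple[int, List[int]]:
    """Return (+1 or -1, contributing dimensions sorted by |w_i * x_i|, largest first)."""
    weighted = [wi * xi for wi, xi in zip(w, x)]
    total = sum(weighted)
    # Which side of the hyperplane normal to w the point lies on.
    # A sum of exactly 0 is assigned to the second class (an assumption).
    response = 1 if total > 0 else -1
    # A dimension contributes if its weighted value has the same sign as the response.
    contributing = [i for i, v in enumerate(weighted) if v * response > 0]
    contributing.sort(key=lambda i: abs(weighted[i]), reverse=True)
    return response, contributing


w = [2.0, -1.0, 0.5]
print(perceptron([1.0, 1.0, 2.0], w))  # (1, [0, 2])
print(perceptron([0.0, 3.0, 1.0], w))  # (-1, [1])
```

In the first call, dimension 0 (weighted value 2.0) is the most significant contributor to the positive response. Dimension 1 (weighted value -1.0) works against it.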
It is possible to use perceptrons to classify non-linearly separable regions of the input space as a single class. This requires constructing a “multi-layer” network, where perceptrons in the first layer receive the inputs from the input space and produce intermediate classifications. These intermediate classifications are used as inputs to a subsequent “layer.” The outputs of the second layer may be considered the overall classification output, or they can serve as inputs to subsequent “layers.” The number of layers is a design parameter; two is the most common choice, as it has been shown that a sufficiently large two-layer network is functionally equivalent to networks with more layers. The connectivity in the network is typically fixed, which means that the connections among classifiers in the network are not created, destroyed, or reassigned during the operation of the network.
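The standard example of a non-linearly separable problem is exclusive-or (XOR): no single hyperplane separates {(0,1), (1,0)} from {(0,0), (1,1)}. The sketch below uses hand-chosen weights rather than learned ones, and it adds a bias (threshold offset) to each unit, which the perceptron description above does not mention. It outputs 1/0 instead of positive/negative responses for readability. Two first-layer perceptrons compute intermediate classifications ("at least one input on" and "both inputs on"), and a second-layer perceptron combines them.

```python
from typing import Sequence


def step(z: float) -> int:
    """Hard threshold: 1 for a positive response, 0 otherwise."""
    return 1 if z > 0 else 0


def unit(x: Sequence[float], w: Sequence[float], bias: float) -> int:
    """A single perceptron with an explicit bias term (an assumption here)."""
    return step(sum(xi * wi for xi, wi in zip(x, w)) + bias)


def xor_network(x1: int, x2: int) -> int:
    # First layer: intermediate classifications of the input space.
    at_least_one = unit((x1, x2), (1.0, 1.0), -0.5)
    both = unit((x1, x2), (1.0, 1.0), -1.5)
    # Second layer: "at least one, but not both".
    return unit((at_least_one, both), (1.0, -1.0), -0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

Each first-layer unit draws one line through the input square, and the second layer classifies the band between the two lines as a single class.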
As an example of network organization, a biologically inspired network having a hierarchical organization of classifier maps is described. A classifier map is a two-dimensional array of classifiers. The input to such a network is usually one or more two-dimensional arrays of sensors. Biologically inspired examples of inputs include light detectors arranged in the retina and touch sensors embedded in the skin. Each map is responsible for classifying one or more attributes of the input arrays. For example, the initial map of the visual system takes inputs indirectly from the retina and classifies the existence and orientation of edges in the visual scene. Each classifier considers only the visual information in a limited region of the input space. Furthermore, nearby classifiers in the output map typically have overlapping regions of interest that are also close together in the input map. This property establishes a “topographic” mapping from the retina to the initial visual system map.
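A small sketch of a topographic classifier map, assuming each classifier sees a 3x3 patch of a two-dimensional sensor array and detects vertical edges (dark on the left, bright on the right). The weights, threshold, and names are hypothetical. Every map position stands for a separate classifier with its own fixed connections. The nested loops only simulate that fixed array, and neighboring classifiers have overlapping, neighboring patches.

```python
from typing import List

Grid = List[List[float]]

# Hypothetical weights for a "vertical edge" classifier over a 3x3 patch.
VERTICAL_EDGE: Grid = [[-1.0, 0.0, 1.0],
                       [-1.0, 0.0, 1.0],
                       [-1.0, 0.0, 1.0]]


def classifier_map(sensors: Grid, weights: Grid, threshold: float) -> List[List[int]]:
    """Output map: entry (r, c) is the classifier whose patch starts at sensor (r, c)."""
    k = len(weights)
    rows = len(sensors) - k + 1
    cols = len(sensors[0]) - k + 1
    out: List[List[int]] = []
    for r in range(rows):
        row: List[int] = []
        for c in range(cols):
            # Weighted sum over this classifier's limited region of the input.
            s = sum(weights[i][j] * sensors[r + i][c + j]
                    for i in range(k) for j in range(k))
            row.append(1 if s > threshold else 0)
        out.append(row)
    return out


# Dark left half, bright right half: a vertical edge between columns 1 and 2.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]
print(classifier_map(image, VERTICAL_EDGE, threshold=2.0))  # [[1, 1, 0], [1, 1, 0]]
```

The responding classifiers sit at the map positions whose patches straddle the edge, so the output map preserves the spatial layout of the input.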
Typically, classifier maps serve as inputs to other classifier maps. This enables the integration of information from lower maps and the identification of combinations of attributes and relationships in the input space. Furthermore, every classifier output of a hierarchical map can be considered a portion of the network output. Such outputs could be used to generate a response or behavior that is predicated on the existence of combinations of attributes in the input space.
Such hierarchical networks of classifiers, however, have a fundamental limitation. In F. Rosenblatt, “Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms,” Spartan Books, Washington D.C., 1962, this limitation is illustrated with the following example. Consider a hierarchical network, as described above, with outputs that respond to the following four attributes of a visual scene: 1) the presence of a square, regardless of its location in the input space; 2) the presence of a triangle, regardless of its location in the input space; 3) the presence of a visual object in the top half of the image, regardless of the nature of the object; and 4) the presence of a visual object in the bottom half of the image, regardless of the nature of the object. To extend Rosenblatt's example, first assume that the network has an initial map that receives the sensory input, generates a topographic response to fundamental attributes such as line segments, and projects to the classifiers for the four attributes described above. Second, assume that this network has an additional output layer that generates a behavior when a square exists in the top of the image.
This example network would respond correctly in each of the four cases where a single object is present in the image. When a square is present in the top of the image, the desired behavior is generated because both required attributes of the input space are recognized. When the square is in the bottom, or when a triangle is present in either the top or bottom of the image, no behavior is generated because the two required attributes (“square” and “top”) are not simultaneously recognized.
When two objects are presented to the network, however, an erroneous response can occur. In particular, when a triangle is present in the top of the image and a square is present in the bottom, both attributes required for the behavior are simultaneously present, and the behavior erroneously takes place. Von der Malsburg et al., “The What and Why of Binding: The Modeler's Perspective,” Neuron, Vol. 24, 95-104, September 1999, observe, for example, that the classical neural network “. . . has no flexible means of constructing higher-level symbols by combining more elementary symbols,” and that “coactivating the elementary symbols leads to binding ambiguity when more than one composite symbol is to be expressed.”
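The failure in the extended Rosenblatt example can be reproduced with a symbolic sketch. It assumes a scene is a list of (shape, half) objects and that each attribute classifier is an idealized location- or shape-invariant detector, modeled here as a Boolean "any" rather than a trained perceptron. All names are hypothetical. The output layer sees only the four attribute bits, so it cannot tell whether "square" and "top" came from the same object.

```python
from typing import Dict, List, Tuple

Scene = List[Tuple[str, str]]  # each object is (shape, half), e.g. ("square", "top")


def attribute_classifiers(scene: Scene) -> Dict[str, bool]:
    """Idealized versions of the four invariant attribute outputs."""
    return {
        "square": any(shape == "square" for shape, _ in scene),
        "triangle": any(shape == "triangle" for shape, _ in scene),
        "top": any(half == "top" for _, half in scene),
        "bottom": any(half == "bottom" for _, half in scene),
    }


def behavior(scene: Scene) -> bool:
    """Output layer: fires when both the "square" and "top" attributes are active."""
    attrs = attribute_classifiers(scene)
    return attrs["square"] and attrs["top"]


def intended(scene: Scene) -> bool:
    """The desired rule: a square that is itself in the top half."""
    return ("square", "top") in scene


for scene in ([("square", "top")], [("square", "bottom")],
              [("triangle", "top")], [("triangle", "top"), ("square", "bottom")]):
    print(scene, behavior(scene), intended(scene))
```

The three single-object scenes agree with the intended rule. The two-object scene produces `True` for `behavior` and `False` for `intended`, which is the superposition catastrophe.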
Rosenblatt's example illustrates only one half of the current problem, the “superposition catastrophe.” A related problem is “component discrimination.” Component discrimination involves responding to the attributes that 1) were originally recognized in different maps or map locations, and 2) gave rise to a higher-level classification that predicates the response. Both the “superposition catastrophe” and “component discrimination” problems can be illustrated with the following example.
Consider a system that takes as its input an electronic representation of an image of a natural scene that includes a tulip amid blades of grass. The tulip will be of fixed size and orientation, but can exist anywhere in the image. The task of the system is to change the color of the image pixels representing the contours of the flower. This task is to be performed by a fixed network of classifiers, called the decision subsystem, that classifies the various attributes and patterns in the visual scene and controls the pixel coloring action.
The decision subsystem has the following properties. It includes “edge” classifiers that respond to edge information at each pixel location. It possesses “contour” classifiers that respond to contours in the image derived from topographically-registered, co-oriented, and co-linear edge classifier responses. It has a “tulip” classifier that takes the outputs of contour classifiers as inputs and generates a positive response when a tulip is present anywhere in the image. The tulip classifier may exist within a complex, multi-layer, fixed sub-network of classifiers. The decision subsystem must then direct the coloring means to the precise locations of contour information in the image that contributed to tulip recognition.
This example illustrates both the superposition catastrophe and component discrimination problems. It illustrates the component discrimination problem because it meets each of the three requirements: 1) the system must respond to the locations of edge classifiers, 2) these edge classifiers are recognized in different locations of the edge classifier map, and 3) the edge classifiers contribute to contour classifiers. It also illustrates the superposition catastrophe problem because the decision subsystem must respond when an edge is recognized at a pixel location, that edge contributed to a tulip classification, and other edges corresponding to grass blades are also present.
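A deliberately naive sketch shows why the tulip classifier's output alone cannot guide the coloring. It assumes active edge pixels are given as a set of coordinates and collapses the multi-layer edge, contour, and tulip sub-network into a single hypothetical template match that works at any location. The template, coordinates, and coloring rule are all hypothetical. The classifier reports only whether a tulip is present, so a coloring rule driven by that output colors every active edge, grass included.

```python
from typing import FrozenSet, Set, Tuple

Pixel = Tuple[int, int]

# Hypothetical tulip contour, as offsets from an anchor edge pixel.
TULIP_TEMPLATE: FrozenSet[Pixel] = frozenset({(0, 0), (0, 1), (1, 2), (2, 1), (2, 0)})


def tulip_classifier(active: Set[Pixel]) -> bool:
    """Single-bit output: True if the template appears anywhere among active edges."""
    return any(
        all((ax + dx, ay + dy) in active for dx, dy in TULIP_TEMPLATE)
        for ax, ay in active
    )


def naive_coloring(active: Set[Pixel]) -> Set[Pixel]:
    """Colors every active edge once a tulip is recognized (no component discrimination)."""
    return set(active) if tulip_classifier(active) else set()


tulip = {(10 + dx, 4 + dy) for dx, dy in TULIP_TEMPLATE}
grass = {(1, 1), (1, 2), (1, 3)}
colored = naive_coloring(tulip | grass)
print(colored == tulip)         # False
print(sorted(colored - tulip))  # [(1, 1), (1, 2), (1, 3)]
```

To color only the tulip contours, the decision subsystem would need to know which edge responses contributed to the positive tulip classification. The single output bit does not carry that information.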
This tulip contour coloring problem statement precludes common techniques that alter the relationships among image pixels and processing elements during tulip recognition and the coloring response. Since there is no means to interact with the natural scene, it is not possible to pan a camera across the scene to translate the pixel information for location-constrained classification. Furthermore, since the network is fixed, it is not possible to use indirect addressing in a computer with random access memory to perform operations such as convolutions. Creating networks that could respond to every combination of attributes would be intractable.
There is one traditional approach that could, in theory, solve the superposition catastrophe and component discrimination problems, as well as the combined example, the tulip coloring problem. Specifically, networks of spiking neuron models can exploit spike timing differences to modulate the effects of lower-level classifiers on higher-level classifiers. Spiking neuron models connected in a network could create synchronized firing patterns that would both perform classification and prevent higher-level neurons from responding to lower-level neurons that are sufficiently unsynchronized. This would, for example, enable tulip contour coloring classifiers to ignore contours arising from grass blades. Modeling classifier networks at this level of timing resolution, using the temporal dynamics of neurons, requires replacing each single classifier, which communicates and uses timing information at a higher level of abstraction, with an ensemble of classifiers whose spiking outputs are aggregated. Modeling networks using spiking responses and the temporal dynamics of neurons would require many orders of magnitude more computation and communication than models at higher levels of abstraction. This approach would be computationally tractable only for very small problems.
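The gating idea can be caricatured as a coincidence detector. The sketch is a heavy simplification: it assumes each lower-level input contributes one spike at a given time, skips membrane dynamics entirely, and counts only inputs that fall inside a single time window. The window length, threshold, weights, spike times, and names are hypothetical. It shows the principle that synchrony can select which inputs a higher-level unit responds to. It does not model the costly simulation the paragraph describes.

```python
from typing import Dict, Set


def coincidence_detector(spike_times: Dict[str, float], weights: Dict[str, float],
                         window_ms: float, threshold: float) -> Set[str]:
    """Return the inputs in the most strongly weighted synchronous group, or an
    empty set if no group of near-simultaneous spikes reaches the threshold."""
    best: Set[str] = set()
    best_sum = 0.0
    for start in spike_times.values():
        # Inputs whose spikes fall in [start, start + window_ms) count together.
        group = {n for n, t in spike_times.items() if start <= t < start + window_ms}
        total = sum(weights[n] for n in group)
        if total > best_sum:
            best, best_sum = group, total
    return best if best_sum >= threshold else set()


spikes = {"tulip_a": 10.0, "tulip_b": 10.2, "tulip_c": 10.4,
          "grass_a": 13.1, "grass_b": 17.5}
weights = {name: 1.0 for name in spikes}
print(sorted(coincidence_detector(spikes, weights, window_ms=1.0, threshold=2.5)))
# ['tulip_a', 'tulip_b', 'tulip_c']
```

The synchronized contour inputs drive the unit, and the unsynchronized grass inputs are excluded. A realistic version would have to simulate spike generation at fine time resolution for every classifier, which is the cost the paragraph points to.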