1. Field of the Invention
The present invention relates generally to object recognition and identification in a computerized processing system, and more particularly in one exemplary aspect to a computer vision apparatus and methods of temporally proximate object recognition.
2. Description of Related Art
Object recognition in the context of computer vision relates to finding a given object in an image or a sequence of frames in a video segment. Typically, temporally proximate features that have high temporal correlations are identified within the sequence of frames, with each successive frame containing a temporally proximate representation of an object. Object representations, also referred to as the “view”, may change from frame to frame due to a variety of object transformations, such as rotation, movement/translation, change in lighting, background, noise, appearance of other objects, partial blocking/unblocking of the object, etc. Temporally proximate object representations occur when the frame rate of object capture is commensurate with the timescales of these transformations, so that at least a subset of a particular object representation appears in several consecutive frames. Temporal proximity of object representations allows a computer vision system to recognize and associate different views with the same object (for example, different phases of a rotating triangle are recognized and associated with the same triangle). Such temporal processing (also referred to as learning) enables object detection and tracking based on an invariant system response with respect to commonly appearing transformations (e.g., rotation, scaling, and translation).
Although temporal correlations between successive frames are reduced by discontinuities, sudden object movements, and noise, they are typically useful for tracking objects that evolve continuously and slowly, e.g., on time scales comparable to the frame interval, such as tracking human movements in a typical video stream of about 24 frames per second (fps).
Most existing approaches to binding (associating) temporally proximate object features from different frames rely on rate-based neural models (see, e.g., Földiák, P. Learning invariance from transformation sequences. Neural Computation, 1991, 3(2), 194-200) with a modified Hebbian learning rule, also referred to as the “trace rule”. Hebbian models postulate that memory is stored in the synaptic weights, and that learning is the process that changes those weights. The trace rule has been found to produce invariant representations of simple objects (Wallis, G.; Rolls, E. T. A model of invariant object recognition in the visual system. Progress in Neurobiology, 1997, 51, 167-194). Similar concepts have been used in the Slow Feature Analysis approach (Wiskott, L.; Sejnowski, T. J. Slow feature analysis: Unsupervised learning of invariances. Neural Computation, 2002, 14(4), 715-770) and by Janowitz and Van Rossum (Janowitz, M. K.; Van Rossum, M. C. W. Excitability changes that complement Hebbian learning. Network: Computation in Neural Systems, 2006, 17(1), 31-41), who showed that excitability changes in a processing unit can complement Hebbian learning to bind associations between successive image frames.
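By way of illustration only, the trace rule referenced above may be sketched as follows. The sketch assumes the commonly cited form of the rule, in which a running average (trace) of a unit's activity, rather than its instantaneous activity, gates a Hebbian-style weight change. The function name `trace_rule_update`, the single-unit setup, and the parameters `alpha` (learning rate) and `delta` (trace update fraction) are hypothetical and are not taken from the cited references.

```python
def trace_rule_update(weights, frames, responses, alpha=0.1, delta=0.2):
    """Apply a trace-rule style update for one unit over a frame sequence.

    weights   : initial weights, one per input (hypothetical single unit)
    frames    : input vectors x(t), one per frame
    responses : unit activities y(t), one per frame
    alpha     : learning rate (assumed value)
    delta     : fraction of the trace replaced by the current activity (assumed value)
    """
    w = list(weights)
    trace = 0.0
    for x, y in zip(frames, responses):
        # Low-pass filter the activity so that it persists across frames.
        trace = (1.0 - delta) * trace + delta * y
        # Hebbian-style step gated by the trace instead of the instantaneous y.
        w = [wi + alpha * trace * (xi - wi) for wi, xi in zip(w, x)]
    return w, trace


# Two consecutive views of the same object; the unit responds to the first only.
views = [[1.0, 0.0], [0.0, 1.0]]
activity = [1.0, 0.0]
w, trace = trace_rule_update([0.0, 0.0], views, activity, alpha=0.5, delta=0.5)
print(w, trace)
```

In this sketch the weight on the second input grows even though the unit is silent during the second frame, because the trace still carries activity from the first frame. This is the sense in which such a rule binds temporally proximate views to the same unit.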
However, most existing “brain-inspired” computer vision models either model computational blocks that do not correspond to neurons (but rather to, e.g., whole functional circuits) or, when they do model individual neurons, typically use so-called rate-based models, wherein information about objects is encoded into a dimensionless firing rate, characterized by neuron spike count or by a mean neuron firing rate. An object (and/or object feature) is detected by matching an observed rate to a predetermined value associated with the object representation. As a result, in order to encode and recognize different representations of the same object (e.g., a bar of different lengths), the existing methods require different detector nodes that each specialize in a single object representation. Invariably, such systems scale poorly with an increase in the number of objects, their variety, and complexity. Additionally, the use of specialized detectors without detector reuse requires a detection apparatus with an increased number of detectors in order to detect more complex objects. Furthermore, such rate-based approaches merely encode data frames into dimensionless activity of detector nodes, while completely neglecting the short-term temporal interactions between nodes.
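The rate-based detection scheme described above may be illustrated by the following minimal sketch, which is offered under stated assumptions and does not reproduce any particular prior-art implementation. The helper names `mean_rate` and `detect`, the tolerance `tol`, and the template rates for the two bar lengths are all hypothetical.

```python
def mean_rate(spike_times, window):
    """Mean firing rate: spike count divided by the observation window (s)."""
    return len(spike_times) / window


def detect(rate, templates, tol=1.0):
    """Return labels whose predetermined rate matches the observed rate."""
    return [label for label, target in templates.items()
            if abs(rate - target) <= tol]


# One predetermined rate per representation: each bar length needs its own entry.
templates = {"short bar": 10.0, "long bar": 20.0}

# Two spike trains with the same count but very different timing.
regular = [0.1 * k for k in range(10)]     # evenly spaced over 1 s
bursty = [0.001 * k for k in range(10)]    # packed into the first 10 ms

for train in (regular, bursty):
    rate = mean_rate(train, window=1.0)
    print(rate, detect(rate, templates))
```

Both spike trains yield the same mean rate and therefore the same detection result, although their timing differs markedly. The sketch thus shows both limitations noted above: a separate stored value is needed for every object representation, and any information carried by spike timing is discarded.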
Accordingly, there is a salient need for a more efficient and scalable computerized object recognition solution that utilizes component reuse, lowers cost and reduces complexity, yet which is capable of dealing with many objects and their transformations.