1. Field of the Invention
The present invention relates generally to object recognition and identification in a computerized processing system, and more particularly in one exemplary aspect to apparatus and methods of adaptive encoding of sensory input.
2. Description of Related Art
Object recognition in computer vision is the task of finding a given object in an image or video sequence. It is often desired to recognize objects (or object features, such as edges, intersections of edges, etc.) invariantly with respect to object parameters, such as position, size, or orientation. Typically, the object detection task is decomposed into several hierarchical steps, beginning with pre-processing of some basic features of objects in the image, followed by processing of increasingly complex features in each successive layer of the hierarchy. Some of the existing approaches to object recognition attempt to mimic the operation of biological neurons by implementing learning and adaptation of synaptic weights in networks of neurons. In other models, neurons are configured to receive visual input and have local mutual inhibition, meaning that the firing of one neuron decreases the probability of firing of neighboring neurons. Other approaches utilize “time-to-first-spike” coding, where information is transmitted in the latency of the first spike of a neuron or a set of neurons (Thorpe S., Ultra-Rapid Scene Categorization with a Wave of Spikes. In H. H. Bulthoff et al. (eds.), Biologically Motivated Computer Vision, Lecture Notes in Computer Science, 2002, 2525, pp. 1-15, Springer-Verlag, Berlin). It has been shown that a network with visual input, mutual inhibition, and time-to-first-spike coding can develop orientation-selective receptive fields (Heinke D., Mavritsaki E. Computational Modeling in Behavioural Neuroscience, Psychology Press, 2009).
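By way of illustration only, the time-to-first-spike coding and mutual inhibition described above may be sketched as follows. The linear intensity-to-latency mapping, the function names, and the parameter t_max are assumptions made for this sketch and are not taken from the cited references; the winner-take-all selection is the limiting case of very strong mutual inhibition, not the probabilistic inhibition of any particular model.

```python
def first_spike_latencies(intensities, t_max=100.0):
    """Map each input intensity in [0, 1] to a first-spike latency (ms).

    Assumed linear code: stronger inputs spike earlier.
    An input of zero (or below) never spikes and is reported as None.
    """
    latencies = []
    for x in intensities:
        if x <= 0.0:
            latencies.append(None)
        else:
            latencies.append(t_max * (1.0 - min(x, 1.0)))
    return latencies


def earliest_spiking_neuron(latencies):
    """Return the index of the neuron that spikes first, or None.

    Under very strong mutual inhibition (an assumed limiting case),
    this neuron would suppress the spikes of all its neighbors.
    """
    spiking = [(t, i) for i, t in enumerate(latencies) if t is not None]
    if not spiking:
        return None
    return min(spiking)[1]


if __name__ == "__main__":
    lat = first_spike_latencies([0.2, 0.9, 0.5])
    print(lat)                           # latency per neuron, in ms
    print(earliest_spiking_neuron(lat))  # index of the strongest input
```

In this sketch the information about input strength is carried entirely by spike timing: the ordering of the latencies reproduces the ordering of the intensities without any spike counts or firing rates.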
Some of the existing visual recognition systems utilize spikes, saccadic movements, a temporal derivative silicon retina, and a second layer of pattern recognition based on multiple neurons and time-to-first-spike coding (Oster M., Lichtsteiner P., Delbruck T., Liu S. A Spike-Based Saccadic Recognition System, ISCAS 2007. IEEE International Symposium on Circuits and Systems, 2007, pp. 3083-3086).
However, most existing approaches fail to unify saccadic movements and temporal filtering with learning, specifically spike-timing-dependent plasticity. Because the existing systems (such as Oster et al., referred to supra) do not utilize the statistics of the inputs to evolve the system, the filters for spatial features do not develop appropriate spatial and temporal characteristics for extracting information from further inputs when the input statistics change over time.
Some object recognition techniques rely on a matched filter approach. Under this approach, objects or features of objects are detected by banks of filters, with each filter tuned to a particular object or feature type, size, and/or location. Each filter produces an output signal when it detects a ‘match’ in an incoming signal. Matched filter techniques disadvantageously require a multitude of filters, as filters must be placed at every location where the object can be present, and each location requires filters tuned to different sizes or rotations of the feature or object, thereby further adding to complexity. The matched filter technique further relies on filters that are predetermined and fixed. Because the filters are predetermined and fixed, they must be over-represented, at a cost in resources, in order to address the unknown or estimated statistics of the inputs, especially if those statistics change over time.
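By way of illustration only, the growth in filter count described above may be sketched with a one-dimensional matched filter bank. The bar-shaped template, the function names, and the example signal are hypothetical and are chosen solely for this sketch; a practical system would additionally require filters for each orientation and for two-dimensional locations.

```python
def make_bar_template(width):
    """Zero-mean template: a bright bar of `width` samples with dark flanks."""
    raw = [0.0] + [1.0] * width + [0.0]
    mean = sum(raw) / len(raw)
    return [v - mean for v in raw]


def matched_filter_bank(signal, widths):
    """Apply one fixed template per width at every valid location.

    Returns (best_width, best_position, best_score, n_evaluations),
    where n_evaluations counts the (size, location) pairs examined.
    """
    best_width, best_pos, best_score = None, None, float("-inf")
    n_evaluations = 0
    for width in widths:
        template = make_bar_template(width)
        # Normalize so templates of different sizes are comparable.
        norm = sum(v * v for v in template) ** 0.5
        for pos in range(len(signal) - len(template) + 1):
            window = signal[pos:pos + len(template)]
            score = sum(a * b for a, b in zip(window, template)) / norm
            n_evaluations += 1
            if score > best_score:
                best_width, best_pos, best_score = width, pos, score
    return best_width, best_pos, best_score, n_evaluations


if __name__ == "__main__":
    # A bar of width 3 starting at sample 3 (template offset 2).
    signal = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
    print(matched_filter_bank(signal, widths=[1, 3, 5]))
```

Even in this small example, three filter sizes applied to a ten-sample signal require a separate evaluation for every size and location pair, and none of the templates changes in response to the inputs actually encountered.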
Accordingly, there is a salient need for an improved computerized sensory input encoding and recognition solution. Ideally, such an improved solution would adapt to the statistics of its inputs, including automatically tuning, by a learning process, to the feature characteristics that are important for further stages of the object recognition hierarchy.