1. Field of the Disclosure
The present innovation relates generally to artificial neuron networks, and more particularly in one exemplary aspect to computerized apparatus and methods for encoding sensory input using spiking neuron networks.
2. Description of Related Art
Object recognition in the context of computer vision relates to finding a given object in an image or a sequence of frames in a video segment. Typically, temporally proximate features that have high temporal correlations are identified within the sequence of frames, with each successive frame containing a temporally proximate representation of an object. Object representations, also referred to as the “view”, may change from frame to frame due to a variety of object transformations, such as rotation, movement/translation, change in lighting, background, noise, appearance of other objects, partial blocking/unblocking of the object, etc. Temporally proximate object representations occur when the frame rate of object capture is commensurate with the timescales of these transformations, so that at least a subset of a particular object representation appears in several consecutive frames. Temporal proximity of object representations allows a computer vision system to recognize and associate different views with the same object (for example, different phases of a rotating triangle are recognized and associated with the same triangle). Such temporal processing (also referred to as learning), enables object detection and tracking based on an invariant system response with respect to commonly appearing transformations (e.g., rotation, scaling, and translation).
Although temporal correlation between successive frames are reduced by discontinuities, sudden object movements, and noise, temporal correlations are typically useful for tracking objects evolving continuously and slowly, e.g., on time scales that are comparable to the frame interval, such as tracking human movements in a typical video stream of about 24 frames per second (fps).
Some existing approaches to binding (associating) temporarily proximate object features from different frames may rely on the rate based neural models. Rate-based models encode information about objects into a dimensionless firing rate, characterized by neuron spike count or by a mean neuron firing rate. An object (and/or object feature) is detected based on matching of an observed rate to a predetermined value associated with the object representation. As a result, in order to encode and recognize different representation of the same object (i.e., a bar of different lengths), the existing methods require different detector nodes that each specialize in a single object representation. Invariably, such systems scale poorly with an increase in the number of objects, their variety and complexity. Additionally, the use of specialized detectors without detector reuse requires detection apparatus with an increased numbers of detectors in order to perform detection of more complex objects. Furthermore, such rate-based approaches merely encode data frames into dimensionless activity of detector nodes, while completely neglecting the short-term temporal interactions between nodes.