1. Field of the Invention
The present invention relates to an apparatus and method for recognizing objects in at least one image frame of at least one image stream using an artificial neural network. The invention further comprises means for simultaneously generating multiple subwindows of an image frame from a single image stream; means for providing pixel data and interlayer neuron data to at least one subwindow processor; means for multiplying pixel data or interlayer neuron data by synapse weights and accumulating the products; and means for applying an activation function to an accumulated neuron value. The invention also encompasses the processes of using the above means to perform object recognition.
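The multiplying, accumulating, and activation means recited above correspond to the standard artificial neuron computation y = f(Σ wᵢxᵢ). The following Python sketch is offered only as an illustration of that computation, assuming a sigmoid activation and floating-point arithmetic; the function name `neuron_output` and the choice of sigmoid are hypothetical and do not describe the invention's hardware.

```python
import math
from typing import Sequence


def neuron_output(inputs: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical software model of one emulated neuron.

    inputs: pixel data (first layer) or interlayer neuron data (later layers)
    weights: synapse weights, one per input
    """
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one synapse weight")
    acc = 0.0
    for x, w in zip(inputs, weights):
        acc += x * w  # multiply-accumulate step
    # Activation function applied to the accumulated neuron value (sigmoid assumed).
    return 1.0 / (1.0 + math.exp(-acc))


if __name__ == "__main__":
    pixels = [0.2, 0.5, 0.9]
    weights = [0.4, -0.6, 0.3]
    print(round(neuron_output(pixels, weights), 4))
```

In hardware the multiply-accumulate loop would map onto multiplier and accumulator units and the sigmoid onto an activation unit; the sketch only fixes the arithmetic, not the datapath.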
2. Background of the Invention
Several types of digital artificial neural network (ANN) architectures have been disclosed that allow programmability and reconfigurability of synapse connections, synapse weights, and neuron operation. Because many of these architectures are not specific to object recognition, they do not consider the intricacies of processing an image at subwindow granularity. A “subwindow” is an N×M matrix of pixels covering a rectangular area of an image frame; it encompasses the input data that can be provided to an ANN.
T. Theocharides, G. Link, N. Vijaykrishnan, M. J. Irwin, V. Srikantam, “A Generic Reconfigurable Neural Network Architecture Implemented as a Network on Chip” (hereinafter Theocharides) proposes a Network-on-Chip architecture that links a cluster of four neuron processing elements and an activation unit to other such clusters. The disadvantage of this architecture, however, is that virtualization is considered only when mapping the neurons of a particular ANN onto a smaller number of processing elements; the virtualization that can be achieved at the subwindow abstraction level is not considered. For example, in a raster scan image pixel stream in which subwindows do not overlap, the sequential arrival of pixels guarantees that each pixel belongs to only a single subwindow region. The single subwindow to which the current pixel in the pixel stream belongs is termed the “active subwindow”, while all other subwindows are termed “inactive subwindows”. The present invention comprises a method for reusing the hardware resources belonging to inactive subwindows, thereby reducing overall hardware requirements.
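The active-subwindow relationship described above can be illustrated by mapping each pixel's raster-order index to the one non-overlapping subwindow that contains it. This is a minimal sketch, assuming subwindows that tile the frame exactly and treating N as the subwindow height in rows and M as its width in columns; the function names `active_subwindow` and `active_sequence` are hypothetical.

```python
from typing import Iterator, Tuple


def active_subwindow(pixel_index: int, frame_width: int,
                     n_rows: int, m_cols: int) -> Tuple[int, int]:
    """Return (subwindow_row, subwindow_col) of the subwindow containing a pixel.

    pixel_index is the pixel's position in raster order (left to right,
    top to bottom). Subwindows are assumed to be non-overlapping and to
    tile the frame, so exactly one subwindow contains each pixel.
    """
    y, x = divmod(pixel_index, frame_width)
    return y // n_rows, x // m_cols


def active_sequence(frame_width: int, frame_height: int,
                    n_rows: int, m_cols: int) -> Iterator[Tuple[int, int]]:
    """Yield the active subwindow for every pixel of one raster-scanned frame."""
    for i in range(frame_width * frame_height):
        yield active_subwindow(i, frame_width, n_rows, m_cols)


if __name__ == "__main__":
    # 8x4 frame tiled by 2-row x 4-column subwindows (2 bands of 2 subwindows).
    seq = list(active_sequence(8, 4, 2, 4))
    print(seq[:8])    # first raster line: subwindow (0, 0), then (0, 1)
    print(seq[8:16])  # second raster line revisits the same two subwindows
```

Within a raster line the active subwindow changes every M pixels, and the following line revisits the same band of subwindows. Only one subwindow is therefore active at any instant, although the partial results of the other subwindows in the band must be retained until they become active again.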
Most of the prior art discloses ANN architectures that generically perform the neuron computation for a plurality of emulated neurons without considering the input access patterns specific to streaming image sources. For example, U.S. Pat. No. 5,253,330, U.S. Pat. No. 5,799,134, and U.S. Pat. No. 6,836,767 disclose reconfigurable artificial neural network emulators for general-purpose classification and detection tasks. Because these architectures do not assume an input arrival pattern specific to raster image sources and do not constrain processing to subwindows, they are not efficient for processing raster scan image streams. In contrast, the present invention includes an image pyramid generation process that extracts the set of image subwindows from a raster pixel stream.
U.S. Pat. No. 5,956,703 discloses an IC that performs the ANN computation for feed-forward and recurrent networks. That invention, however, requires complex synchronization of the processing element outputs to ensure that the computation is performed in lockstep and is not otherwise corrupted. In contrast, the present invention uses a packet data format to transport data between layers of the ANN. Each packet comprises data and identifying information that allow the ANN computation to be performed asynchronously without risk of corrupting the computation.
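The packet-based transport described above can be sketched as follows, assuming that each packet carries a subwindow identifier, a layer number, a target neuron, a source index, and a value. These field names, the sigmoid activation, the fixed fan-in, and the assumption that packets are never duplicated or lost are hypothetical choices for illustration, not the invention's packet format. Because each packet identifies where its value belongs, the receiver can accumulate partial sums for several subwindows in any arrival order without lockstep synchronization.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class NeuronPacket:
    """Hypothetical packet: identifying fields plus one data value."""
    subwindow_id: int   # which subwindow the value belongs to
    layer: int          # ANN layer of the receiving neuron
    target_neuron: int  # receiving neuron within that layer
    source_index: int   # which input (pixel or previous-layer neuron) this is
    value: float        # pixel data or interlayer neuron data


class AsyncNeuronAccumulator:
    """Accumulates weighted inputs per (subwindow, layer, neuron) in any arrival order."""

    def __init__(self, weights: Dict[Tuple[int, int, int], float], fan_in: int):
        self.weights = weights  # keyed by (layer, target_neuron, source_index)
        self.fan_in = fan_in    # inputs expected per neuron (assumed fixed)
        self.partial: Dict[Tuple[int, int, int], Tuple[float, int]] = {}

    def receive(self, pkt: NeuronPacket) -> Optional[float]:
        """Add one packet's contribution; return the activation once all inputs arrived."""
        key = (pkt.subwindow_id, pkt.layer, pkt.target_neuron)
        acc, count = self.partial.get(key, (0.0, 0))
        acc += pkt.value * self.weights[(pkt.layer, pkt.target_neuron, pkt.source_index)]
        count += 1
        if count == self.fan_in:
            self.partial.pop(key, None)  # neuron complete; free its slot
            return 1.0 / (1.0 + math.exp(-acc))  # sigmoid assumed
        self.partial[key] = (acc, count)
        return None


if __name__ == "__main__":
    weights = {(1, 0, i): w for i, w in enumerate([0.5, -0.25, 1.0])}
    packets = [NeuronPacket(sw, 1, 0, i, v)
               for sw in (0, 1) for i, v in enumerate([1.0, 2.0, 0.5])]
    random.shuffle(packets)  # interleave the two subwindows arbitrarily
    acc = AsyncNeuronAccumulator(weights, fan_in=3)
    for p in packets:
        out = acc.receive(p)
        if out is not None:
            print(f"subwindow {p.subwindow_id}: {out:.4f}")
```

Both subwindows produce the same neuron output regardless of the shuffled order, because the identifying fields, not the arrival time, determine which partial sum each value joins.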
Several processing elements dedicated to performing the neuron computation for a large number of neurons using a small number of hardware resources have also been disclosed. For example, Theocharides discloses virtualization techniques that map many neurons onto a fixed number of hardware resources, including multipliers, adders, accumulators, and activation units. However, the number of neurons that can be emulated in these architectures is still limited by the amount of primary storage allotted to each processing element. The present invention comprises a virtualization mechanism that swaps subwindows between primary and secondary storage, providing nearly unlimited subwindow capacity for a given processing element.
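The swapping of subwindows between primary and secondary storage can be modeled in software as a bounded cache of per-subwindow neuron state. In this minimal sketch the class name `SubwindowStore`, the least-recently-used eviction policy, and the use of Python containers for both storage levels are assumptions; the source does not specify how subwindows are selected for swapping.

```python
from collections import OrderedDict
from typing import Dict, List


class SubwindowStore:
    """Hypothetical two-level store for per-subwindow neuron state.

    Primary storage holds at most `capacity` subwindows; any others are
    swapped out to secondary storage (a plain dict here) using an LRU
    eviction policy, which is an assumption of this sketch.
    """

    def __init__(self, capacity: int, neurons_per_subwindow: int):
        if capacity < 1:
            raise ValueError("primary storage must hold at least one subwindow")
        self.capacity = capacity
        self.size = neurons_per_subwindow
        self.primary: OrderedDict = OrderedDict()  # subwindow id -> state, LRU first
        self.secondary: Dict[int, List[float]] = {}
        self.swaps_in = 0
        self.swaps_out = 0

    def state(self, subwindow_id: int) -> List[float]:
        """Make a subwindow resident in primary storage and return its mutable state."""
        if subwindow_id in self.primary:
            self.primary.move_to_end(subwindow_id)  # mark as most recently used
            return self.primary[subwindow_id]
        if subwindow_id in self.secondary:
            data = self.secondary.pop(subwindow_id)  # swap in
            self.swaps_in += 1
        else:
            data = [0.0] * self.size  # first use: fresh accumulators
        if len(self.primary) >= self.capacity:
            victim, victim_data = self.primary.popitem(last=False)  # evict LRU
            self.secondary[victim] = victim_data  # swap out
            self.swaps_out += 1
        self.primary[subwindow_id] = data
        return data

    def peek(self, subwindow_id: int) -> List[float]:
        """Read state from whichever level holds it, without changing residency."""
        if subwindow_id in self.primary:
            return self.primary[subwindow_id]
        return self.secondary[subwindow_id]


if __name__ == "__main__":
    store = SubwindowStore(capacity=2, neurons_per_subwindow=3)
    for sw in [0, 1, 2, 0, 3, 1]:
        store.state(sw)[0] += 1.0  # accumulate into the active subwindow
    print([store.peek(i)[0] for i in range(4)], store.swaps_in, store.swaps_out)
```

With room for only two subwindows in primary storage, accumulation into four subwindows still preserves every partial result; the price of the larger capacity is the swap traffic counted by `swaps_in` and `swaps_out`.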