This section introduces aspects that may help facilitate a better understanding of the disclosure. Accordingly, these statements are to be read in this light and are not to be understood as admissions about what is or is not prior art.
Artificial vision systems find applications in autonomous robots, security systems, micro-unmanned aerial vehicles (UAVs) and, more recently, mobile phones and automobiles. These applications require algorithms that can recognize objects with a high degree of accuracy while executing in real time. Many recent algorithms that are promising for visual understanding are computationally very expensive and cannot run in real time without custom hardware or power-hungry graphics processing units (GPUs), which are not suitable for mobile, low-power platforms. Scale-invariant feature transform (SIFT) and Speeded Up Robust Features (SURF) feature extractors, hierarchical models of the visual cortex (HMAX), and deep convolutional neural networks (DCNNs) are among the approaches that work best for scene understanding. Recent work on DCNNs, which consist of hundreds of convolution maps across multiple layers, has shown that these models achieve higher accuracy than the others for high-level feature extraction.
DCNNs are a class of models that form a powerful tool for solving visual classification problems. DCNNs consist of multiple layers of convolutions, each comprising between tens and hundreds of filters. Each convolution layer is followed by a sub-sampling operation and a non-linearity operator. Referring to FIG. 1, a schematic representation of the complexity, and thereby the computational cost, of convolutions is depicted. As seen in FIG. 1, convolutions are computationally costly, with frames spending over ninety percent of their processing time in the filter kernels. For example, the method discussed in one related work uses five convolution layers, each with roughly three hundred filters. In the first layer, convolutions extract low-level features, such as edges and textures. Deeper layers combine the features extracted by the previous layers to achieve a higher level of abstraction and detect more complex features. After each convolution layer, DCNNs use a spatial pooling layer to provide the network with scale invariance. Spatial pooling also subsamples the image, which reduces the number of computations required in later layers. Max-pooling is a type of spatial pooling that has recently become popular. Finally, a sigmoid or other non-linearity operation serves as an activation function.
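The layer structure described above (a bank of convolution filters, followed by max-pooling for sub-sampling, followed by a non-linearity) can be sketched in software as follows. This is an illustrative model only; the function names, the use of a tanh non-linearity, and the 2x2 pooling window are assumptions for the sketch, not details of the disclosure or of any hardware implementation.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D filtering of a single-channel image with one kernel.

    This loop form mirrors the per-pixel multiply-accumulate work that
    dominates the computational cost noted above.
    """
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max-pooling: subsamples the feature map by `size`."""
    h = (fmap.shape[0] // size) * size
    w = (fmap.shape[1] // size) * size
    f = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return f.max(axis=(1, 3))

def dcnn_layer(image, kernels, pool=2):
    """One DCNN stage: convolution bank -> max-pooling -> non-linearity."""
    return [np.tanh(max_pool(conv2d(image, k), pool)) for k in kernels]

# Hypothetical usage: an 8x8 image and two 3x3 filters yield two 3x3
# feature maps (8x8 -> 6x6 after convolution -> 3x3 after 2x2 pooling).
image = np.arange(64, dtype=float).reshape(8, 8)
kernels = [np.ones((3, 3)) / 9.0, np.eye(3)]
feature_maps = dcnn_layer(image, kernels)
```

The pooling step illustrates why later layers are cheaper: each stage quarters the number of pixels the next convolution bank must process.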
Artificial vision systems represent only one set of applications that can benefit from a flexible, programmable computing architecture that improves efficiency while also reducing cost, especially in applications where computing resources are scarce. A typical platform for developing such architectures is a field-programmable gate array (FPGA) platform.
Hardware-accelerated vision systems implemented on FPGAs are known. Known methods require a host computer with a peripheral component interconnect (PCI) connection in order to function, and therefore cannot be embedded inside small and lightweight micro-UAVs or mobile robots. Furthermore, because the FPGA requires a host computer in order to function, the actual power consumed by the entire system is the combined consumption of the FPGA and the host, which can be significantly higher than the consumption of the FPGA alone. Finally, these designs require large off-chip memories of their own, as the communication bottleneck over PCI when using the host's memory would decrease performance.
Similar streaming architectures have been designed to meet the computational demands of DCNNs; for example, the method according to one related art demonstrates the application of one such system. Its design on fully programmable logic benefits from flexibility and parallelism and also does away with the memory bottleneck by using the on-board double data rate (DDR3) memory, but it suffers from slow host-coprocessor data transfer.
Despite the recent improvements in computing architectures, there remains an unmet need for a flexible, efficient, shared-memory architecture in which the same memory can be accessed by both software and the FPGA, and which requires no other system in order to function while maintaining a low level of power consumption.