Signal processing and image processing algorithms often treat data as a two-dimensional array of numbers. In image processing, the image itself is a two-dimensional array of values called pixels. In acoustic signal processing, a two-dimensional array of spectral coefficients distributed in time and frequency is often used. General-purpose computers are usually too slow for real-time processing of this type of data: the data rates are high, the processing required by many algorithms is extensive, and the required throughput often demands a massively parallel approach to processing the data. Systolic arrays overcome these obstacles in many cases.
A systolic array is a network of processors that rhythmically compute and pass data through the network. Such an array can match the data flow through the device to the algorithms used in image and signal processing. In neural network image processing, for example, neural networks of different sizes and connectivity are often applied to an image while transforming raw pixel data into feature vectors that serve to recognize and classify objects within the image.
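The rhythmic compute-and-pass behavior of a systolic array can be illustrated with a small software simulation. The following is a minimal sketch, not a description of any particular device: it assumes a one-dimensional array in which each cell holds one fixed weight, input samples advance one cell every two clock ticks, and partial sums advance one cell every tick. The function name `systolic_fir` and the register layout are illustrative assumptions.

```python
def systolic_fir(weights, samples):
    """Simulate a 1-D systolic array computing y[n] = sum_k w[k] * x[n - k].

    Assumed layout (illustrative only): cell k holds weights[k]; inputs pass
    through two registers per cell, partial sums through one register per cell.
    """
    k_cells = len(weights)
    # x_pipe[j] holds the sample that entered the array j ticks ago.
    x_pipe = [0] * (2 * k_cells - 1)
    # y_pipe[k] holds the partial sum that cell k produced on the last tick.
    y_pipe = [0] * k_cells
    outputs = []

    total_ticks = len(samples) + k_cells - 1
    for tick in range(total_ticks):
        # Feed the next sample (or zero once the input is exhausted).
        x_in = samples[tick] if tick < len(samples) else 0
        x_pipe = [x_in] + x_pipe[:-1]

        # All cells fire at once: each adds its product to the partial sum
        # received from its left neighbor on the previous tick.
        y_pipe = [
            (y_pipe[k - 1] if k > 0 else 0) + weights[k] * x_pipe[2 * k]
            for k in range(k_cells)
        ]

        # The first complete result leaves the last cell after k_cells - 1 ticks.
        if tick >= k_cells - 1:
            outputs.append(y_pipe[-1])
    return outputs
```

In this sketch every cell performs one multiply-accumulate per tick, so the array delivers one finished output per tick once the pipeline is full, regardless of the number of weights.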
Most systolic arrays are designed and built as a hardware implementation of a particular algorithm. In such cases the array can execute the algorithm only for a problem of a specific size (or a limited number of sizes). For example, one commercially available convolution chip can be configured to perform convolutions of any kernel size N×M, as long as N and M are both 8 or less. Another commercially available chip can perform either a 3×7 convolution or a 1×21 convolution. If the same algorithm must be executed for a larger problem, then either a larger systolic array must be built or the convolution must be implemented in software. A software implementation is generally cumbersome and time-consuming, and overall performance declines drastically when the convolution problem exceeds the physical size of the systolic array. A typical systolic array chip with a 3×3 array of processing elements can perform a 3×3 convolution on a 512×512 image in 6 to 7 milliseconds. However, a 4×4 or 5×5 convolution cannot be done in this hardware, and the host processor will require several seconds or more to compute such convolutions.
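To make the cost of the software fallback concrete, the following sketch shows a direct two-dimensional convolution of the kind a host processor would have to run, together with a count of the multiply-accumulate operations involved. It is an illustration under stated assumptions, not the method of any particular chip or host: the names `convolve2d_valid` and `mac_count` are hypothetical, the convolution is computed only where the kernel fits entirely inside the image, and the operation count assumes one output per image pixel.

```python
def convolve2d_valid(image, kernel):
    """Direct 2-D convolution over the region where the kernel fits entirely.

    `image` and `kernel` are lists of equal-length rows of numbers.
    """
    rows, cols = len(image), len(image[0])
    n, m = len(kernel), len(kernel[0])
    # True convolution flips the kernel in both dimensions.
    flipped = [row[::-1] for row in kernel[::-1]]

    out = []
    for r in range(rows - n + 1):
        out_row = []
        for c in range(cols - m + 1):
            acc = 0
            # N*M multiply-accumulates for every output pixel.
            for i in range(n):
                for j in range(m):
                    acc += flipped[i][j] * image[r + i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out


def mac_count(rows, cols, n, m):
    """Multiply-accumulates for an N x M kernel, assuming one output per pixel."""
    return rows * cols * n * m
```

Under these assumptions, a 3×3 kernel on a 512×512 image takes `mac_count(512, 512, 3, 3)` = 2,359,296 multiply-accumulates, and a 5×5 kernel takes 6,553,600. The arithmetic grows by less than a factor of three, so the jump from milliseconds to seconds comes mainly from losing the parallel hardware rather than from the extra work itself.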
The large increase in processing time usually encountered once the kernel grows beyond a certain size has led researchers and users to develop their algorithms and applications around small kernels. This has been true even though larger-kernel algorithms often perform better. For example, the edge enhancement algorithm using the Laplacian of a Gaussian kernel becomes less sensitive to noise when the kernel size is 7×7 or larger. In general, edge enhancement algorithms that use a large kernel yield less noisy results than those that use a small kernel.
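As an illustration of the kind of large kernel discussed above, the following sketch builds a sampled Laplacian of a Gaussian kernel from the standard continuous formula LoG(x, y) = -(1/(pi*sigma^4)) * (1 - (x^2 + y^2)/(2*sigma^2)) * exp(-(x^2 + y^2)/(2*sigma^2)). The function name `log_kernel`, the default `sigma` of 1.0, and the final zero-mean adjustment are assumptions made for this example and are not taken from the text.

```python
import math


def log_kernel(size=7, sigma=1.0):
    """Sample a Laplacian of a Gaussian on a size x size grid (size must be odd)."""
    half = size // 2
    two_sigma_sq = 2.0 * sigma * sigma
    scale = -1.0 / (math.pi * sigma ** 4)

    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            r2 = x * x + y * y
            row.append(scale * (1.0 - r2 / two_sigma_sq) * math.exp(-r2 / two_sigma_sq))
        kernel.append(row)

    # Assumed adjustment: subtract the mean so the truncated kernel sums to
    # zero and gives no response on regions of constant brightness.
    mean = sum(map(sum, kernel)) / (size * size)
    return [[value - mean for value in row] for row in kernel]
```

A 7×7 kernel built this way has 49 coefficients, compared with 9 for a 3×3 kernel, which is why it does not fit on a 3×3 array of processing elements.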