Conventional pipelined image processing systems connect discrete image storage components to discrete pipelined processing components that perform the image operations. These operations include, for example, image convolution, image warping, nonlinear image processing operations, and other specialized processes such as connected component analysis of binary images. Such systems do not integrate the processing and storage circuitry into single devices. Instead, the components are typically connected together with external crosspoint switches or dedicated data flow routing. The use of discrete image storage and processing components results in large circuits that are difficult to fit on small printed circuit boards. In addition, the discrete components and the longer circuit paths between them reduce execution speed.
Even discrete processing components, such as pipelined image convolution units, require additional external circuitry when implemented. For example, an image convolver chip typically needs external routing to and from the chip, as well as external image line delay elements and pixel delay elements. The line and pixel delay elements delay the incoming image data so that operations can be performed simultaneously on all of the pixels in a neighborhood. The routing and delay circuitry increases the required circuit board area and reduces execution speed.
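To illustrate how line and pixel delays expose a whole neighborhood from a single raster-scan stream, the following is a minimal software sketch, not a model of any particular convolver chip. The function name stream_convolve_3x3, the 3x3 kernel size, the zero-initialized delays, and the choice to emit only "valid" (unpadded) outputs are all illustrative assumptions. Two line delays, each one image line long, supply the pixels one and two lines above the current pixel, and three 3-pixel shift registers act as the pixel delays, so a full 3x3 neighborhood is available at every step without storing the whole image.

```python
from collections import deque


def stream_convolve_3x3(pixels, width, kernel):
    """Convolve a raster-scan pixel stream with a 3x3 kernel, one pixel per step.

    Hypothetical sketch: two line delays plus three 3-tap pixel delays present
    a complete 3x3 neighborhood at each step once enough lines have arrived.
    Only 'valid' outputs (no border padding) are produced; each is yielded as
    ((row, col), value) for the neighborhood center.
    """
    # Line delays: each holds exactly one image line of pixels.
    line1 = deque([0] * width, maxlen=width)  # output = input delayed by 1 line
    line2 = deque([0] * width, maxlen=width)  # output = input delayed by 2 lines
    # Pixel delays: one 3-pixel shift register per neighborhood row.
    taps = [deque([0] * 3, maxlen=3) for _ in range(3)]

    for n, p in enumerate(pixels):
        row, col = divmod(n, width)
        above1 = line1[0]      # pixel one line above the current pixel
        above2 = line2[0]      # pixel two lines above the current pixel
        line1.append(p)        # shift the current pixel into line delay 1
        line2.append(above1)   # line delay 1 feeds line delay 2
        for tap, value in zip(taps, (above2, above1, p)):
            tap.append(value)  # shift each row's pixel delay by one pixel

        # The window is fully populated and does not wrap across a line edge.
        if row >= 2 and col >= 2:
            acc = 0
            for i in range(3):
                for j in range(3):
                    # Flip the kernel so this is convolution, not correlation.
                    acc += kernel[2 - i][2 - j] * taps[i][j]
            yield (row - 1, col - 1), acc


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    stream = [p for line in image for p in line]  # raster-scan order
    box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    print(list(stream_convolve_3x3(stream, 4, box)))
    # → [((1, 1), 54), ((1, 2), 63)]
```

In hardware the same structure would be clocked once per pixel, so each output requires only the two line buffers and nine registers rather than random access to a stored frame; the sketch trades that parallelism for a simple nested loop over the nine taps.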
Dedicated pipelined image processing components can provide very high performance for image processing operations at a lower cost-to-performance ratio than general-purpose processors, such as those based on von Neumann and Harvard architectures. General-purpose processors typically cannot perform fundamental image operations as quickly as pipelined architectures because they must carry out a read-process-store sequence for each pixel operation. Although the performance of general-purpose processors has improved with advances in caching and other memory management techniques, these processors remain poorly suited to image operations because of the large volume of data involved.
Massively parallel processors and computers can be much faster than general-purpose processors based on von Neumann architectures, and in some instances can match the processing speed of pipelined hardware. These devices, however, are typically very complex to program and expensive to implement. In addition, delivering image data to each of the processors before parallel execution begins is generally slow. A parallel processor first loads all of the image data into its processors, then executes the image processing operations, and finally reads the processed data out to the external devices that will operate on it further. Although parallel processors execute the processing step itself much faster than their von Neumann counterparts, their overall throughput is typically limited by this read-process-store cycle.
Further, certain image processing operations are difficult to implement with parallel architectures. While image convolution and other filtering operations that use relatively small pixel neighborhoods can be implemented efficiently in parallel systems, operations such as image warping remain inefficient.
Image processing applications are typically separated into two components: fundamental (front-end) image processing operations, such as filtering, feature extraction, image alignment, and arithmetic operations; and higher-level processes that operate on the output of the front-end processing to fulfill the requirements of a particular application. In efficient implementations, front-end processing is typically performed by dedicated hardware, which can provide very high performance at a very high performance-to-cost ratio, while higher-level processing is typically performed by general-purpose processors because of their flexibility. Front-end processes, however, typically combine many discrete steps, which makes their implementation in dedicated hardware slow and complicated.