Recent advances in image capturing techniques have brought about a need for more efficient methods for processing image data in both real-time and near real-time applications. Conventional computational systems are becoming increasingly less effective for large-scale real-time image processing. Conventional computational systems are also deficient when image processing is carried out for two or more spatial dimensions.
In conventional image processing systems, images are captured and gathered on a frame-by-frame basis. Thereafter, the pixels of the stored frames are sequentially accessed by one or more CPUs from one or more DRAM subsystems. This results in systems with a limited throughput. Such conventional image processing systems are often unable to provide the required processing speed if, for example, the frame rate is high, or the required resolution is high, or the total pixel count and the color depth per pixel are high, or the number of CPU instructions per pixel is high. This processing limitation is caused, in part, by the fact that the CPUs cannot access image data fast enough from the memory (i.e., DRAM) and/or fetch code from the DRAM subsystem. Multiprocessor systems have been developed and deployed, but because accesses to the DRAMs are limited due to contention and blocking, these systems are unable to process the frames at the required rates.
This problem is further compounded by the fact that the image processing requirement for a two-dimensional image with $N \times M$ pixels (typically $M = 3N/4$ or $M = 4N/5$) is a linear function of $N^2$. The number of instructions per frame is therefore either $3N^2/4$ or $4N^2/5$ multiplied by the number of instructions per pixel. Accordingly, as the linear dimension of an image increases from $N_1$ to $N_2$, the image processing time increases by a factor equal to $N_2^2/N_1^2$.
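The quadratic growth described above can be illustrated with a short numerical sketch. The function names, the sample widths of 640 and 1280 pixels, and the figure of 10 instructions per pixel are illustrative assumptions and do not come from the system described here.

```python
def instructions_per_frame(n, aspect_num, aspect_den, instr_per_pixel):
    """Instruction count for an N x M frame, where M = (aspect_num / aspect_den) * N.

    All parameter names are hypothetical; integer arithmetic keeps the result exact.
    """
    m = n * aspect_num // aspect_den  # frame height M derived from width N
    return n * m * instr_per_pixel


def scaling_factor(n1, n2):
    """Factor by which processing time grows when the width goes from n1 to n2."""
    return (n2 / n1) ** 2


# Assumed example: a 4:3 frame (M = 3N/4) at 10 instructions per pixel.
small = instructions_per_frame(640, 3, 4, 10)    # 640 x 480 frame
large = instructions_per_frame(1280, 3, 4, 10)   # 1280 x 960 frame

print(small, large, large / small, scaling_factor(640, 1280))
```

Under these assumptions, doubling the linear dimension quadruples the instruction count per frame, which is consistent with the factor $N_2^2/N_1^2$.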
FIG. 1 is a simplified block diagram of a conventional image processing system 10. Image processing system 10 is shown as including CCD 15, PCI card 20 and host mainboard 30. CCD 15 is adapted to capture the images that are digitized by Analog-to-Digital Converter (ADC) 22 of PCI card 20. The digitized image data of an entire frame is then stored in frame buffer 24, also referred to as transient image RAM 24. Thereafter, the stored frame is transferred through a bridge, such as PCI Bridge 26, into host DRAM 32 of host mainboard 30 via either the CPU of host mainboard 30, or a bridge DMA controller, such as the DMA controller (not shown) disposed in PCI NorthBridge 34. The data transferred and stored in the host DRAM 32 is subsequently accessed for further processing. The architecture of image processing system 10 is relatively inexpensive and can be built with standard parts and off-the-shelf processors; however, it has a limited throughput.
The increasing complexity of manufactured products and the requirement to consistently achieve high yields when manufacturing these complex products necessitate high quality control and product inspection standards, such as those defined by ISO 9000 standards. Such quality control systems are often required to test a product (also referred to as a unit) under static conditions as well as when the unit is in operation. This requires high frame processing rates in order to provide the capability to tag defective units so as to reduce additional testing and adjust manufacturing process parameters in real time in order to increase yield. Despite their ever-increasing CPU performance, conventional image processing systems are not fully effective at handling such high frame processing rates, partly due to the latency associated with the DRAMs.
Another technique that has been developed to reduce the cost of image processing is to integrate the image capture and gathering operations, often done via a charge coupled device (CCD), and some computational circuitry in one integrated circuit (IC). However, this technique decreases the image quality since a significant portion of the chip area is not light sensitive and cannot capture images. These ICs (also referred to as chips) typically include one processing engine per pixel that is physically formed on the chip adjacent to or in the vicinity of that pixel (i.e., co-located with the pixel). Also, because these CCD chips are formed on a DRAM-type process, the speed of the processing engine of such chips is often limited, particularly if the processing engines are configured to be software programmable. Moreover, the optical resolution of these chips deteriorates as the size of the co-located processing engines increases. Furthermore, these chips are only capable of capturing images in the visible light range, and thus fail to capture and process images generated by SONAR, RADAR, or light detection and ranging (LIDAR), or images in other frequency ranges of the electromagnetic spectrum.
FIG. 2 shows image processing layers 50 and 55, in accordance with conventional image processing techniques. As known to those skilled in the art, layer 50 is adapted to perform object recognition and association. Layer 55 is adapted to perform feature extraction. Because, as seen from FIG. 2, the granularity of the layers is limited, prior art image processing systems are unable to map specific image processing tasks to hardware dedicated to performing these tasks.