1. Field of the Invention
The present invention relates to a parallel-pipelined vision system for real-time video processing and a corresponding method for processing video signals, and more particularly, to such a system which has been designed to be modular and readily scalable for different applications.
2. Description of the Prior Art
Real-time video processing applications are continuing to grow as the capabilities of video processing systems increase. Typical examples of applications of video processing include autonomous vehicle navigation, vehicle anti-collision systems, video surveillance (both manned and unmanned), automatic target recognition, industrial inspection, and electronic video stabilization and mosaic construction. The common requirement for all of these applications is a need to be able to process the video imagery at rates suitable for processing live video. Vehicular obstacle avoidance, for example, requires very fast feedback of obstacle positions to the vehicle navigation system so that the vehicle can be guided around the impending obstacle. Other applications, such as video surveillance and electronic video stabilization, require output that is provided at full video rates for viewing by an operator.
The amount of processing required for real-time imaging can easily reach billions of operations per second. Standard video provides approximately 30 Mbytes/sec of video information that needs to be processed. Thus, even just 50 operations per pixel, which is a modest amount of processing on a per-pixel basis, results in roughly 1.5 billion operations per second to be performed to process a standard video stream. Standard personal computers and workstations are incapable of providing video processing at such real-time rates. The processing capabilities of these systems are typically an order of magnitude or more lower than required, and the internal bussing of the standard workstation is incapable of providing the data throughput required for both receiving a digital video signal and providing a full-resolution video display output. As a result, dedicated pipelined video processing systems have been designed by a number of different manufacturers to provide real-time performance.
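The arithmetic behind this estimate can be checked with a short sketch. It assumes one byte per pixel, which the text does not state explicitly, so that 30 Mbytes/sec corresponds to about 30 million pixels per second; the function name is illustrative only.

```python
# Back-of-the-envelope throughput estimate using the figures quoted above.
# Assumption (not stated in the text): one byte per pixel, so that
# 30 Mbytes/sec corresponds to about 30 million pixels per second.

def required_ops_per_second(bytes_per_second, ops_per_pixel, bytes_per_pixel=1):
    """Return the number of operations per second needed to keep up with the stream."""
    pixels_per_second = bytes_per_second / bytes_per_pixel
    return pixels_per_second * ops_per_pixel

video_rate = 30e6  # approximately 30 Mbytes/sec of standard video
ops = required_ops_per_second(video_rate, ops_per_pixel=50)
print(f"{ops / 1e9:.1f} billion operations per second")
```

Under these assumptions the result is 1.5 billion operations per second, which is consistent with the order-of-magnitude shortfall attributed to standard workstations.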
For example, the Max Video 250 system (MV250), developed and distributed by Datacube, is capable of processing multiple video channels, each of which can process pixels at 20 Mpixels/sec video rates. However, the MV250 contains only a single filtering element, while many are often required for real-time multi-resolution image processing. The MV250 also provides limited capability of operating on multi-resolution images, because it has only five timing channels, while there are 32 input and output paths to the video crossbar switch. In addition, a significant amount of control overhead is required to calculate the appropriate delays for each processing path. Moreover, there are only a limited number of video paths available between the video processing motherboards, and the five timing channels available on one board are shared among all processing boards. As a result, scaling up of the multi-resolution processing becomes very cumbersome, and use of the available resources is inefficient.
The MV250 also relies on using standard general purpose processor boards for system control and, therefore, does not provide a means to transfer processed video data efficiently to the general purpose processor for higher level video data processing. Datacube has provided specific solutions to add general purpose processing capability to the system with fast video data stream transfer by providing special purpose DSP boards with such a video interface (e.g., an i960-based processing board); however, in that case, the control of the video hardware and the high level processing of video data are decoupled, significantly reducing the efficiency of algorithm implementations and adding control software and control time to synchronize the tasks.
The Imaging Technologies 150/40 is another example of a parallel-pipelined system, which is capable of processing 40 Mpixels/sec, also through multiple channels simultaneously. This system has limitations similar to those of the MV250, except that the distribution of video data paths within the video processing boards and among the video processing boards is even more limited.
Effective real-time video processing by such dedicated pipelined video processing systems requires the efficient execution of a number of operations referred to as "front-end" operations. Real-time video processing relies heavily on these front-end operations so that higher-level operations can be performed at the desired rates. One of the principal front-end operations is the generation of multiresolution image representations. These representations, referred to commonly as image pyramids, involve decomposing an original image at its full resolution into successive representations at lower spatial resolutions. This is performed by iteratively filtering the image and subsampling the filtered results. The most common of the pyramids, called the Gaussian pyramid, involves successively low-pass filtering and decimating the original image, providing a sequence of smaller and smaller images that represent image features at lower and lower spatial resolutions. A pyramid processor integrated circuit which provides such pyramid filtering has been described, for example, by one of the present inventors in U.S. Pat. No. 5,359,674 and U.S. patent application Ser. No. 08/838,096.
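The filter-and-subsample iteration described above can be illustrated in software. The following is a minimal sketch only, not the pyramid processor circuit of the cited patent: the 5-tap binomial kernel, the clamped border handling, the factor-of-two decimation, and all function names are assumptions chosen for illustration.

```python
# Minimal software sketch of Gaussian pyramid generation.
# Assumptions (not taken from the text): a separable 5-tap binomial
# low-pass kernel, border pixels replicated at the edges, and
# decimation by a factor of two in each dimension.

KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)

def _filter_rows(image):
    """Convolve every row of `image` with KERNEL, clamping at the borders."""
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(KERNEL):
                xx = min(max(x + k - 2, 0), width - 1)
                acc += weight * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out

def _transpose(image):
    return [list(col) for col in zip(*image)]

def low_pass(image):
    """Separable low-pass filter: filter the rows, then the columns."""
    return _transpose(_filter_rows(_transpose(_filter_rows(image))))

def decimate(image):
    """Keep every second pixel in each dimension."""
    return [row[::2] for row in image[::2]]

def gaussian_pyramid(image, levels):
    """Return [level 0 (full resolution), level 1, ...] with `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(decimate(low_pass(pyramid[-1])))
    return pyramid

# Usage: a 3-level pyramid from an 8x8 ramp image.
image = [[float(x + y) for x in range(8)] for y in range(8)]
for level, img in enumerate(gaussian_pyramid(image, 3)):
    print(level, len(img), "x", len(img[0]))
```

Each level has half the width and height of the previous one, so the whole pyramid adds only about one third to the pixel count of the original image.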
An efficient real-time video processing system must be able to perform front-end operations at real-time video rates and to provide the results of the front-end processing to general-purpose processors, which analyze the results and make decisions based on the results. However, the higher-level operations subsequent to the front-end processes are typically very much application specific and significantly more complex. This makes the higher-level operations less suitable for optimization in hardware. The front-end processes, on the other hand, are ubiquitous and should be efficiently implementable in hardware.
Based on these considerations, the following list of features can be used to define an effective real-time video processing system:
Fast convolution and pyramid generation. This includes the generation of Gaussian and Laplacian pyramids, as well as gradient filters and other generally-applicable filtering operations.
Reconfigurable arithmetic logic units. Image pointwise addition, subtraction, multiplication, and other more arbitrary operations are very common with front-end processes.
Look-up table operations. These operations involve single-image transformations that are performed pointwise on the image. Adding gain to an image, inverting an image, scaling an image, thresholding an image, and other such functions are typical of look-up table transformations.
Efficient parallel architecture. This describes the ability to use multiple components in parallel in an efficient way. Processing resources are not useful if they cannot flexibly be used while other processing resources are busy.
Fast transferal of video data to general-purpose processors. When image data must be analyzed by a DSP or general-purpose microprocessor, the image data must be quickly accessible.
High-level hardware control. A reentrant, multitasking environment must be available for hardware control to achieve maximum efficiency and programmability.
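The look-up table operations listed above can be sketched in software as follows. This is an illustration only, assuming 8-bit pixels and a 256-entry table; the helper names and the particular gain and threshold values are hypothetical and do not describe any specific hardware.

```python
# Sketch of pointwise look-up table (LUT) transformations.
# Assumptions (not taken from the text): 8-bit pixels, a 256-entry table,
# and results clamped to the range 0..255.

def make_lut(func):
    """Precompute func for every possible 8-bit pixel value."""
    return [min(255, max(0, int(func(v)))) for v in range(256)]

def apply_lut(image, lut):
    """Transform an image pointwise: one table read per pixel."""
    return [[lut[p] for p in row] for row in image]

invert = make_lut(lambda v: 255 - v)                     # image inversion
threshold = make_lut(lambda v: 255 if v >= 128 else 0)   # thresholding
gain = make_lut(lambda v: v * 2)                         # gain of 2, clamped

image = [[0, 100, 200]]
print(apply_lut(image, invert))     # [[255, 155, 55]]
print(apply_lut(image, threshold))  # [[0, 0, 255]]
print(apply_lut(image, gain))       # [[0, 200, 255]]
```

Because the table is computed once, the per-pixel cost is a single memory read regardless of how complicated the transformation is, which is why such operations suit a pipelined implementation.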
A real-time video processing system known as the Sensar VFE-100 has been developed by the present assignee in view of these considerations. The VFE-100 provides real-time image stabilization, motion tracking, change detection, stereo vision, fast search for objects of interest in a scene, robotic guidance, and the like by focusing on the critical elements in each scene, using the pyramid filtering technique to perform initial processing at reduced resolution and sample density and then progressively refining the processing at higher resolutions as needed.
Video image processing by the VFE-100 occurs in three stages: signal transformation, signal selection, and attribute estimation. Two basic signal transformations are supported. First, an image warp brings pairs of images into a common coordinate system (motion, stereo or observed and reference images) so that subsequent processes can be uniform and local. Second, the pyramid transform decomposes the image signals into band-pass components in the spatial domain. Signal selection is performed in the pyramid transform domain. The location and resolution of data used in subsequent analysis are controlled by selecting those data from an appropriate window at an appropriate level of the pyramid. Selected data are then processed to obtain estimates of the attributes of interest. These include displacement vectors in the case of motion or stereo, or texture or feature energy for alerting and orienting. The transformed data are formatted to suit analysis through the application of compact filters which can describe measures such as local correlation, variance, texture energy, and local moments.
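The signal selection step, choosing a window at a given pyramid level, can be sketched as below. This is a hypothetical illustration rather than the VFE-100 implementation: it assumes each level halves the resolution of the previous one, that the window is specified in full-resolution coordinates, and the function name is invented for the example. The stand-in pyramid is built by plain subsampling without filtering purely to keep the sketch short.

```python
# Sketch of signal selection: cropping a window from one pyramid level.
# Assumptions (not taken from the text): level n has 1/2**n of the
# full resolution, and the window is given in full-resolution coordinates.

def select_window(pyramid, level, x0, y0, width, height):
    """Return the part of `pyramid[level]` covering the requested window."""
    scale = 2 ** level
    img = pyramid[level]
    lx0, ly0 = x0 // scale, y0 // scale
    lx1 = -(-(x0 + width) // scale)    # ceiling division
    ly1 = -(-(y0 + height) // scale)
    return [row[lx0:lx1] for row in img[ly0:ly1]]

# Stand-in pyramid: plain subsampling, no low-pass filtering (illustration only).
full = [[x + 10 * y for x in range(8)] for y in range(8)]
pyramid = [full]
for _ in range(2):
    pyramid.append([row[::2] for row in pyramid[-1][::2]])

# A 4x4 full-resolution window at (2, 2) maps to a 2x2 window at level 1.
window = select_window(pyramid, level=1, x0=2, y0=2, width=4, height=4)
print(window)  # [[22, 24], [42, 44]]
```

Selecting a coarse level with a wide window gives a cheap overview of the scene, while a fine level with a small window concentrates the same number of samples on a region of interest.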
The VFE-100 is designed as a general purpose computing engine for real-time vision applications. Pipeline processing is performed as image data flow through a sequence of elements, and data flow paths and processing elements can be reconfigured to serve a wide variety of tasks. Once configured, a sequence of steps can be performed for an entire image or a sequence of images without external control. It is now desired to improve upon the VFE-100 by designing a real-time video processing system that is modular and can be scaled smoothly from a relatively small system with modest amounts of hardware to a very large, very powerful system with significantly more hardware. In particular, it is desired to design a modular real-time video processing system which may be custom tailored to specific applications through the addition of new processing components and the reconfiguration of available devices. The present invention has been designed to meet these needs in the art.