Constraints that are encountered in Single Instruction-Multiple Datastream (SIMD) and Multiple Instruction-Multiple Datastream (MIMD) parallel processing systems include the following.
In SIMD systems, partitioning an input data segment makes it difficult to achieve proper processor load balancing during parallel execution of the resulting data segments. Also, the incremental data bandwidth required on the input data path becomes a design burden when it is desired to increase the number of parallel processors in the SIMD system.
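The two SIMD constraints can be illustrated with a minimal sketch. The cost model is an assumption made only for illustration: each block has a positive processing cost, blocks are dealt out in contiguous equal-count segments, and every processor waits for the slowest segment. The function names and the per-processor bandwidth parameter are hypothetical.

```python
def simd_utilization(block_costs, num_processors):
    """Fraction of processor time spent on useful work.

    Assumes positive block costs, contiguous equal-count segments, and
    lockstep completion (all processors wait for the slowest one).
    """
    seg_len = -(-len(block_costs) // num_processors)  # ceiling division
    segments = [block_costs[i:i + seg_len]
                for i in range(0, len(block_costs), seg_len)]
    loads = [sum(seg) for seg in segments]
    loads += [0] * (num_processors - len(loads))  # processors left without work
    return sum(loads) / (num_processors * max(loads))


def input_bandwidth(num_processors, bytes_per_processor_per_second):
    """Input-path bandwidth if every processor must be fed at the same rate."""
    return num_processors * bytes_per_processor_per_second


# Costly blocks clustered in one segment leave the other processor waiting.
clustered = simd_utilization([1, 1, 1, 1, 4, 4, 4, 4], 2)
# The same blocks interleaved balance the load.
interleaved = simd_utilization([1, 4, 1, 4, 1, 4, 1, 4], 2)
```

Under these assumptions, utilization depends on where the costly blocks fall within the partitioning, and the input-path bandwidth grows in proportion to the processor count.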
In MIMD systems, complexities typically exist in data-flow communication and synchronization amongst the plurality of processors.
These problems become especially apparent when blocks of image data are being processed, such as when image data is being compressed (encoded) or decompressed (decoded). Because a source of image data, such as a camera, or a sink for image data, such as a display monitor, operates in real time with a fixed image data transfer rate, the image data processing system should be capable of keeping pace with the required source or sink image data transfer rate. That is, the image data processing system must be able to process the image data at a rate that prevents a loss of input image data, and/or that prevents the generation of undesirable visual artifacts when the image data is displayed.
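The real-time requirement can be expressed as a time budget per block. The following sketch works through the arithmetic; the frame size, block size, and frame rate are hypothetical values chosen only for illustration.

```python
def block_time_budget_us(width, height, block_size, frames_per_second):
    """Average time available per block, in microseconds, if the system
    is to keep pace with a fixed-rate source or sink."""
    blocks_per_frame = (width // block_size) * (height // block_size)
    blocks_per_second = blocks_per_frame * frames_per_second
    return 1_000_000 / blocks_per_second


# Assumed example: 640 x 480 frames, 8 x 8 blocks, 30 frames per second.
# 80 * 60 = 4800 blocks per frame, or 144000 blocks per second.
budget = block_time_budget_us(640, 480, 8, 30)
```

For these assumed values the budget is just under 7 microseconds per block on average. A system that takes longer than its budget over a sustained period either loses input data or delivers frames late.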
To obtain an optimal processing performance from a process-pipeline architecture, the pipeline data-flow should be maintained as full as possible. However, this becomes difficult due to data-flow fluctuations that typically occur during the operation of such systems. Also, overall supervision of pipeline control can represent a costly overhead factor to a host system or a dedicated supervisory co-processor.
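The effect of data-flow fluctuations on pipeline fullness can be sketched with a small discrete-time model. The model is an assumption made for illustration: an upstream stage can complete a varying number of blocks per tick, a downstream stage consumes at most one block per tick, and upstream output that does not fit in the intermediate buffer is lost as a stall.

```python
def downstream_utilization(produced_per_tick, buffer_capacity):
    """Fraction of ticks in which the downstream stage has a block to process."""
    buffered = 0
    busy_ticks = 0
    for produced in produced_per_tick:
        # The upstream stage can only deposit what the buffer has room for;
        # any further output it could have produced this tick is stalled.
        buffered += min(produced, buffer_capacity - buffered)
        if buffered > 0:
            buffered -= 1  # the downstream stage consumes one block
            busy_ticks += 1
    return busy_ticks / len(produced_per_tick)


# Bursty upstream stage: two blocks, then none, averaging one block per tick.
bursty = [2, 0] * 50
shallow = downstream_utilization(bursty, buffer_capacity=1)
deeper = downstream_utilization(bursty, buffer_capacity=2)
```

In this model the average upstream rate matches the downstream rate in both cases, yet the downstream stage is idle half the time with a one-block buffer and is never idle with a two-block buffer.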
U.S. Pat. No. 5,046,080, issued Sep. 3, 1991, entitled "Video CODEC Including Pipelined Processing Elements" (J. S. Lee et al.), describes a videophone system that provides videophone service within a narrow-band digital network. A host CPU 20 is coupled through a bus 21 and a common memory 24 to an image bus 28. The CODEC 25 comprises a plurality of processing elements (PEs) connected as a pipeline. Each PE includes a digital signal processor (DSP) and performs a part of the image coding function. FIFO memory modules 64 comprise a flag circuit and FIFO memory. The flag circuit controls data flow between processing elements. The common memory is employed to store messages and to perform synchronization between processing elements. However, no methodology is described for optimizing a pipeline architecture so as to operate efficiently with blocks of data.
It is thus an object of this invention to provide a process-pipeline architecture wherein data buffering and local control functions are inserted within the process-pipeline at the boundaries of discrete process-pipeline functions.
It is a further object of this invention to provide a process-pipeline architecture wherein discrete stages of the pipeline include local data and possibly address buffers, and a local controller to supervise local synchronization and timing.
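The arrangement named in the two preceding objects, buffering and local control placed at the boundaries between pipeline functions, can be modeled in software as a rough sketch. This is not the disclosed circuitry: the class `StageBoundary`, the function `run_pipeline`, the buffer capacity, and the cycle-by-cycle scheduling are all hypothetical and serve only to show each function deciding whether to proceed from local signals alone.

```python
from collections import deque


class StageBoundary:
    """Hypothetical buffer with local control between two pipeline functions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = deque()

    def can_accept(self):
        """Local signal to the upstream function: there is room for a block."""
        return len(self.blocks) < self.capacity

    def has_block(self):
        """Local signal to the downstream function: a block is waiting."""
        return bool(self.blocks)

    def put(self, block):
        if not self.can_accept():
            raise RuntimeError("upstream wrote to a full buffer")
        self.blocks.append(block)

    def get(self):
        if not self.has_block():
            raise RuntimeError("downstream read from an empty buffer")
        return self.blocks.popleft()


def run_pipeline(blocks, functions, capacity=2):
    """Pass blocks through functions separated by StageBoundary buffers."""
    if not functions:
        return list(blocks)
    pending = deque(blocks)
    total = len(pending)
    results = []
    boundaries = [StageBoundary(capacity) for _ in functions[1:]]
    last = len(functions) - 1
    while len(results) < total:
        # Visit stages last to first so a block advances one stage per cycle.
        for i in range(last, -1, -1):
            has_input = bool(pending) if i == 0 else boundaries[i - 1].has_block()
            has_room = True if i == last else boundaries[i].can_accept()
            if has_input and has_room:
                block = pending.popleft() if i == 0 else boundaries[i - 1].get()
                out = functions[i](block)
                if i == last:
                    results.append(out)
                else:
                    boundaries[i].put(out)
    return results


# Two placeholder functions standing in for discrete pipeline functions.
output = run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 2])
```

In this sketch a function proceeds only when its input boundary reports a waiting block and its output boundary reports room, so no function needs knowledge of the state of the rest of the pipeline.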
It is one further object of this invention to provide a system for encoding and decoding blocks of video/image data, and to provide a methodology to optimize a pipeline architecture.