In the past, parallel computer systems have been used extensively to solve complex computational problems in less time. In parallel computer systems, a complex problem is partitioned into multiple smaller parts that can be attacked simultaneously. For example, a loosely coupled network of readily available low-cost computers was recently able to factor a 167-digit number in a matter of days. This was a task that many experts in the past said might take years to solve using traditional systems and methods.
Currently, there are a number of different efforts in progress to apply parallel computing techniques to complex real-time applications such as speech processing, robotics, and computer vision. On the hardware side, a broad variety of parallel architectures have been explored. Representative commercial systems include SIMD machines such as CM-2 and MasPar and systolic/data-flow machines such as the iWarp system. Experimental parallel computers include pyramid architectures such as the IUA and reconfigurable machines such as PASM and Proteus.
Each of these parallel architectures represents a particular viewpoint on the diverse requirements of parallel computing in, for example, automated vision systems. SIMD and data-flow architectures typically target low-level automated vision tasks such as histogramming, image smoothing, and convolution. Pyramid machines implement a hierarchical decomposition of vision problems in hardware. Reconfigurable machines explore the dynamic configuration of processing resources between low-level and high-level vision tasks.
Today, commercial MIMD computers such as the Digital Equipment Corporation AlphaServer 4100, Silicon Graphics Origin 2000, and IBM SP-2 have become commonplace. These machines support task parallelism in which an application is divided into multiple interacting processes, or threads, which perform distinct tasks. Systems with four to eight processors are common and some can scale to hundreds of processors.
It is proposed that commercial MIMD offerings will continue to provide the most cost-effective path to increasing performance. Therefore, the question of how to best use these machines for complex computational tasks such as computer vision, which require synchronized processing of temporally ordered data, e.g., digitized frames of a video sequence, is addressed here.
Commercial MIMD computers promise cost-effective parallel processing for interactive vision applications, but programming MIMD computers is time-consuming and obtaining good performance is often difficult. Two major sources of difficulty are the synchronization and buffer management tasks required by the characteristic data flow in, for example, a vision application.
One prior art parallel technique, the Beehive system developed at the Georgia Institute of Technology, provides a software distributed shared memory system for transparent access to shared data in a cluster of Sun workstations. The application programming interface (API) of Beehive provides shared memory programming with synchronization primitives that have temporal correctness guarantees.
Beehive is particularly well suited for applications that tolerate a certain amount of staleness in the global state information. Beehive has been used for real-time computation of computer graphical simulations of animated figures. However, Beehive has limitations. It does not support variable access granularities for different data items manipulated by the application, nor does it provide a multi-dimensional addressing capability, for example, in space and time. Moreover, Beehive does not provide atomicity for reading and writing variable-sized data items.
The idea of a temporally ordered memory has also been used in optimistic distributed discrete-event simulation. In those systems, a space-time memory allows an application to "roll-back" to an earlier state when data items are received out of temporal order.
The processes used in complex real-time interactive applications, such as vision oriented user-interfaces, or robotics, typically follow a data flow model in which images acquired by digitizers go through several stages of processing, resulting in a control signal or some other output.
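The data flow model described above can be illustrated with a minimal sketch, assuming one thread per processing stage and bounded queues between stages. The names `stage`, `run_pipeline`, `SENTINEL`, and the `capacity` parameter are hypothetical and chosen only for illustration; the sketch is not the method of any system named in this section. Each item carries a timestamp so that temporal order stays visible at the output.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def stage(fn, inbox, outbox):
    """Apply fn to each timestamped item from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate end-of-stream downstream
            break
        timestamp, data = item
        outbox.put((timestamp, fn(data)))


def run_pipeline(frames, stage_fns, capacity=4):
    """Push frames through the stage functions, one thread per stage."""
    # Bounded queues: a fast producer blocks instead of growing memory.
    queues = [queue.Queue(maxsize=capacity) for _ in range(len(stage_fns) + 1)]
    workers = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stage_fns)
    ]
    for worker in workers:
        worker.start()

    def feed():
        # Stands in for a digitizer: tag each frame with its index in time.
        for timestamp, frame in enumerate(frames):
            queues[0].put((timestamp, frame))
        queues[0].put(SENTINEL)

    feeder = threading.Thread(target=feed)
    feeder.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)

    feeder.join()
    for worker in workers:
        worker.join()
    return results
```

Even in this small sketch, most of the code handles queue sizing, end-of-stream signalling, and thread joining rather than the per-stage computation, which mirrors the synchronization and buffer management burden noted earlier for MIMD programming.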
In a typical vision application, multiple moving objects are tracked in a scene. Each frame of the video is compared with the immediately preceding frame to determine the moving regions. Color histogram analysis of the moving regions yields possible target locations. The peak location in the histogram corresponds to the object. This location is used to control the gaze direction of a displayed synthetic graphical agent. For this type of application, the speed and latency of the vision component of the system have a direct impact on its overall effectiveness. Parallel computing is necessary to meet the demanding computational and bandwidth requirements of vision algorithms and achieve high performance.
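The tracking steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of any particular system: frames are nested lists of `(r, g, b)` tuples, a single target is present, the target's color model is a hypothetical dictionary mapping quantized color bins to weights, and the "peak" is taken from row and column histograms of color-weighted moving pixels. The function names and the threshold value are likewise assumptions.

```python
def motion_mask(prev, curr, threshold=30):
    """Mark pixels whose summed channel difference exceeds the threshold."""
    return [
        [sum(abs(a - b) for a, b in zip(p, c)) > threshold
         for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def color_bin(pixel, levels=4):
    """Quantize an (r, g, b) pixel into a coarse color bin."""
    step = 256 // levels
    return tuple(channel // step for channel in pixel)


def locate_target(prev, curr, model, threshold=30):
    """Return (row, col) of the histogram peak, or None if nothing matches.

    model is an assumed color model: {color_bin: weight}.
    """
    mask = motion_mask(prev, curr, threshold)
    rows = [0.0] * len(curr)      # histogram of weighted hits per row
    cols = [0.0] * len(curr[0])   # histogram of weighted hits per column
    for y, row in enumerate(curr):
        for x, pixel in enumerate(row):
            if mask[y][x]:
                weight = model.get(color_bin(pixel), 0.0)
                rows[y] += weight
                cols[x] += weight
    if max(rows) == 0.0:
        return None  # no moving pixel matched the color model
    return rows.index(max(rows)), cols.index(max(cols))
```

Every pixel is processed independently in the differencing and binning steps, which is why this kind of low-level work parallelizes well, while the final peak search needs the combined histograms.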
It is desired to provide parallel processing for a target architecture which includes a cluster of symmetric multi-processors (SMPs) connected together through a network. The parallel processing is to take place on data stored in the shared memories of the SMPs. It is also desired that the data can be addressed in multiple dimensions.