Analysis of dynamic scenes to obtain motion parameters, structures and identities of the objects present in the scene, and/or a description of the events taking place in the scene, is an important problem in machine vision. Dynamic scene analysis involves processing and analysis of a sequence of image frames (snapshots of a scene) taken at regularly spaced time intervals. Since a large volume of data generally must be processed, dynamic scene analysis techniques are, in general, computationally very intensive, which makes them unsuitable for applications that require real-time response.
One type of dynamic scene analysis using the difference picture-based technique has been proposed by Jain (See R. Jain, "Extraction of Motion Information from Peripheral Processes," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-3, No. 5, 1981, pp. 489-543; and R. Jain, "Dynamic Scene Analysis Using Pixel-Based Processes," Computer, August 1981, pp. 12-18). Jain's approach provides that for the two image frames, $f_1$ and $f_2$, the difference picture is a binary picture generated using the following definition:

$$D_{f_1 f_2}(i,j) = \begin{cases} 1 & \text{if } |f_1(i,j) - f_2(i,j)| \geq T \\ 0 & \text{otherwise} \end{cases}$$

where $T$ is a pre-defined threshold.
In other words, the difference picture $D_{f_1 f_2}$ is a binary comparison of two image frames, $f_1$ and $f_2$, generated by placing a 1 in each position where the corresponding pixels in $f_1$ and $f_2$ have appreciably different gray level characteristics. If the gray levels of the corresponding pixels in the frames under comparison differ by at least a predetermined threshold, then the pixels are considered to be different. The value of the threshold is scene dependent.
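The thresholded comparison described above can be sketched as follows. This is a minimal illustration, not Jain's implementation: it assumes each frame is a list of rows of integer gray levels, and the function name `difference_picture` and the sample values are hypothetical.

```python
def difference_picture(f1, f2, threshold):
    """Return a binary picture with 1 where |f1 - f2| >= threshold, else 0.

    f1 and f2 are assumed to be equally sized lists of rows of gray levels.
    """
    if len(f1) != len(f2) or any(len(a) != len(b) for a, b in zip(f1, f2)):
        raise ValueError("frames must have the same shape")
    return [
        [1 if abs(a - b) >= threshold else 0 for a, b in zip(row1, row2)]
        for row1, row2 in zip(f1, f2)
    ]


# Hypothetical 3x3 frames: a bright pixel moves one position to the right.
previous = [[10, 10, 10], [10, 50, 10], [10, 10, 10]]
current = [[10, 10, 10], [10, 10, 52], [10, 10, 10]]
diff = difference_picture(previous, current, threshold=20)
```

Both the vacated position and the newly occupied position are marked, which is why later steps must distinguish previous-frame from current-frame evidence.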
To analyze a dynamic scene, the difference picture is generated for two image frames of the scene taken at contiguous time instants. The first frame is called the "previous" frame while the second frame is called the "current" frame. The entries in the difference pictures are assumed to be caused by motion in the scene. To reduce the influence of noise, it is usually assumed that in difference pictures, connected regions of size less than some threshold are due to noise and are ignored by the use of a size filter. This may result in ignoring regions containing slow-moving and/or small objects, but it ensures that the remaining regions in the difference picture are the result of motion. According to Jain's approach, motion related information about dynamic objects in a scene can be obtained by analyzing the characteristics of the connected regions in the corresponding difference picture.
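The size filter described above can be sketched as a connected-region search that discards regions smaller than a chosen size. This is only an illustrative sketch under stated assumptions: the function name `size_filter`, the parameter `min_size`, and the choice of breadth-first search are not taken from Jain's papers, and the connectivity (4 or 8) is left as a parameter.

```python
from collections import deque


def size_filter(diff, min_size, connectivity=4):
    """Zero out connected regions of non-zero pixels smaller than min_size."""
    rows = len(diff)
    cols = len(diff[0]) if rows else 0
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")

    seen = [[False] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if diff[r][c] == 0 or seen[r][c]:
                continue
            # Collect one connected region with a breadth-first search.
            region = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and diff[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Regions below the size threshold are treated as noise.
            if len(region) >= min_size:
                for y, x in region:
                    out[y][x] = 1
    return out
```

As the text notes, a larger `min_size` suppresses more noise at the cost of discarding regions produced by small or slow-moving objects.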
According to Jain, a difference picture region is a set of 4-connected (or 8-connected) non-zero difference picture pixels. A non-zero pixel in the difference picture is considered an edge point if at least one of its neighbors is zero. A pixel in a previous or current frame is considered an edge point if the value of the Sobel operator at that point in the frame is above a threshold.
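The two edge-point definitions above can be sketched as follows. Several details are assumptions made for illustration and are not specified in the text: the 4-neighborhood is used for the difference picture test, positions outside the image are treated as zero, the Sobel response is taken as the gradient magnitude, and border pixels of a frame are never marked as edge points. The function names are hypothetical.

```python
import math

NEIGHBORS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def difference_edge_points(diff):
    """Mark non-zero difference pixels that have at least one zero neighbor."""
    rows, cols = len(diff), len(diff[0])

    def is_zero(r, c):
        # Assumption: positions outside the picture count as zero.
        return not (0 <= r < rows and 0 <= c < cols) or diff[r][c] == 0

    return [
        [1 if diff[r][c] and any(is_zero(r + dr, c + dc)
                                 for dr, dc in NEIGHBORS_4) else 0
         for c in range(cols)]
        for r in range(rows)
    ]


def sobel_edge_points(frame, threshold):
    """Mark pixels whose Sobel gradient magnitude exceeds the threshold."""
    rows, cols = len(frame), len(frame[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Horizontal gradient: right column minus left column.
            gx = ((frame[r - 1][c + 1] + 2 * frame[r][c + 1] + frame[r + 1][c + 1])
                  - (frame[r - 1][c - 1] + 2 * frame[r][c - 1] + frame[r + 1][c - 1]))
            # Vertical gradient: bottom row minus top row.
            gy = ((frame[r + 1][c - 1] + 2 * frame[r + 1][c] + frame[r + 1][c + 1])
                  - (frame[r - 1][c - 1] + 2 * frame[r - 1][c] + frame[r - 1][c + 1]))
            if math.hypot(gx, gy) > threshold:
                edges[r][c] = 1
    return edges
```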
A previous frame edge picture is a binary picture having a 1 entry in those positions which correspond to edge points in the difference picture and in the previous frame. Similarly, a current frame edge picture has 1 entries in those pixel positions which are edge points in the difference picture and the current frame.
A labeled difference picture is a picture in which a non-edge point of the difference picture is marked 1 and an edge point of the difference picture is marked 2, 3, 4, or 5 according to whether the corresponding point is an edge point in neither the previous nor the current frame (2), in the previous frame only (3), in the current frame only (4), or in both the previous and current frames (5). The edge points in the difference picture usually form a closed arc. This arc is called an edge segment in the difference picture. An edge segment can usually be partitioned into several arcs such that each arc comprises entirely previous frame or current frame edge points. Each such partition is called a fragment. The partition comprising previous frame edge points is called the previous frame fragment, while the partition comprising current frame edge points is called the current frame fragment. In a labeled difference picture, a previous frame edge fragment comprises points labeled 3 or 5, while the current frame edge fragment comprises points labeled 4 or 5.
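The labeling rule above can be sketched as a per-pixel lookup. The sketch assumes four equally sized binary pictures are already available (the difference picture, its edge points, and the edge points of the previous and current frames), and it assumes that zero pixels of the difference picture keep the label 0, which the text does not state. The function name is hypothetical.

```python
def label_difference_picture(diff, diff_edges, prev_edges, cur_edges):
    """Label each difference picture pixel.

    0: pixel is zero in the difference picture (assumed convention)
    1: non-edge point of the difference picture
    2: edge point, edge in neither the previous nor the current frame
    3: edge point, edge in the previous frame only
    4: edge point, edge in the current frame only
    5: edge point, edge in both frames
    """
    labeled = []
    for r, row in enumerate(diff):
        out_row = []
        for c, value in enumerate(row):
            if value == 0:
                out_row.append(0)
            elif not diff_edges[r][c]:
                out_row.append(1)
            else:
                in_prev = 1 if prev_edges[r][c] else 0
                in_cur = 2 if cur_edges[r][c] else 0
                out_row.append(2 + in_prev + in_cur)
        labeled.append(out_row)
    return labeled
```

With this encoding, previous-frame fragments are runs of labels 3 or 5 and current-frame fragments are runs of labels 4 or 5, matching the convention stated above.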
The overall analysis requires computation of the following features for each region of the difference picture:
(1) $N_c$, the number of fragments of difference picture edge points that are also edge points in the current frame. Such a fragment is called a current-frame edge fragment.
(2) $N_p$, the number of fragments of difference picture edge points that are also edge points in the previous frame. Such a fragment is called a previous-frame edge fragment.
(3) $C_c$, a Boolean value representing the closedness of the current-frame edge fragment. It is true when the current-frame edge fragment is closed and false otherwise.
(4) $C_p$, a Boolean value representing the closedness of the previous-frame edge fragment. It is true when the previous-frame edge fragment is closed and false otherwise.
(5) CURPRE, the ratio of current-frame edge points to previous-frame edge points of the region.
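As an illustration of feature (5), CURPRE can be computed from a labeled difference picture using the label convention given earlier (labels 4 or 5 for current-frame edge points, labels 3 or 5 for previous-frame edge points). The sketch below is hypothetical: the function name, the representation of a region as a list of (row, column) coordinates, and the choice to return `None` when the region has no previous-frame edge points are assumptions, since the text does not say how that case is handled.

```python
def curpre(labeled, region):
    """Ratio of current-frame to previous-frame edge points in a region.

    labeled: labeled difference picture (list of rows of labels 0-5).
    region:  iterable of (row, col) coordinates belonging to one region.
    Returns None when the region has no previous-frame edge points
    (an assumed convention; the ratio is undefined in that case).
    """
    current_count = 0
    previous_count = 0
    for r, c in region:
        label = labeled[r][c]
        if label in (4, 5):
            current_count += 1
        if label in (3, 5):
            previous_count += 1
    if previous_count == 0:
        return None
    return current_count / previous_count
```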
Based on these features, a region of the difference picture can be classified into one of the nine classes using a decision tree as shown in FIG. 1.
Where:
O = Covering of background by a moving object
B = Uncovering of background by a moving object
OC = Occlusion
TSO = Translation of one moving object
APP = Approaching object
REC = Receding object
PFP = Previous frame position of a totally displaced object
CFP = Current frame position of a totally displaced object
X = Uncertain
Eight of these classes correspond to different motion situations. It should be noted that this is a peripheral phase technique.
Two different implementation strategies are known which improve the efficiency of Jain's approach. Agrawal and Jain have proposed a pseudo-parallel system architecture to implement Jain's difference picture-based motion detection technique (See D. P. Agrawal and R. Jain, "A Pipelined Pseudo-parallel System Architecture for Real-time Dynamic Scene Analysis," IEEE Transactions on Computers, Vol. C-31, No. 10, October 1982, pp. 952-962). The Agrawal and Jain approach involves 41 processors (CPUs or microprocessors) for the peripheral phase, computing in Single Instruction Multiple Data (SIMD), Multiple Instruction Multiple Data (MIMD), and Single Instruction Single Data (SISD) modes. It requires a large number of memory modules (roughly 150), with the size of the modules varying from 1K to 14K words. Their system also requires a distributed operating system and involves tackling problems such as task partitioning, assignment of memory modules and processors to the partitioned tasks, and the use of complex multistage interconnection networks. On the whole, it calls for a very complex control system potentially requiring several hundred VLSI chips.
The Agrawal and Jain approach overcomes some of the shortcomings of complex vision tasks that are not inherently parallel in nature. However, their approach is not without drawbacks. For example, the approach is uneconomical, since the architecture requires an enormous amount of hardware as described above. Additionally, the proposed system can be inefficient in terms of speed due to the large number of memory accesses, the different levels of compilation, and the need for periodic intervention by the operating system.
Agrawal et al. have proposed the use of a multi-computer system in which multiple independent computers, loosely connected via a communication network, can be used for dynamic scene analysis (See D. P. Agrawal, V. K. Janakiram and G. C. Pathak, "Evaluating Performance of Multicomputer Configurations," Computer, Vol. 19, No. 5, May 1986, pp. 23-37). The Agrawal et al. approach considers various parallel architectures, such as Alpha network, mesh, hyper tree, fully connected network, cube and multi-tree structures, and provides a performance comparison to study the suitability of these networks for dynamic scene analysis. However, this approach suffers from some of the same drawbacks described previously. For example, the Agrawal et al. approach uses parallel medium-power processors arranged in close proximity which communicate via dedicated links or communication paths. These processors can be expensive and may not provide enough speedup in dynamic scene analysis computation to justify their use.