This invention relates to a method and apparatus for classifying an object in a video frame, the video frame comprising part of a video sequence. The invention also relates to a state-transitional object tracking scheme for determining the status of an object in a video frame.
Digital video processing is used in a wide range of applications. For example, modern video surveillance systems employ digital processing techniques to provide information concerning moving objects in the video. Such a system will typically comprise a video camera connected to a computer system via a direct or network link. The computer system runs software arranged to process and analyse video data supplied from the camera.
FIG. 1 is a block diagram showing the software-level stages of such a surveillance system. In the first stage 1, a background model is learned from an initial segment of video data. The background model typically comprises statistical information representing the relatively static background content. In this respect, it will be appreciated that a background scene will remain relatively stationary compared with objects in the foreground. In a second stage 3, foreground extraction and background adaptation is performed on each incoming video frame. The current frame is compared with the background model to estimate which pixels of the current frame represent foreground regions and which represent background. The background model is also updated to account for small changes in the scene. In a third stage 5, objects, represented by the foreground regions, are tracked from frame to frame by identifying a correspondence between objects in the current frame and those tracked in previous frames. Meanwhile a trajectory database is updated so that the tracking history of each object is available to higher-level applications 7 which may, for example, perform behavioural analysis on one or more of the tracked objects.
After processing each video frame, a validity check 9 is performed on the background model to determine whether it is still valid. Significant or sudden changes in the captured scene may require initialisation of a new background model by returning to the first stage 1.
A known intelligent video system is disclosed in US Patent Application Publication No. 2003/0053659 A1. A known foreground extraction and tracking method is disclosed by Stauffer and Grimson in “Learning Patterns of Activity using Real-Time Tracking”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 22, No. 8, August 2000.
Conventionally, the object tracking stage 5 operates on the assumption that the output from the foreground extraction stage 3 includes no noise or spurious artefacts, and that the motion of detected objects will be relatively simple. In practice, however, objects can appear and disappear at any place and at any time in a video sequence, especially if there is noise present in the system. Objects may disappear for a small number of frames and then reappear some time later. It is also possible for multiple objects to move across a scene of the video sequence, occlude one another, and then split apart. These situations can prevent the object tracking stage 5 from making a correspondence between foreground regions in the current frame and those already identified in previous frames. As a result, tracking is lost and the tracking history is not accurate.
According to a first aspect of the present invention, there is provided a method for tracking an object appearing in a video sequence comprising a plurality of frames, each frame comprising a plurality of pixels, the method comprising: (a) identifying a first object in a first frame and associating therewith a first status parameter indicative of a non-tracking condition; (b) identifying a candidate object in a subsequent frame and determining whether there is a correspondence between the candidate object and the first object; (c) in the event of correspondence, repeating steps (b) and (c) for further subsequent frames until a predetermined number of sequential correspondences are identified; (d) changing the first status parameter to a second status parameter when said sequential correspondences are identified; and (e) in response to the change from first to second status parameter, recording the intra-frame position of said object for subsequent frames.
Preferred features of the invention are defined in the dependent claims appended hereto.
According to a further aspect of the invention, there is provided a video processing system for tracking an object appearing in a video sequence comprising a plurality of frames, each frame comprising a plurality of pixels, the system being arranged, in use, to: (a) identify a first object in a first frame and associate therewith a first status parameter indicative of a non-tracking condition; (b) identify a candidate object in a subsequent frame and determine whether there is a correspondence between the candidate object and the first object; (c) in the event of correspondence, repeat steps (b) and (c) for further subsequent frames until a predetermined number of sequential correspondences are identified; (d) change the first status parameter to a second status parameter when said sequential correspondences are identified; and (e) in response to the change from first to second status parameter, record the intra-frame position of said object for subsequent frames.
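The method of steps (a) to (e) above can be sketched as follows. This is an illustrative sketch only: the `detect` and `corresponds` helpers and the value of `promote_after` (the predetermined number of sequential correspondences) are assumptions, not part of the disclosure.

```python
def track(frames, detect, corresponds, promote_after=3):
    """Promote an object from a non-tracking to a tracking condition once a
    predetermined number of sequential correspondences are identified, then
    record its intra-frame position (steps (a) to (e))."""
    first = detect(frames[0])              # (a) first object, non-tracking status
    status, run, trajectory = "non-tracking", 0, []
    for frame in frames[1:]:
        candidate = detect(frame)          # (b) candidate object in subsequent frame
        if corresponds(first, candidate):
            run += 1                       # (c) count sequential correspondences
        else:
            run = 0
        if status == "non-tracking" and run >= promote_after:
            status = "tracking"            # (d) change the status parameter
        if status == "tracking":
            trajectory.append(candidate.position)  # (e) record intra-frame position
    return status, trajectory
```

In use, `detect` would be supplied by the foreground extraction stage and `corresponds` by the appearance-matching test described below.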
According to a further aspect of the invention, there is provided a method of classifying an object in a video sequence comprising a plurality of frames, the method comprising: (a) identifying a first object in a first frame and associating therewith a status parameter having one of a plurality of predetermined states, each state having a different transition rule associated therewith; (b) identifying at least one candidate object in a subsequent frame; (c) comparing the or each candidate object with the first object to determine if there is a correspondence therebetween; and (d) updating the status parameter of the first object in accordance with its associated transition rule, said transition rule indicating which of the predetermined states the status parameter should be transited to dependent on whether a correspondence was identified in step (c).
According to a further aspect of the invention, there is provided a method of classifying an object in a video frame comprising part of a video sequence, the method comprising: (a) identifying a first object in a first frame and associating therewith a status parameter having one of a plurality of predetermined states, each state having a different transition rule associated therewith; (b) identifying at least one candidate object in a subsequent frame; (c) comparing the or each candidate object with the first object to determine if there is a correspondence therebetween; and (d) updating the status parameter of the first object in accordance with its associated transition rule, said transition rule indicating which of the predetermined states the status parameter should be transited to dependent on whether a correspondence was identified in step (c).
By classifying an object as being in a particular state, it is possible to decide whether or not that object should be tracked. A predefined rule associated with the object is applied to determine the object's updated state following comparison with a candidate object in a subsequent frame. The updated state may reflect, for example, that the object is new, real, occluded or has disappeared from the subsequent frame, so that an appropriate rule can be applied when the next frame is received.
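The state-transitional scheme can be pictured as a small state machine. The sketch below encodes the transitions that are stated in this description (new to real on correspondence, real to occluded on a missed correspondence with an overlapping region, states otherwise maintained); the handling of the disappeared state and the recovery from occlusion are assumptions added for completeness.

```python
from enum import Enum, auto

class ObjectState(Enum):
    NEW = auto()
    REAL = auto()
    OCCLUDED = auto()
    DISAPPEARED = auto()

def transition(state, matched, overlaps_other=False):
    """Apply the transition rule associated with the current state, given
    whether a correspondence was identified in the subsequent frame."""
    if state is ObjectState.NEW:
        # New objects become real on correspondence, otherwise remain new.
        return ObjectState.REAL if matched else ObjectState.NEW
    if state is ObjectState.REAL:
        if matched:
            return ObjectState.REAL
        # No correspondence: occluded if overlapping another region of
        # interest, otherwise treated as disappeared (an assumption).
        return ObjectState.OCCLUDED if overlaps_other else ObjectState.DISAPPEARED
    if state is ObjectState.OCCLUDED:
        # Recovery to real on a renewed correspondence is an assumption.
        return ObjectState.REAL if matched else ObjectState.OCCLUDED
    return ObjectState.DISAPPEARED
```

Because each state carries its own rule, the controller processing the next frame needs only the stored status parameter and the outcome of the correspondence test.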
The method may further comprise repeating steps (b) to (d) for a plurality of subsequent frames of the video sequence.
The transition rule associated with the state may cause the status parameter to maintain its current state if there is no correspondence identified in step (c). The status parameter may have a new state or a real state, the transition rule associated with the new state causing the status parameter to be changed to the real state in the event that a correspondence is identified in step (c). The method may further comprise recording the position change between the first object and the corresponding candidate object only when the status parameter is in the real state.
The status parameter can be changed to the real state only if a correspondence is identified in a plurality of sequential frames in step (c).
The status parameter may be changed to the real state only if (i) a correspondence is identified in step (c) continuously for a predetermined time period, and (ii) extracted position characteristics of the object meet a set of predefined criteria. Step (ii) can comprise assigning a motion factor ζm to the first region based on its position characteristics over a plurality of video frames, and classifying said first object as meeting the predefined criteria if the motion factor is above a predetermined threshold Tζ. The motion factor ζm may be given by:
ζm = ( σcx²/(σvx² + τ) + σcy²/(σvy² + τ) ) / 2

where σcx² and σcy² are the positional variances of the first object in the x and y directions, respectively, σvx² and σvy² are the velocity variances in the x and y directions, respectively, and τ is a predetermined constant.
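The motion factor can be computed directly from an object's recorded centroid positions over a plurality of frames. In the sketch below the velocity samples are estimated by frame-to-frame differencing, and the default value of τ is illustrative; both are assumptions.

```python
from statistics import pvariance

def motion_factor(xs, ys, tau=1.0):
    """zeta_m = (var(x)/(var(vx) + tau) + var(y)/(var(vy) + tau)) / 2.
    Ratio of positional variance to (damped) velocity variance in each
    direction: a smoothly moving object scores high, while a stationary
    or jittering one scores low."""
    vxs = [b - a for a, b in zip(xs, xs[1:])]   # per-frame velocity estimates
    vys = [b - a for a, b in zip(ys, ys[1:])]
    return (pvariance(xs) / (pvariance(vxs) + tau)
            + pvariance(ys) / (pvariance(vys) + tau)) / 2
```

An object would then meet the predefined criteria of step (ii) when `motion_factor(...)` exceeds the predetermined threshold Tζ.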
The method may further comprise displaying the corresponding candidate object in said subsequent frame together with an overlaid path line indicating the recorded position change between the first object and the corresponding candidate object. The status parameter may also have an occluded state, the transition rule associated with the real state causing the status parameter to be changed to the occluded state in the event that no correspondence is identified in step (c) and the first object overlaps a different region of interest appearing in the same frame.
The method may further comprise providing a first set of data representing appearance features fio of the first object, and extracting, for the or each candidate region in the subsequent frame, a second set of data representing appearance features fib of that respective object, step (c) comprising combining the first set of appearance data with the or each second set of appearance data in a cost function Cob thereby to generate a numerical parameter indicating the degree of correspondence between the first object and the or each new candidate region.
The cost function can be given by the following expression:
Cob = Σi=1..n (fio − fib)² / σi²

where fio represents an appearance feature of the first object, fib represents an appearance feature of the candidate region, σi² is the variance of fio over a predetermined number of frames, and n is the number of appearance features in the first and second data sets.
The appearance features of the first object and the candidate object may include features relating to the frame position of the object and candidate object, and features relating to the shape of the object and candidate object. In the event of a correspondence being identified in step (c), the appearance features fio of the first object may be updated using the appearance features fib of the candidate object to which the first object corresponds.
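The comparison of step (c) can then be sketched as evaluating the cost function for each candidate and accepting the lowest-cost candidate. The feature vectors here are plain lists, and the acceptance threshold is an assumption introduced for illustration.

```python
def match_cost(f_o, f_b, variances):
    """C_ob: sum over the n appearance features of the squared difference
    between object and candidate features, normalised by each feature's
    variance over recent frames."""
    return sum((o - b) ** 2 / v for o, b, v in zip(f_o, f_b, variances))

def best_match(f_o, variances, candidates, threshold=10.0):
    """Return (candidate, cost) for the lowest-cost candidate, or
    (None, None) if even the best cost exceeds the assumed threshold."""
    costs = [match_cost(f_o, f_b, variances) for f_b in candidates]
    i = min(range(len(costs)), key=costs.__getitem__)
    return (candidates[i], costs[i]) if costs[i] < threshold else (None, None)
```

On a successful match, the first object's stored features `f_o` would be updated from the matched candidate's features, as described above.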
According to a further aspect of the invention, there is provided a computer program stored on a computer-readable medium and comprising a set of instructions to cause a computer to perform the steps of (a) identifying a first object in a first frame and associating therewith a status parameter having one of a plurality of predetermined states, each state having a different transition rule associated therewith; (b) identifying at least one candidate object in a subsequent frame; (c) comparing the or each candidate object with the first object to determine if there is a correspondence therebetween; and (d) updating the status parameter of the first object in accordance with its associated transition rule, said transition rule indicating which of the predetermined states the status parameter should be transited to dependent on whether a correspondence was identified in step (c).
According to a further aspect of the invention, there is provided video processing apparatus comprising: an input for receiving frames of a video sequence; an object queue arranged to store data representing objects identified in one or more frames and, associated with each object, a status parameter representing one of a plurality of predetermined states; video analysis means arranged to receive a frame having one or more candidate objects therein, and to determine whether the or each candidate object matches an object in the object queue; and a state transition controller arranged to update the status parameter of each object in the object queue in accordance with a transition rule corresponding to the state of the object's status parameter, said transition rule indicating which of the predetermined states the status parameter should be transited to.
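The interaction between the object queue, the video analysis means, and the state transition controller amounts to a per-frame update loop, sketched below. `TrackedObject` and the `matches` and `transition_rule` callables are assumed helpers standing in for the analysis means and the controller's rules; they are not part of the disclosure.

```python
class TrackedObject:
    """Object queue entry: appearance data plus a status parameter."""
    def __init__(self, features, state="new"):
        self.features = features
        self.state = state

def process_frame(object_queue, candidates, matches, transition_rule):
    """For each object in the queue, ask the analysis stage whether any
    candidate in the incoming frame matches it, then have the state
    transition controller update its status parameter accordingly."""
    for obj in object_queue:
        matched = any(matches(obj.features, c) for c in candidates)
        obj.state = transition_rule(obj.state, matched)
```

The queue thus carries each object's classification forward from frame to frame, so that higher-level applications can query both position history and current state.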
According to a further aspect of the invention, there is provided a state transition database for use with a video processing system storing information representing one or more objects identified in a video frame, each object being classified to one of a plurality of predefined object states, the database defining a plurality of different transition rules for each respective object state, the transition rules defining two or more updated states to which the object is transited depending on whether the object is matched with an object appearing in a subsequent frame.