Field of the Invention
The present invention relates to image processing, in particular to object detection in video images, and more particularly to foreground/background separation.
Description of the Related Art
A video is a sequence of images. The images are also referred to as frames. The terms “frame” and “image” are used interchangeably throughout this specification to describe a single image in an image sequence.
Scene modelling, also known as background modelling, involves the modelling of the visual content of a scene, based on an image sequence depicting the scene. The content typically includes foreground content and background content, for which a distinction or separation of the two is often desired.
In the field of intelligent surveillance, foreground/background separation is commonly used to detect foreground objects in a scene. A scene is composed of several visual elements, and each visual element may have several possible appearances. Visual elements may be, for example, pixels or 8×8 DCT (Discrete Cosine Transform) blocks, as used in JPEG images.
In one foreground object detecting method of the prior art, the foreground object is separated from the background by analyzing the appearance age of the visual elements. If the appearance age of the visual element in one state is greater than a predefined threshold, this visual element will be recognized as the background.
However, the above method cannot accurately separate a moving foreground object from a stationary foreground object, for example when an object has been abandoned in the background as shown in FIGS. 1A-1C. In FIG. 1A, a lobby is monitored. In FIG. 1B, a bag is abandoned on the floor. In FIG. 1C, a person is passing by the bag. The desired result is that only the moving person is identified as the foreground object, and that the bag is separated from the person. With the above prior-art method, however, both the moving person and the bag are detected as foreground and are output as one object. Therefore, an approach is needed to distinguish the moving foreground object from the stationary foreground object (such as the abandoned object) and, furthermore, to find the border between them.
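The age-based rule and its limitation can be illustrated with a minimal sketch. The function name, the threshold value, and the example ages (counted in frames) are hypothetical and are chosen only for illustration; they are not taken from any cited method.

```python
def classify_by_age(appearance_ages, age_threshold):
    """Label a visual element as background when the age of its
    current appearance exceeds the threshold, otherwise foreground."""
    return [
        "background" if age > age_threshold else "foreground"
        for age in appearance_ages
    ]


# Hypothetical appearance ages in frames: floor, abandoned bag, passing person.
ages = [500, 12, 3]
labels = classify_by_age(ages, age_threshold=100)
print(labels)
```

Under these assumed values, the bag and the person both fall below the threshold, so both are labelled foreground and nothing in the rule separates one from the other.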
A method proposed in US2012/0163658 addresses the problem that moving-object detection cannot separate the moving foreground object from the stationary foreground object (abandoned object). That method separates the moving foreground object from the stationary foreground object in a short time and with less memory capacity. It does so by analyzing the co-occurrence rate between the appearances of the visual elements in each visual element pair.
FIGS. 2A-2C show the principle of the method of US2012/0163658. As shown in FIG. 2A, two adjoining visual elements A and B are selected, and the current appearances of the visual elements A and B are identified as planes 1 and 1′, respectively. Since the appearances of the visual elements A and B do not change in FIG. 2A, it is determined that these two visual elements have a high co-occurrence rate, and thus they are connective. In FIG. 2B, a bag is abandoned on the floor. The appearances of the visual elements A and B change from planes 1 and 1′ to planes 2 and 2′, respectively. Since the appearances of the visual elements A and B both change, it is determined that these two visual elements have a high co-occurrence rate, and thus they are connective. In FIG. 2C, a person is passing by the bag. The appearance of visual element A does not change and remains in plane 2. However, the appearance of visual element B changes from plane 2′ to plane 3′; that is, a new appearance of visual element B is identified. Accordingly, it is determined that these two visual elements have a low co-occurrence rate, and thus they are un-connective. The border between the moving object and the stationary object can thus be determined.
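The principle of FIGS. 2A-2C can be illustrated with a minimal sketch. The text above does not give a formula for the co-occurrence rate, so the definition used here is an assumption: the number of frames in which the current appearances of A and B were observed together, divided by the larger of the two individual observation counts. The function names, the threshold of 0.5, and the frame counts are likewise hypothetical.

```python
def co_occurrence_rate(history_a, history_b):
    """Assumed definition: how often the current appearance pair has been
    seen together, relative to the more frequently seen of the two."""
    current = (history_a[-1], history_b[-1])
    joint = sum(1 for pair in zip(history_a, history_b) if pair == current)
    count_a = history_a.count(current[0])
    count_b = history_b.count(current[1])
    return joint / max(count_a, count_b)


def is_connective(history_a, history_b, threshold=0.5):
    """A pair is treated as connective when its rate reaches the threshold."""
    return co_occurrence_rate(history_a, history_b) >= threshold


# Appearance labels per frame, following the plane names of FIGS. 2A-2C.
a_2b = ["1"] * 10 + ["2"] * 10      # A: empty floor, then bag
b_2b = ["1'"] * 10 + ["2'"] * 10    # B: empty floor, then bag
rate_2b = co_occurrence_rate(a_2b, b_2b)

a_2c = a_2b + ["2"]                 # A still shows the bag
b_2c = b_2b + ["3'"]                # B now shows the passing person
rate_2c = co_occurrence_rate(a_2c, b_2c)

print(rate_2b, is_connective(a_2b, b_2b))
print(rate_2c, is_connective(a_2c, b_2c))
```

With this assumed definition, the pair keeps a high rate while both elements change together (FIG. 2B), and the rate drops sharply once only B takes on a new appearance (FIG. 2C).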
FIG. 3 shows a flowchart of the prior-art method of US2012/0163658. As shown in FIG. 3, in step 110, the co-occurrence rate of the appearances of the visual elements in each visual element pair is calculated. In step 120, the connection relationship between the visual elements in each visual element pair is determined based on the co-occurrence rate calculated in step 110. In step 130, the scene model is updated according to the connection relationships among the visual elements.
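The three steps of FIG. 3 can be outlined in a short sketch. This is an illustration under stated assumptions, not the implementation of US2012/0163658: the rate formula in `step_110` is an assumed one, the threshold in `step_120` is arbitrary, and `step_130` merely groups connected elements with a union-find structure as a stand-in for the scene model update, whose details are not described here. All function and variable names are hypothetical.

```python
def step_110(histories, pairs):
    """Step 110: compute an (assumed) co-occurrence rate for each pair."""
    rates = {}
    for i, j in pairs:
        hist_i, hist_j = histories[i], histories[j]
        current = (hist_i[-1], hist_j[-1])
        joint = sum(1 for p in zip(hist_i, hist_j) if p == current)
        denom = max(hist_i.count(current[0]), hist_j.count(current[1]))
        rates[(i, j)] = joint / denom
    return rates


def step_120(rates, threshold=0.5):
    """Step 120: decide the connection relationship of each pair."""
    return {pair: rate >= threshold for pair, rate in rates.items()}


def step_130(num_elements, connected):
    """Stand-in for step 130: group connected elements (union-find)."""
    parent = list(range(num_elements))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), is_connected in connected.items():
        if is_connected:
            parent[find(i)] = find(j)
    return [find(i) for i in range(num_elements)]


# Three adjoining elements: 0 and 1 show the bag, 2 now shows the person.
histories = [
    ["floor"] * 10 + ["bag"] * 11,
    ["floor"] * 10 + ["bag"] * 11,
    ["floor"] * 10 + ["bag"] * 10 + ["person"],
]
pairs = [(0, 1), (1, 2)]
groups = step_130(len(histories), step_120(step_110(histories, pairs)))
print(groups)
```

In this example, elements 0 and 1 end up in the same group while element 2 is placed in a different one, so the border falls between elements 1 and 2.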
The method of US2012/0163658 can distinguish the moving foreground object from the stationary foreground object, but some problems remain. Because the determination in this method depends on the accuracy of the co-occurrence information of the appearances of adjoining visual elements, the undesired phenomena of “Lack of segmentation” and “Over segmentation” may occur when the amount of noise is considerable. FIG. 4A shows the phenomenon of “Lack of segmentation” in the prior-art method of US2012/0163658. FIG. 4B shows the phenomenon of “Over segmentation” in the prior-art method of US2012/0163658.
As shown in FIG. 4A, because the leg of the person and the bag have a similar color, no new appearance is identified for the corresponding visual element when the leg of the person is adjacent to the bag. Accordingly, the co-occurrence rate between the appearances of the adjoining visual elements is high, and they are incorrectly determined as connected. In other words, the border between the leg of the person and the bag cannot be distinguished. This issue is the so-called “Lack of segmentation”.
As shown in FIG. 4B, a change in lighting affects different parts of the person differently. For example, a new appearance can be identified for the visual element in the upper half of the person, but no new appearance can be identified for the visual element in the lower half of the person. As a result, the co-occurrence rate between the appearances of the upper half and the lower half is low, and these two portions are incorrectly determined as un-connected. This issue is the so-called “Over segmentation”.
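Both failure modes stem from whether a new appearance is identified for a visual element. The following sketch illustrates this under a simplifying assumption that is not stated in the text above: each appearance is a single intensity value, and an observation matches an existing appearance when it lies within a fixed tolerance. The function name, the tolerance, and all intensity values are hypothetical.

```python
def match_or_create(appearances, value, tolerance):
    """Return the index of the first stored appearance within the tolerance
    of the observed value, or store the value as a new appearance."""
    for index, stored in enumerate(appearances):
        if abs(stored - value) <= tolerance:
            return index
    appearances.append(value)
    return len(appearances) - 1


TOLERANCE = 10

# "Lack of segmentation": the leg (105) resembles the bag (100), so the
# element keeps its bag appearance and no new appearance is identified.
bag_element = [100]
leg_index = match_or_create(bag_element, 105, TOLERANCE)

# "Over segmentation": a lighting change shifts the upper half strongly
# (150 -> 180) but the lower half only slightly (150 -> 157).
upper_half = [150]
lower_half = [150]
upper_index = match_or_create(upper_half, 180, TOLERANCE)
lower_index = match_or_create(lower_half, 157, TOLERANCE)

print(leg_index, upper_index, lower_index)
```

In the first case the leg is absorbed into the existing bag appearance, so the pair stays connected. In the second case only the upper half receives a new appearance, so two parts of the same person stop co-occurring.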
Therefore, it is desired to propose a new technique to address at least one of the problems in the prior art.