1. Technical Field
The present invention relates to a method and system for detecting objects in images, and particularly, but not exclusively, to a method and system which detects moving objects while taking into account the shadows and highlights which those objects cast or contain.
2. Related Art
The automatic detection of moving objects such as people or vehicles within video images of a scene has been a goal of many researchers, and is a precursor to the provision of automated classification or tracking applications. Additionally, automated object detection systems are in themselves of use for monitoring and detection applications. The actual step of discriminating between picture elements (pixels) within an image which represent foreground or moving objects of interest and the background scene is known generally within the art and herein as “segmentation”.
One of the problems known in the art relating to the automated detection of objects is how to compensate for changes in lighting conditions, and in particular how to distinguish between an actual object and a shadow or highlight which that object may cast or otherwise contain. Especially within object classification or tracking systems, the need to be reasonably certain that it is the object which has been detected, and not its shadow, is important for subsequent matching steps, and hence techniques have been proposed in the art which detect and remove segmented pixels caused by shadows and highlights.
More particularly, McKenna et al. in “Tracking Groups of People”, Computer Vision and Image Understanding, 80, 42-56, 2000 describe a pixel segmentation technique wherein an adaptive background image is employed, which recursively adapts a background image to take into account changes in illumination (which are assumed to be slow compared to object movement). A colour channel background subtraction technique is then performed, wherein for any particular input image, the RGB channels of the input image pixels are compared with the adaptive background, and dependent on the results of a logical comparison of the respective input and background R, G, or B values a pixel is set as either “foreground” or “background”. The map of “foreground” pixels constitutes a mask which is then used subsequently for further processing.
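By way of illustration only, the recursive background adaptation and per-channel subtraction described above can be sketched as follows. The function names, the adaptation rate `alpha` and the difference threshold are assumptions made for the purpose of this sketch and are not taken from McKenna et al.:

```python
import numpy as np

def update_background(background, frame, alpha=0.05):
    """Recursively adapt the background image toward the current frame.

    Slow illumination changes are absorbed into the background over time;
    alpha is an assumed adaptation rate, not a value from McKenna et al.
    """
    return (1.0 - alpha) * background + alpha * frame

def rgb_foreground_mask(frame, background, threshold=30.0):
    """Flag a pixel as foreground if any of its R, G or B channels
    differs from the adaptive background by more than a threshold."""
    diff = np.abs(frame.astype(float) - background.astype(float))
    return np.any(diff > threshold, axis=-1)
```

The boolean array returned by `rgb_foreground_mask` corresponds to the "mask" of foreground pixels referred to above, which is retained for the later hole-filling step.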
Because it was generated using a colour-difference background subtraction technique, the mask of foreground pixels contains pixels which represent object shadows and/or highlights. Therefore, McKenna et al. also describe a second pixel segmentation technique wherein shadows are detected using pixel gradient and chrominance information. More particularly, and as described in Horprasert et al., "A Statistical Approach for Real-time Robust Background Subtraction and Shadow Detection", IEEE ICCV'99 FRAME-RATE Workshop, it is known that shadows exhibit a colour constancy property, in that the chromaticity of a pixel which is in shadow does not significantly differ from the chromaticity of that same pixel when it is not in shadow; the only change lies in the luminance of the pixel. This colour constancy is therefore used as a first discriminator by McKenna et al., in that they assume that any pixel with a significant intensity change in comparison with the background, but without a significant chromaticity change, could have been caused by a shadow.
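The colour constancy test can be sketched as follows, using normalised chromaticity coordinates. The luminance-drop and chromaticity tolerances are illustrative assumptions, not values specified by McKenna et al. or Horprasert et al.:

```python
import numpy as np

def chromaticity(rgb):
    """Normalised chromaticity coordinates (r, g) = (R, G) / (R + G + B)."""
    s = rgb[..., :3].sum(axis=-1, keepdims=True) + 1e-6  # avoid divide-by-zero
    return rgb[..., :2] / s

def shadow_candidates(frame, background, lum_drop=20.0, chrom_tol=0.02):
    """A pixel is a shadow candidate if its luminance has dropped
    significantly while its chromaticity is nearly unchanged."""
    f = frame.astype(float)
    b = background.astype(float)
    lum_f = f.sum(axis=-1) / 3.0
    lum_b = b.sum(axis=-1) / 3.0
    darker = (lum_b - lum_f) > lum_drop
    chrom_same = np.all(
        np.abs(chromaticity(f) - chromaticity(b)) < chrom_tol, axis=-1)
    return darker & chrom_same
```

A uniformly darkened pixel (for example, grey 150 falling to grey 75) keeps its chromaticity and is flagged as a shadow candidate, whereas a genuine object pixel of a different colour is not.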
However, McKenna et al. also note that the above colour constancy discriminator fails when a foreground object has the same chromaticity as the background, for example when black trousers cross a grey footpath. Therefore, McKenna et al. also describe using pixel gradient information of the input and background images to perform a comparison between the two, on the basis that if an input pixel is a shadow pixel then its texture information should not have changed much from its background value. A pixel is thus flagged as foreground if either chromaticity or gradient information supports that classification.
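The gradient comparison can be sketched as follows, using simple forward-difference gradients of a greyscale image; the gradient operator and tolerance are assumptions for illustration only (McKenna et al. do not prescribe this particular operator):

```python
import numpy as np

def gradients(gray):
    """Simple forward-difference gradients of a greyscale image."""
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, :-1] = gray[:, 1:] - gray[:, :-1]
    gy[:-1, :] = gray[1:, :] - gray[:-1, :]
    return gx, gy

def gradient_supports_shadow(frame_gray, background_gray, tol=10.0):
    """True where the local texture (gradient) of the input matches the
    background, as expected for a shadowed but otherwise unchanged pixel."""
    fgx, fgy = gradients(frame_gray.astype(float))
    bgx, bgy = gradients(background_gray.astype(float))
    return (np.abs(fgx - bgx) < tol) & (np.abs(fgy - bgy) < tol)
```

A uniform darkening of the scene leaves the gradients unchanged everywhere, so the shadow hypothesis is supported; a genuinely new object introduces new edges and fails the test locally, even if its chromaticity happens to match the background.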
Having performed pixel segmentation using the above technique, McKenna et al. then perform a connected component analysis to identify connected objects. They note that, due to the chromaticity and gradient discrimination, a connected object may contain "holes", in that pixels wholly located within the boundaries of the connected pixels, and which should have been segmented as foreground, are erroneously segmented as background, and hence do not form part of the connected object. To remedy this, reference is made to the "mask" produced by the RGB subtraction method described earlier, in that each "background" pixel which is part of a hole in a connected object is set to foreground if the mask from the RGB subtraction indicates that it is foreground. Thus the holes within the connected objects may be removed, and a foreground-background segmentation performed which takes shadows into account.
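The connected component analysis and hole-filling steps can be sketched as follows; this is a minimal illustration, assuming 4-connectivity and detecting enclosed "holes" by flood-filling the background from the image border, neither of which detail is specified by McKenna et al.:

```python
import numpy as np
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity

def label_components(mask):
    """Label 4-connected components of a boolean mask by flood fill."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    current = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j] and labels[i, j] == 0:
                current += 1
                labels[i, j] = current
                q = deque([(i, j)])
                while q:
                    y, x = q.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = current
                            q.append((ny, nx))
    return labels, current

def fill_holes(mask, rgb_mask):
    """Fill 'holes' (background pixels fully enclosed by a component),
    using the coarser RGB subtraction mask as the arbiter. Enclosure is
    found by flood-filling the background from the image border."""
    h, w = mask.shape
    outside = np.zeros((h, w), dtype=bool)
    q = deque()
    for i in range(h):
        for j in range(w):
            if (i in (0, h - 1) or j in (0, w - 1)) and not mask[i, j]:
                outside[i, j] = True
                q.append((i, j))
    while q:
        y, x = q.popleft()
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w
                    and not mask[ny, nx] and not outside[ny, nx]):
                outside[ny, nx] = True
                q.append((ny, nx))
    holes = ~mask & ~outside
    return mask | (holes & rgb_mask)
```

A background pixel enclosed inside a connected object is restored to foreground only where the RGB subtraction mask agrees that it is foreground, as in the remedy described above.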
Whilst the colour constancy and gradient techniques described by McKenna et al. are effective at identifying shadows for the purposes of a foreground-background pixel segmentation, the technique presents some problems when used in real situations, because the connected component analysis is performed before the "holes" due to the segmentation are taken into account. For example, if the "holes" actually overlap and completely bisect an object, then the connected component analysis will not recognise the two resulting blobs as a single object, but instead as two separate objects. As a consequence, the overlapping "holes" will not be recognised as such, and hence cannot be filled in by reference to the RGB subtraction mask. Conversely, until the connected component analysis is performed, the "holes" themselves are not identified, and hence cannot be filled. The result of this paradox is that in some situations two smaller connected objects may be identified where in reality only one object exists.
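The failure mode can be demonstrated on a toy example: a single rectangular object whose middle column is erroneously removed by the shadow discrimination is split into two components, and since neither removed pixel is enclosed by foreground, no "hole" exists to be filled. The masks and the component counter below are illustrative constructions, not data from McKenna et al.:

```python
import numpy as np
from collections import deque

def count_components(mask):
    """Count 4-connected components in a boolean mask via flood fill."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    n = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                n += 1
                seen[i, j] = True
                q = deque([(i, j)])
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            q.append((ny, nx))
    return n

# One real object: a 3x5 block of foreground pixels.
true_mask = np.zeros((5, 7), dtype=bool)
true_mask[1:4, 1:6] = True

# Overlapping shadow "holes" completely bisect the object at column 3.
segmented = true_mask.copy()
segmented[1:4, 3] = False

print(count_components(true_mask))   # 1: a single object in reality
print(count_components(segmented))   # 2: the bisecting holes split it
```

Because the two halves are labelled as separate objects, no per-object hole-filling step can ever reunite them, which is the paradox described above.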