The present exemplary embodiment generally relates to processing a target image using a background model corresponding to the scene depicted by the target image. The exemplary embodiment finds particular application in identifying shadow portions in the target image. A preprocessing step, performed prior to foreground estimation via background subtraction, modifies the target image or the background estimate according to a modification identification algorithm. The modified images are such that shadow portions are not detected as foreground after background subtraction is performed.
Video-based detection of moving and foreground objects is typically a prerequisite of video-based object tracking. Foreground objects are moving and stationary non-permanent objects in a scene, where non-permanent is relative to a general understanding of the constant state of the scene. For instance, moving people in a scene are foreground objects. Another example is luggage left in a terminal, which, while static over possibly multiple frames of video, is not part of the general understanding of the constant state of the terminal, and thus is also a foreground object. “Background” is the stationary content in a scene relative to the general understanding of its constant state. Some variability in time can be included in the definition of background. For instance, camera shake and swaying trees have dynamic effects on acquired images, but these effects are typically included in the background.
One of the problems in computer vision applications, such as surveillance, traffic monitoring, patient care, etc., is detection of foreground objects, particularly moving and stationary non-permanent objects, in a manner that is sufficiently reliable and that also makes efficient use of resources. Examples of scenarios that rely on object detection and tracking include video-based vehicle speed estimation, automated parking monitoring, tracking people, and measuring total experience time in retail spaces. A key challenge associated with moving object detection is identifying shadows that are cast by the objects and move along with them across the scene. Shadows may cause problems when segmenting and extracting foreground objects due to the potential misclassification of shadow pixels as foreground pixels. Although shadow detection algorithms have been widely explored in the literature, most existing work makes limiting assumptions about the scene, requires a number of key parameters to be fine-tuned based on the particulars of the specific application, and considers shadow detection as a separate step from background subtraction. The limiting assumptions are a barrier to broad deployment. Furthermore, the typical processes result in computational complexity that grows as the number of originally detected foreground pixels increases, which is usually the case in high vehicular or pedestrian traffic conditions.
In applications that perform analytics from video captured using fixed cameras (e.g., stationary surveillance cameras), the two most commonly used methods of foreground and motion detection are frame-to-frame differencing (strictly for motion detection) and background estimation and subtraction (for foreground and motion detection). The frame differencing method typically requires tuning to a very narrow range of object speed relative to the frame rate and camera geometry. On the other hand, background estimation and subtraction is more flexible in that the time scale and the parameters of the background model can be adjusted dynamically. Strictly speaking, a background subtraction method detects foreground objects rather than moving objects; however, moving objects will trigger foreground detection because their appearance differs from the appearance of the background estimate. In other words, moving objects are also considered foreground objects. This is in contrast with frame-differencing methods, which are only capable of moving object detection.
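By way of illustration only, the distinction between the two methods can be sketched as follows. The sketch assumes grayscale frames held as NumPy arrays; the function names and the threshold value are hypothetical choices made for this example and are not taken from any particular system. An object that has entered the scene and stopped is missed by frame differencing but is still reported by background subtraction.

```python
import numpy as np

def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Motion mask: pixels whose intensity changed between consecutive frames."""
    diff = np.abs(curr_frame.astype(np.int32) - prev_frame.astype(np.int32))
    return diff > threshold

def background_subtraction_mask(background, curr_frame, threshold=25):
    """Foreground mask: pixels that differ from the background estimate."""
    diff = np.abs(curr_frame.astype(np.int32) - background.astype(np.int32))
    return diff > threshold

# Toy scene (assumed values): an empty background and an object that
# has entered the scene and then remains stationary for two frames.
background = np.zeros((4, 4), dtype=np.uint8)
frame_a = background.copy()
frame_a[1:3, 1:3] = 200          # object present
frame_b = frame_a.copy()         # object has not moved

motion = frame_difference_mask(frame_a, frame_b)
foreground = background_subtraction_mask(background, frame_b)

print("pixels flagged by frame differencing:", int(motion.sum()))
print("pixels flagged by background subtraction:", int(foreground.sum()))
```

In this toy case the stationary object produces no response from frame differencing, while background subtraction flags all four object pixels.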
Background subtraction segments the foreground by comparing the current video frame with the estimated background image, or by subtracting the estimated background image from the current video frame. Significant differences in the resulting difference image usually correspond to foreground objects. Strategies used to maintain a background model or estimate include running averages, Gaussian mixture models, median filtering, etc. A background model is a mathematical or algorithmic framework used to estimate the background of a scene at a given time. This estimate of the background can be referred to synonymously as the background estimate or the background model because, in addition to being a means of estimation, the mathematical or algorithmic framework is an alternative representation of the estimate.
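As a minimal sketch of one of the strategies named above, a running-average background model followed by thresholded subtraction might look like the following. The learning rate, the threshold, the synthetic scene, and the function names are all assumptions made for illustration; Gaussian mixture or median-based models would replace the update step.

```python
import numpy as np

def update_background(background, frame, learning_rate=0.05):
    """Running average: blend the new frame into the background estimate."""
    return (1.0 - learning_rate) * background + learning_rate * frame

def foreground_mask(background, frame, threshold=30.0):
    """Flag pixels whose absolute difference from the background is large."""
    return np.abs(frame.astype(np.float64) - background) > threshold

# Synthetic static scene with mild sensor noise (assumed values).
rng = np.random.default_rng(0)
scene = np.full((6, 6), 100.0)
background = scene.copy()
for _ in range(20):
    noisy_frame = scene + rng.normal(0.0, 2.0, scene.shape)
    background = update_background(background, noisy_frame)

# A bright object enters the scene in the current frame.
frame = scene.copy()
frame[2:4, 2:4] = 220.0
mask = foreground_mask(background, frame)

print("foreground pixels:", int(mask.sum()))
```

A smaller learning rate makes the estimate adapt more slowly, so stationary non-permanent objects remain in the foreground longer; a larger rate absorbs them into the background sooner.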
Ideally, background subtraction should detect real foreground objects with high accuracy, limiting false negatives as much as possible; at the same time, it should extract pixels of foreground objects with the maximum responsiveness possible, avoiding detection of spurious objects, such as cast shadows, or noise. One of the main challenges associated with foreground or moving object detection is identifying shadows that are cast by objects and move along with them across the scene. Shadows can pose serious challenges when segmenting and extracting moving objects due to the potential misclassification of shadow pixels as foreground pixels. The difficulties associated with shadow detection arise because shadows and objects share two important visual features. First, shadow points are detectable as foreground points since they typically differ significantly from the background. Second, shadows have patterns of motion similar to those of the objects casting them. Shadows can be especially problematic when they touch other moving objects in the scene, thereby making it difficult to identify distinct moving objects. For the above reasons, shadow identification has become an active research area.
Shadow detection algorithms have been widely explored in the literature. For additional information on shadow detection algorithms, see, e.g., Prati et al., Detecting Moving Shadows: Algorithms and Evaluation, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 7, July 2003, pp. 918-923. Existing approaches (deterministic or statistical, model-based or non-model-based in terms of the decision process) exploit a large set of assumptions to limit complexity and to avoid being unduly constrained to a specific scene model. These existing approaches rely on carefully selected parameters, either in the shadow model or in the decision threshold. This reliance limits both their shadow detection accuracy and their extensibility.
Moreover, the existing work treats shadow detection as a step separate from background subtraction. Specifically, traditional background subtraction methods extract a foreground that includes both real moving objects and shadows, and then apply a shadow detection criterion to the detected foreground area to remove shadow pixels that were extracted as foreground pixels during the background subtraction. The computational complexity of this approach grows as the number of (originally) detected foreground pixels increases (which is usually the case in scenes with high vehicular or pedestrian traffic).
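The two-stage structure described above can be sketched as follows. The shadow criterion used here, a simple brightness-ratio test with hypothetical bounds `low` and `high`, is an assumption chosen only to keep the example short; it is not the criterion of any particular published approach. The point of the sketch is that the shadow test is evaluated once per detected foreground pixel, so its cost scales with the size of the initial foreground.

```python
import numpy as np

def remove_shadows(background, frame, fg_mask, low=0.4, high=0.9):
    """Post-hoc shadow removal over already-detected foreground pixels.

    A foreground pixel is reclassified as shadow when the ratio of its
    brightness to the background brightness lies in [low, high]
    (an assumed, illustrative criterion).
    Returns the cleaned mask and the number of pixels that were tested.
    """
    ys, xs = np.nonzero(fg_mask)                  # only foreground pixels
    ratio = frame[ys, xs] / np.maximum(background[ys, xs], 1.0)
    is_shadow = (ratio >= low) & (ratio <= high)
    cleaned = fg_mask.copy()
    cleaned[ys[is_shadow], xs[is_shadow]] = False
    return cleaned, len(ys)

# Toy frame (assumed values): a dark object and a moderately darkened shadow.
background = np.full((5, 5), 200.0)
frame = background.copy()
frame[1:3, 1:3] = 40.0     # object: brightness ratio 0.2
frame[3, 1:4] = 120.0      # shadow: brightness ratio 0.6

# Stage 1: background subtraction flags both object and shadow.
fg_mask = np.abs(frame - background) > 30.0
# Stage 2: shadow test applied to every flagged pixel.
cleaned, tested = remove_shadows(background, frame, fg_mask)

print("initial foreground pixels:", int(fg_mask.sum()))
print("pixels tested for shadow:", tested)
print("foreground pixels after shadow removal:", int(cleaned.sum()))
```

Here seven pixels are initially flagged, all seven are tested, and four remain after the shadow pixels are removed; in a busy scene the number of tested pixels grows accordingly.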