Systems and methods herein generally relate to processing items in video frames obtained using a camera system, and more particularly to image processors that discriminate between background and foreground items within such video frames, without using substantial background modeling processes.
Video-based detection of moving and foreground objects in video acquired by stationary cameras is a core computer vision task. Temporal differencing of video frames is often used to detect objects in motion, but it fails to detect objects that are stationary or that move slowly relative to the frame rate. Background estimation and subtraction, on the other hand, can detect both moving and stationary foreground objects, but is typically more expensive than frame differencing in both computation and memory. Background estimation techniques construct and maintain statistical models describing background pixel behavior. In this approach, a historical statistical model is constructed for each pixel (e.g., a parametric density model such as a Gaussian Mixture Model (GMM), or a non-parametric density model such as a kernel-based estimate) and is updated with each incoming frame at a rate controlled by a predetermined learning rate factor. Foreground detection is performed by measuring how well each pixel value in the incoming frame fits its statistical model: pixels that do not fit their corresponding background model are classified as foreground pixels.
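The following is a minimal sketch, not the method of this disclosure, contrasting the two approaches described above. It assumes grayscale frames stored as nested lists of floats, and it uses a single running Gaussian per pixel as a simplified stand-in for a per-pixel GMM. The names `frame_difference`, `RunningGaussianBackground`, `alpha`, `k`, `init_var`, and `min_var` are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List

Frame = List[List[float]]  # grayscale frame as rows of pixel intensities
Mask = List[List[bool]]    # True marks a pixel flagged as moving/foreground


def frame_difference(prev: Frame, curr: Frame, threshold: float) -> Mask:
    """Temporal differencing: flag pixels whose intensity changed by more than `threshold`."""
    return [[abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)]


@dataclass
class PixelGaussian:
    mean: float
    var: float


class RunningGaussianBackground:
    """One Gaussian per pixel (hypothetical): a simplified stand-in for a per-pixel GMM."""

    def __init__(self, first: Frame, alpha: float = 0.05, k: float = 2.5,
                 init_var: float = 100.0, min_var: float = 4.0) -> None:
        self.alpha = alpha      # learning rate: weight given to each new frame
        self.k = k              # goodness-of-fit threshold, in standard deviations
        self.min_var = min_var  # variance floor so a static pixel's model does not collapse
        self.model = [[PixelGaussian(float(v), init_var) for v in row] for row in first]

    def apply(self, frame: Frame) -> Mask:
        """Classify each pixel against its model, then update the model with the frame."""
        mask: Mask = []
        for model_row, row in zip(self.model, frame):
            mask_row = []
            for g, x in zip(model_row, row):
                d = x - g.mean
                # Fit test: a pixel more than k standard deviations from its mean is foreground.
                mask_row.append(d * d > self.k * self.k * g.var)
                # Continuous update at rate alpha (every pixel, every frame).
                g.mean += self.alpha * d
                g.var = max(self.min_var, (1.0 - self.alpha) * g.var + self.alpha * d * d)
            mask.append(mask_row)
        return mask
```

As a usage example, take a background frame `[[50.0, 50.0, 50.0]]` followed by several frames in which an object of intensity 200 sits still on the middle pixel. Differencing two consecutive frames of the parked object flags nothing. The background model, by contrast, keeps flagging the middle pixel until the update at rate `alpha` gradually pulls its mean toward the object's value.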
This approach has numerous limitations: it requires substantial computational and storage resources, the model takes time to converge, and there are many parameters to tune (e.g., the learning rate, the goodness-of-fit threshold, the number of components in each mixture model). Once a set of parameters is chosen, the range of scenarios that the model-based methods support is limited. For example, too slow a learning rate means that the background estimate cannot adapt quickly enough to fast changes in the appearance of the scene; conversely, too fast a learning rate causes objects that remain stationary for long periods to be absorbed into the background estimate.
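The learning-rate trade-off described above can be quantified in a simplified setting. Assume a running-mean update mu ← (1 − alpha)·mu + alpha·x with a fixed absolute threshold rather than a variance-scaled one. A constant offset between the pixel value and the model mean then shrinks by a factor (1 − alpha) per frame. The number of frames before a stationary object stops being detected is therefore ceil(ln(threshold / |contrast|) / ln(1 − alpha)). The same count measures how long a persistent scene change, such as a lighting shift, remains misclassified, so a single alpha cannot make adaptation fast and absorption slow at the same time. The function names below are hypothetical.

```python
import math


def frames_until_absorbed(contrast: float, alpha: float, threshold: float) -> int:
    """Closed form: smallest n with |contrast| * (1 - alpha) ** n <= threshold.

    Assumes mu <- (1 - alpha) * mu + alpha * x with x held constant, so the gap
    between the pixel value and the model mean shrinks by (1 - alpha) per frame.
    """
    if abs(contrast) <= threshold:
        return 0
    return math.ceil(math.log(threshold / abs(contrast)) / math.log(1.0 - alpha))


def simulate_absorption(contrast: float, alpha: float, threshold: float) -> int:
    """Brute-force check: apply the update until the gap no longer exceeds threshold."""
    mean, value, frames = 0.0, contrast, 0
    while abs(value - mean) > threshold:
        mean += alpha * (value - mean)
        frames += 1
    return frames


for alpha in (0.01, 0.1, 0.5):
    print(alpha, frames_until_absorbed(100.0, alpha, 10.0),
          simulate_absorption(100.0, alpha, 10.0))
```

With a contrast of 100 and a threshold of 10, alpha = 0.5 absorbs the object after 4 frames, alpha = 0.1 after 22 frames, and alpha = 0.01 after 230 frames. A rate slow enough to keep a parked object visible for hundreds of frames also leaves a lighting change misdetected for that long.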