In computer vision systems, “blobs”, also referred to as patches or regions, are commonly used to identify and represent foreground objects in a scene, e.g., the content of a frame of a video sequence. In many such systems, a background subtraction technique is used to identify pixels that belong to foreground objects in the scene. These foreground pixels are then grouped together using connected components labeling to form the blobs, i.e., image regions of contiguous pixels. Further, a blob is often characterized by its centroid, i.e., the average x and y positions of the pixels in the blob, and by its bounding box. The identified blobs may then be used in further analysis, for example, for object tracking in video analytic systems, where a unique identity is maintained for each blob in a scene. That is, given an initial assignment of labels to blobs, an object tracking technique attempts to find a correspondence between the blobs in frame I_t and the blobs in frame I_{t+1}.
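The grouping and characterization steps described above can be illustrated with a minimal sketch. It assumes the background subtraction step has already produced a binary foreground mask (a list of rows of 0/1 values) and uses 4-connectivity. The function name `label_blobs` and the dictionary keys are hypothetical and chosen for illustration only; this is not the implementation of any particular system.

```python
from collections import deque


def label_blobs(mask):
    """Group truthy (foreground) pixels into 4-connected blobs.

    Returns one dict per blob with its centroid (mean x, mean y),
    bounding box (min_x, min_y, max_x, max_y), and pixel count.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill collects one connected component.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            xs = [p[0] for p in pixels]
            ys = [p[1] for p in pixels]
            blobs.append({
                "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
                "bbox": (min(xs), min(ys), max(xs), max(ys)),
                "area": len(pixels),
            })
    return blobs
```

Applied to a small mask containing a 2x2 region and a separate 1x2 region, the sketch yields two blobs, each with its own centroid and bounding box. Production systems would typically rely on an optimized library routine rather than a pure-Python flood fill.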
Establishing a correspondence typically involves comparing the centroid locations, bounding box sizes, etc. of the blobs in frame I_t with those of each of the blobs in frame I_{t+1}. However, when two or more foreground objects appearing in a sequence of frames come into sufficiently close proximity in a frame, their corresponding blobs are merged and represented as a single blob entity with a single centroid and a single bounding box. Thus, the tracking technique may no longer be able to reliably track the two or more foreground objects in subsequent frames, even when the foreground objects are no longer in close proximity. Accordingly, improvements in blob representation are desirable.
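The merging problem can be made concrete with a minimal sketch of centroid-based correspondence. The text does not specify a particular matching method, so the greedy nearest-centroid rule used here is an assumption, as are the function name `match_blobs` and the `max_distance` gating parameter. Bounding box comparison is omitted for brevity.

```python
import math


def match_blobs(prev_centroids, next_centroids, max_distance):
    """Greedily pair labeled blobs in one frame with blobs in the next.

    prev_centroids: dict mapping a blob label to its (x, y) centroid.
    next_centroids: list of (x, y) centroids found in the next frame.
    Returns a dict mapping each matched label to an index in next_centroids.
    """
    # Enumerate every candidate pairing, closest pairs first.
    candidates = sorted(
        (math.dist(p, n), label, j)
        for label, p in prev_centroids.items()
        for j, n in enumerate(next_centroids)
    )
    assignment = {}
    used = set()
    for distance, label, j in candidates:
        if distance > max_distance:
            break  # all remaining candidates are even farther apart
        if label in assignment or j in used:
            continue  # each label and each new blob is used at most once
        assignment[label] = j
        used.add(j)
    return assignment
```

When two labeled blobs "A" and "B" at (10, 10) and (30, 10) merge into a single blob at (20, 10) in the next frame, only one label can be assigned to the merged blob and the other identity is dropped, which is the failure mode described above.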