Identifying moving objects in video data is an important task in many computer-vision applications, such as video surveillance, traffic monitoring and analysis, human detection and tracking, and gesture recognition. To identify objects, these applications often use background discrimination, in which the objects of interest (e.g., those in the foreground) are isolated from objects that are immaterial to the analysis (e.g., those in the background). However, applications that rely on background discrimination tend to be computationally intensive, difficult to implement in real time with only a single camera, and potentially unreliable.
Other known computer-vision applications use complex and costly three-dimensional or depth-oriented sensor systems that detect the shape of an object in order to identify it. Unfortunately, these sensors must be placed within a specific distance range for proper identification, and they recognize objects poorly when the objects are placed in different environments or at distances for which the sensors are not configured, for example, long distances (e.g., 20 m or greater) or short distances (e.g., 50 cm or less). Moreover, at present, good two-dimensional cameras have a resolution of only 1920×1024, and depth-oriented cameras have a resolution of only 320×240. Additionally, the power consumption of depth-oriented cameras can be very high if, for example, they use a time-of-flight methodology, because time of flight requires high-speed electronics, which are known to consume significant energy.
Still other computer-vision applications use special lighting, backgrounds, or clothing (e.g., gloves, patches, etc.) to enhance tracking of the object to be identified. However, even with these aids, such applications are fraught with false positives and often misidentify an object and its position in video data. Further, where special clothing is required, users may not comply.
The limits of known technologies often force a tradeoff between two competing goals: capturing all of the objects the system is looking for, and not registering a positive result for objects the system is not looking for. As persons of ordinary skill in the art will recognize, with known technologies, making detection stricter may cause a system to miss a candidate it seeks to detect, while making detection looser may produce too many false detections. A false detection may occur, for example, because of a superficial similarity to the object intended to be detected. One example is the shadow of a hand, which appears together with the hand itself and may be mistaken for it. A missed detection may occur when a condition in the frame makes the object difficult to detect. For example, a change in lighting conditions, or noise introduced into the image (such as leaves moving in the background on a windy day), may impede detection of the target object.
As the foregoing illustrates, there is a need for improved image-based operating systems and methods that consistently and reliably identify objects in video data.