There exists a need to identify, recognize, count and/or track the movement of objects in an environment. For example, there is a need to determine how long people spend in retail stores or other commercial environments. When an object (e.g. a person, animal, vehicle or shopping cart) is tracked, useful information can be generated. For example, in a retail store application, useful information may be generated by tracking the movements of individuals through the store. Such useful information may include, for example: the time that people spend viewing an advertisement or promotional display; the ratio of adults to children in a particular area; the number of people who view a promotional display today as compared to the same promotional display yesterday or last week; the duration of time that shoppers spend in a particular area; and the like. Such information can in turn be used to generate additional information, such as desirable staffing allocations, traffic counts, shopper-to-purchaser ratios, etc.
Various prior art techniques have been proposed and developed to automate the counting and/or tracking of objects. Often, the applicability of these techniques depends on the environment in which the counting and/or tracking takes place.
One common technique for counting the movement of objects into or out of an environment involves the use of infrared beams. An infrared beam is directed from a source located on one side of an opening, such as a door, to a detector positioned on an opposite side of the opening. When the beam is interrupted by the movement of an object through its path, the system detects the passage of an object through the opening. Techniques of this type are not capable of detecting the direction of movement of objects through the opening, or of identifying, recognizing or tracking particular objects.
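By way of illustration only, the beam-interruption technique described above may be sketched as follows. The input format (a sequence of boolean detector readings) and the function name `count_beam_interruptions` are assumptions made for this sketch and are not taken from any particular prior art system.

```python
def count_beam_interruptions(samples):
    """Count passages from a sequence of detector readings.

    samples: iterable of booleans (hypothetical format), True when the
    detector receives the beam and False when the beam is blocked.
    """
    count = 0
    blocked = False
    for beam_received in samples:
        if not beam_received and not blocked:
            count += 1  # beam has just become blocked: one passage detected
        blocked = not beam_received
    return count


# An object entering and an object leaving interrupt the beam in the same way,
# so the two readings below are indistinguishable to the counter.
entering = [True, False, False, True]
leaving = [True, False, False, True]
print(count_beam_interruptions(entering), count_beam_interruptions(leaving))
```

Because the detector output carries only a blocked/unblocked state, nothing in the signal distinguishes an entry from an exit, which is consistent with the limitation noted above.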
Monocular vision systems comprising a video camera and a video processor can identify, detect and track the movements of objects. Such monocular vision systems identify and detect motion by monitoring successive image frames and recording the positions of distinctive groups of pixels. These distinctive groups of pixels may be identified as objects of interest that are found in successive image frames. If the position of a particular group of pixels changes between image frames, then the vision system detects movement of the corresponding object. Such systems can also track the motion of an object by analyzing successive image frames. The change in position of the group of pixels between successive image frames can be interpreted as movement of the associated object relative to the stationary background. The movement of the object may be tracked by recording the position of its group of pixels in each successive image frame.
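By way of illustration only, the frame-to-frame tracking described above may be sketched as follows. The sketch assumes that the distinctive group of pixels can be isolated by a simple brightness threshold and that its position is summarized by its centroid; both are simplifying assumptions, and the names `centroid` and `track` are hypothetical.

```python
def centroid(frame, threshold=128):
    """Return the (row, col) centroid of pixels brighter than threshold, or None."""
    row_sum = col_sum = n = 0
    for r, line in enumerate(frame):
        for c, value in enumerate(line):
            if value > threshold:
                row_sum += r
                col_sum += c
                n += 1
    if n == 0:
        return None  # no pixel group found in this frame
    return (row_sum / n, col_sum / n)


def track(frames, threshold=128):
    """Record the group's position in each frame and its displacement between frames."""
    positions = [centroid(f, threshold) for f in frames]
    moves = []
    for prev, curr in zip(positions, positions[1:]):
        if prev is None or curr is None:
            moves.append(None)  # group lost in one of the two frames
        else:
            moves.append((curr[0] - prev[0], curr[1] - prev[1]))
    return positions, moves


frame_a = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]
frame_b = [
    [0, 0, 0, 0],
    [0, 0, 255, 255],
    [0, 0, 0, 0],
]
positions, moves = track([frame_a, frame_b])
print(positions)  # [(1.0, 1.5), (1.0, 2.5)]
print(moves)      # [(0.0, 1.0)]: the group moved one pixel to the right
```

The positions and displacements are expressed only in image (pixel) coordinates, so a sketch of this kind captures two-dimensional motion in the image plane and nothing about distance from the camera.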
In operation, however, monocular vision systems are not very robust and can misjudge the movement and/or the identity of objects. In addition, monocular vision systems are limited to the detection of two-dimensional object features.
Stereo vision cameras, which generally comprise two (or more) cameras and a video processor, are also used for object identification, recognition and tracking applications. Such stereo vision cameras can detect three-dimensional properties of objects in their stereo vision field of view. Examples of commercially available stereo vision cameras include the DIGICLOPS™ and the BUMBLEBEE™ camera systems available from Point Grey Research Inc. of Vancouver, Canada. In addition to object identification, recognition and tracking, stereo vision cameras are used for a wide variety of applications, such as computer vision and object dimensioning.
A typical stereo vision camera comprises two spaced-apart monocular cameras. Some prior art stereo vision cameras have three or more monocular cameras. FIG. 1 shows a stereo vision camera 10 having two monocular cameras 11A and 11B. Preferably, although not necessarily, monocular cameras 11A and 11B are digital cameras. The distance b between camera 11A and camera 11B is referred to as the “baseline”. Each of cameras 11A and 11B has an associated optical axis 16A and 16B and an associated field of view 12A and 12B. These fields of view 12A and 12B overlap one another in region 13, which is referred to as the “stereo vision field” or the “stereo vision field of view”. In a stereo vision camera having three or more monocular cameras, the system's stereo vision field includes any region where the fields of view of two or more monocular cameras overlap. Stereo vision camera 10 comprises a processor 14, which receives image data from each monocular camera 11A and 11B. Using standard triangulation techniques and/or other well-known stereo vision techniques, processor 14 can determine three-dimensional features of an object (e.g. object 15) in the stereo vision field.
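By way of illustration only, one standard triangulation relationship may be sketched as follows. The sketch assumes two identical pinhole cameras with parallel optical axes and rectified images, in which case the depth Z of a point is given by Z = f·b/d, where f is the focal length in pixels, b is the baseline and d is the disparity (the difference between the horizontal image coordinates of the point in the two images). The function name and the numeric values are hypothetical and do not describe any particular camera.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Estimate depth (metres) of a point seen by two rectified, parallel cameras.

    x_left, x_right: horizontal pixel coordinate of the same point in the
    left and right images. focal_px: focal length in pixels.
    baseline_m: distance b between the two cameras, in metres.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


# Hypothetical values: 800-pixel focal length, 0.12 m baseline, 40-pixel disparity.
z = depth_from_disparity(x_left=420.0, x_right=380.0, focal_px=800.0, baseline_m=0.12)
print(round(z, 3))  # 2.4
```

Under these assumptions, depth is inversely proportional to disparity, so distant objects produce small disparities and their depth estimates are more sensitive to pixel-level matching errors.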
Stereo vision camera systems have been used to implement prior art tracking techniques. According to these prior art techniques, an environment is populated with a plurality of cameras having overlapping fields of view, such that the entire environment of interest is located within the system's stereo vision field of view. Tracking is then performed in a manner similar to that of monocular vision systems, except that three-dimensional features of the image data and three-dimensional object features may be used to identify, recognize and track the motion of objects.
These prior art stereo vision tracking techniques suffer from the disadvantage that the entire environment of interest must be within the system's stereo vision field. When the environment of interest is large, such tracking techniques can be prohibitively expensive, because a large number of cameras is required to provide adequate coverage of the environment of interest.
There is a need for cost-effective techniques for tracking the movement of objects, such as people, animals, vehicles and/or other moveable objects, into and out of environments.