The present invention relates to identifying or re-identifying objects in a series of images captured by a camera, and more specifically to training a neural network to select a best image among a series of images for identification or re-identification purposes.
Monitoring cameras are used in many different applications, both indoors and outdoors, to monitor a variety of environments. One important aspect of the monitoring relates to the ability to identify or re-identify objects or people. For example, for law enforcement purposes, it may be of interest to identify whether a particular car enters or exits a monitored area, or to determine whether images of two vehicles recorded at different times or places actually show the same vehicle. Similarly, it may be of interest to be able to identify specific persons or faces, either to make an initial determination as to who the person might be, or to determine whether the person has been seen before in a different monitoring situation.
There are two main categories of monitoring cameras: single-shot cameras and video cameras. Single-shot cameras typically take a single picture when triggered by some kind of sensor, such as a motion sensor, whereas video cameras continuously record the movement of an object as a series of images, also referred to as an “object track.”
Performing re-identification based on images from single-shot cameras typically involves matching exactly one image crop of an object or person against another single image crop, to check whether both image crops represent the same object or person. An image crop, as used herein, refers to a region of an image recorded by a camera. Typically, the image crop is made in such a way that it mainly contains pixels depicting the object or person.
However, in a video surveillance situation, selecting images for re-identification of an object becomes more complex. First, it is necessary to determine which collection of image crops belongs to the same (yet unknown) object. One way to accomplish this is to use an object tracker, such as a motion-based or feature-based object tracker. Several kinds of motion-based and other object tracking techniques are well known to those having ordinary skill in the art. Second, determining which image crop out of the whole object track to select for re-identification purposes poses another problem. A variety of problems can occur with individual images in an object track. For example, some of the images might be of too low quality to be useful, due to motion blur in an image frame or to changes in viewing angle or lighting; and when an object or person enters the camera view, only part of the object or person may be visible in the first several frames of the object track.
Yet further, it is impossible to know in advance how many frames to wait before selecting an image crop, because it is not possible to predict what the object or person will do. The object may remain at the border of the camera view for a long time, or it may pass through the camera view very quickly. Therefore, merely assigning a fixed delay before selecting the image crop would not work well. For at least these reasons, there is a need for better techniques for deciding which image crop to select from an object track for identification or re-identification of objects or people.