Investigation of incidents is an important consideration in large scale video surveillance systems, such as those used by city authorities or law enforcement to monitor and investigate traffic incidents. These large scale surveillance systems often need to track objects across several different sensors, while understanding the connection between the tracked objects in the different sensors.
Various solutions to the problem of connecting tracked objects in different sensors have been proposed in the prior art. For example:
PCT Patent Publication No. WO2014/072971 describes determining a license plate number of a vehicle tracked by a surveillance system by monitoring a first area with a surveillance camera to detect entry of a vehicle into the first area and recording a detection time, and substantially simultaneously capturing with an LPR camera an image of a license plate of a vehicle entering the first area, and correlating the time of the detection with the time of the capture to associate the tracked vehicle with a license plate number.
U.S. Pat. No. 5,696,503 describes a traffic surveillance system having a plurality of sensor systems separated by a roadway link, each sensor system comprising a fingerprinting sensor and a traffic processor, the sensor providing raw signals including fingerprints of vehicles within a field, and the processor distinguishing individual vehicles based upon their respective fingerprints, reducing the fingerprints to characterizations of predefined attributes, and determining the position of each distinguished vehicle within the field.
U.S. Pat. No. 7,295,106 describes classifying objects in a monitored zone using multiple surveillance devices by receiving a set of objects within a predefined zone area from each of at least a first and second surveillance means. Subsequently, each received set of objects is filtered to ensure that the objects in the set are comparable to the objects in the other received set. Characteristics of the received sets of objects are compared and characteristics of the objects within a received set of objects are compared to characteristics of the objects within a different set of received objects, wherein the characteristics are based upon a set of predetermined characteristics. It is determined if each object or set identified by the first surveillance means corresponds to an object or set identified by the second surveillance means.
U.S. Patent Publication No. 2014/0098221 describes an approach for re-identifying, in a second test image, an object in a first test image by determining brightness transfer functions (BTFs) between respective pairs of training images. Respective similarity measures are determined between the first test image and each of the training images captured by the first camera (first training images). A weighted brightness transfer function (WBTF) is determined by combining the BTFs weighted by weights of the first training images. The first test image is transformed by the WBTF to better match one of the training images captured by the second camera. Another test image, captured by the second camera, is identified because it is closer in appearance to the transformed test image than other test images captured by the second camera.
EP Patent No. 1,489,552 describes improved detection and recognition of objects such as vehicles by image processing a camera image of the vehicle by correcting sensor pixel values with a reflection component. The detected vehicle can be re-identified downstream by a second camera. The success rate of recognition can be improved by recognizing additional object consequences and/or platoons.
PCT Patent Publication No. WO2011/120194 describes measuring a journey time between nodes in a road network by detecting characteristics of a car sequence sequentially passing through the node network, wherein a first node reports characteristics of the car sequence to a neighbor node, and the neighbor node compares the characteristics of the car sequence reported by the first node with characteristics of the car sequence detected at the neighbor node to find a matching position, and calculates a journey time from the first node to the neighbor node.
Coifman, Benjamin (1999), “Vehicle Reidentification and Travel Measurements on Congested Freeways”, California Partners for Advanced Transit and Highways (PATH), describes using loop detectors at an upstream and downstream location to measure vehicle lengths, and comparing vehicle platoons detected at the upstream and downstream locations to identify matching platoons based on the vehicle lengths of vehicles in the platoons, thereby enabling identification of a particular vehicle of particular length within the reidentified platoon.
C. C. Sun, R. P. Ramachandran and S. G. Ritchie “Vehicle reidentification using multidetector fusion”, IEEE Trans. Intell. Transp. Syst., vol. 5, no. 3, pp. 155-164 2004, describes a multi-detector vehicle re-identification algorithm by selecting a platoon detected at a downstream site, generating a list of upstream candidate platoons subject to a time window constraint, and comparing each upstream platoon to the downstream platoon using feature vectors and a linear L1 (absolute distance) nearest neighbor classifier to determine a best matching platoon, whereupon individual vehicles are then reidentified.
The references cited below teach background information that may be applicable to the presently disclosed subject matter:
dos Santos, D. J. A. (2008). Automatic Vehicle Recognition System. Masters Dissertation, Universidade Tecnica de Lisboa. (“Santos”)
Jang, D. M., & Turk, M. (2011). Car-Rec: A real time car recognition system. In Applications of Computer Vision (WACV), 2011 IEEE Workshop on (pp. 599-605). IEEE. (“Jang”)
Lim, T. R., & Guntoro, A. T. (2002). Car recognition using Gabor filter feature extraction. In Circuits and Systems, 2002. APCCAS'02. 2002 Asia-Pacific Conference on (Vol. 2, pp. 451-455). IEEE. (“Lim”)
Dorkó, G., & Schmid, C. (2003). Selection of scale-invariant parts for object class recognition. In Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on (pp. 634-639). IEEE. (“Schmid”)
Swain, M. J., & Ballard, D. H. (1991). Color indexing. International Journal of Computer Vision, 7(1), 11-32. (“Swain”)
Caspi, Y., Simakov, D., & Irani, M. (2006). Feature-based sequence-to-sequence matching. International Journal of Computer Vision, 68(1), 53-64.
Barton, G. J. (1998). Protein sequence alignment techniques. Acta Crystallographica Section D: Biological Crystallography, 54(6), 1139-1146.
Navarro, G. (2001). A guided tour to approximate string matching. ACM Computing Surveys (CSUR), 33(1), 31-88. http://doi.acm.org/10.1145/375360.375365
The full contents of the above publications are incorporated by reference herein in their entirety.
General Description
Two of the most common sensors for monitoring and tracking vehicles are license plate recognition (LPR) camera systems and standard video cameras. However, mapping images between LPR systems and video cameras presents a technical problem because LPR images and video images of the same scene can display very different visual properties. LPR cameras usually rely on infra-red light to capture video or still images, while “regular” video cameras are designed to focus on the visible spectrum. Therefore, a video image and an LPR image capturing the same object at the same time can visually represent the object differently.
Moreover, additional camera settings, such as the image resolution, viewing angle or light exposure, present further challenges when establishing scene correspondences between two such cameras. Finding a correspondence between an object viewed from two different cameras based solely on its available visual representations presents a challenge for automated systems, such as object tracking systems.
In accordance with certain aspects of the presently disclosed subject matter, there is provided a method of identifying in one or more images captured by a second camera a target object captured by a first camera. The method is provided by a processor, and comprises storing in a memory operatively coupled to the processor a first set of trained classifiers and a second set of trained classifiers. The first set specifies values corresponding to a first plurality of attributes usable for identifying objects captured by the first camera, and the second set differs from the first set and specifies values corresponding to a second plurality of attributes usable for identifying objects captured by the second camera. The first set of trained classifiers and second set of trained classifiers are trained independently, and the first plurality of attributes and second plurality of attributes have at least one attribute in common. The method further comprises using one or more images captured by the first camera for generating a reference platoon of n objects, the reference platoon comprising the target object and (n−1) other objects. The method further comprises generating a reference group by running the first set of trained classifiers over the reference platoon, the reference group being indicative of values of attributes specified by the first set of trained classifiers and characterizing the objects in the reference platoon. The method further comprises using one or more images captured by the second camera for generating a plurality of candidate platoons, each candidate platoon comprising n objects, wherein the one or more images are captured by the second camera in a time window corresponding to the time of capturing by the first camera the one or more images used for generating the reference platoon.
The method further comprises generating a plurality of candidate groups, each candidate group obtained by running the second set of trained classifiers over a respective candidate platoon, each candidate group being indicative of values of attributes specified by the second set of trained classifiers and characterizing the objects in the corresponding candidate platoon. The method further comprises selecting a candidate platoon corresponding to a candidate group best matching the reference group, and identifying the target object in the selected candidate platoon in accordance with a position of the target object in the reference platoon.
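The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: each object is represented by a tuple of attribute values standing in for classifier output (restricted to attributes the two classifier sets share), candidate platoons are assumed to be every run of n consecutive objects seen by the second camera within the time window, and the mismatch count used to pick the best match is a placeholder. All function and variable names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical stand-in for classifier output: one tuple of attribute
# values (for example size and color) per object.
Attributes = Tuple[str, ...]


def candidate_platoons(
    observed: Sequence[Attributes], n: int
) -> List[Tuple[int, List[Attributes]]]:
    """Return (start_index, platoon) for every run of n consecutive objects."""
    return [(i, list(observed[i:i + n])) for i in range(len(observed) - n + 1)]


def mismatch(reference: Sequence[Attributes], candidate: Sequence[Attributes]) -> int:
    """Assumed matching score: count attribute values that differ, position by position."""
    return sum(
        a != b
        for ref_obj, cand_obj in zip(reference, candidate)
        for a, b in zip(ref_obj, cand_obj)
    )


def identify_target(
    reference: Sequence[Attributes],
    target_pos: int,
    observed: Sequence[Attributes],
) -> Optional[int]:
    """Return the index in `observed` of the object at the target's position
    within the best matching candidate platoon, or None if no candidate exists."""
    candidates = candidate_platoons(observed, len(reference))
    if not candidates:
        return None
    start, _ = min(candidates, key=lambda c: mismatch(reference, c[1]))
    return start + target_pos


# Reference platoon from the first camera; the target is the middle object.
reference = [("small", "red"), ("large", "white"), ("small", "blue")]
# Objects seen by the second camera in the corresponding time window.
observed = [
    ("small", "black"),
    ("small", "red"),
    ("large", "white"),
    ("small", "blue"),
    ("large", "red"),
]
target_index = identify_target(reference, 1, observed)
```

In this toy input the platoon starting at index 1 matches the reference exactly, so the target is taken to be the object at index 2 of the second camera's observations, by virtue of its position in the platoon rather than its own appearance alone.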
In accordance with certain other aspects of the presently disclosed subject matter, there is provided a method of identifying in one or more images captured by a second camera a target platoon of n objects corresponding to a reference platoon of n objects generated using images captured by a first camera, the method provided by a processor and comprising storing in a memory operatively coupled to the processor a first set of trained classifiers and a second set of trained classifiers. The first set specifies values corresponding to a first plurality of attributes usable for identifying objects captured by the first camera, and the second set differs from the first set and specifies values corresponding to a second plurality of attributes usable for identifying objects captured by the second camera. The first set of trained classifiers and second set of trained classifiers are trained independently, and the first plurality of attributes and second plurality of attributes have at least one attribute in common. The method further comprises generating a reference group by running the first set of trained classifiers over the reference platoon, the reference group being indicative of values of attributes specified by the first set of trained classifiers and characterizing the objects in the reference platoon. The method further comprises using one or more images captured by the second camera for generating a plurality of candidate platoons, each candidate platoon comprising n objects, wherein the one or more images are captured by the second camera in a time window corresponding to the time of capturing by the first camera the one or more images used for generating the reference platoon. 
The method further comprises generating a plurality of candidate groups, each candidate group obtained by running the second set of trained classifiers over a respective candidate platoon, each candidate group being indicative of values of attributes specified by the second set of trained classifiers and characterizing the objects in the corresponding candidate platoon. The method further comprises selecting a candidate platoon corresponding to a candidate group best matching the reference group, and identifying the selected candidate platoon as the target platoon.
In accordance with further aspects of the presently disclosed subject matter, the mth object in the target platoon can be identified as the same object as the mth object in the reference platoon, m being less than or equal to n.
In accordance with certain other aspects of the presently disclosed subject matter, there is provided a system for identifying an object in a group of objects appearing in a plurality of cameras comprising a first camera, a second camera, a memory, and a processing unit communicatively coupled to the first camera, the second camera, and the memory. The processing unit comprises a processor configured to store in the memory a first set of trained classifiers and a second set of trained classifiers. The first set specifies values corresponding to a first plurality of attributes usable for identifying objects captured by the first camera, and the second set differs from the first set and specifies values corresponding to a second plurality of attributes usable for identifying objects captured by the second camera. The first set of trained classifiers and second set of trained classifiers are trained independently, and the first plurality of attributes and second plurality of attributes have at least one attribute in common. The processor is further configured to generate, using one or more images captured by the first camera, a reference platoon of n objects, the reference platoon comprising the target object and (n−1) other objects. The processor is further configured to generate a reference group by running the first set of trained classifiers over the reference platoon, the reference group being indicative of values of attributes classified by the first set of trained classifiers and characterizing the objects in the reference platoon. The processor is further configured to generate, using one or more images captured by the second camera, a plurality of candidate platoons, each candidate platoon comprising n objects, wherein the one or more images are captured by the second camera in a time window corresponding to the time of capturing by the first camera the one or more images used for generating the reference platoon. 
The processor is further configured to generate a plurality of candidate groups, each candidate group obtained by running the second set of trained classifiers over a respective candidate platoon, each candidate group being indicative of values of attributes classified by the second set of trained classifiers and characterizing the objects in the corresponding candidate platoon. The processor is further configured to select a candidate platoon corresponding to a candidate group best matching the reference group, and to identify the target object in the selected candidate platoon in accordance with a position of the target object in the reference platoon.
In accordance with certain other aspects of the presently disclosed subject matter, there is provided a non-transitory storage medium comprising instructions that when executed by a processor, cause the processor to store in a memory a first set of trained classifiers and a second set of trained classifiers. The first set specifies values corresponding to a first plurality of attributes usable for identifying objects captured by the first camera, and the second set differs from the first set and specifies values corresponding to a second plurality of attributes usable for identifying objects captured by the second camera. The first set of trained classifiers and second set of trained classifiers are trained independently, and the first plurality of attributes and second plurality of attributes have at least one attribute in common. The instructions further cause the processor to generate, using one or more images captured by the first camera, a reference platoon of n objects, the reference platoon comprising the target object and (n−1) other objects. The instructions further cause the processor to generate a reference group by running the first set of trained classifiers over the reference platoon, the reference group being indicative of values of attributes classified by the first set of trained classifiers and characterizing the objects in the reference platoon. The instructions further cause the processor to generate, using one or more images captured by the second camera, a plurality of candidate platoons, each candidate platoon comprising n objects, wherein the one or more images are captured by the second camera in a time window corresponding to the time of capturing by the first camera the one or more images used for generating the reference platoon.
The instructions further cause the processor to generate a plurality of candidate groups, each candidate group obtained by running the second set of trained classifiers over a respective candidate platoon, each candidate group being indicative of values of attributes classified by the second set of trained classifiers and characterizing the objects in the corresponding candidate platoon. The instructions further cause the processor to select a candidate platoon corresponding to a candidate group best matching the reference group, and to identify the target object in the selected candidate platoon in accordance with a position of the target object in the reference platoon.
In accordance with further aspects and, optionally, in combination with other aspects of the presently disclosed subject matter, at least some of the objects can be vehicles, and at least one attribute can be selected from the group consisting of: attributes related to vehicle size, attributes related to vehicle color, attributes related to vehicle type, attributes related to vehicle shape, and attributes related to vehicle aspect ratio.
In accordance with further aspects and, optionally, in combination with other aspects of the presently disclosed subject matter, each classifier within a set of classifiers can be trained independently. The objects can be vehicles behind and/or in front of the target vehicle in the same lane as the target vehicle, and the value for n can be in the range of 3-9. One of the first or second cameras can be configured for license plate recognition.
In accordance with further aspects and, optionally, in combination with other aspects of the presently disclosed subject matter, the reference group can be a reference sequence, and the candidate groups can be candidate sequences.
In accordance with further aspects and, optionally, in combination with other aspects of the presently disclosed subject matter, each candidate sequence can be associated with a distance metric indicative of the sequence level distance between the given candidate sequence and the reference sequence, and the candidate sequence best matching the reference sequence can be the candidate sequence associated with the lowest distance metric. The sequence level distance can be equal to the sum of object level distances for each pair of corresponding objects in the candidate sequence-reference sequence pair, and an object level distance for a given pair of objects can be the sum of the attribute level distances for the given pair of objects.
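The distance hierarchy described above can be written compactly as follows. The notation is introduced here for illustration only and does not appear in the source: S^r is the reference sequence, S^c a candidate sequence, o_i the i-th object of a sequence, A the set of attributes being compared, and δ_k the attribute level distance for attribute k.

```latex
D\left(S^{c}, S^{r}\right) = \sum_{i=1}^{n} d\left(o^{c}_{i}, o^{r}_{i}\right),
\qquad
d\left(o^{c}_{i}, o^{r}_{i}\right) = \sum_{k \in A} \delta_{k}\left(o^{c}_{i}, o^{r}_{i}\right),
\qquad
S^{*} = \arg\min_{S^{c}} D\left(S^{c}, S^{r}\right)
```

Here S^* denotes the candidate sequence associated with the lowest distance metric, that is, the candidate sequence best matching the reference sequence.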
In accordance with further aspects and, optionally, in combination with other aspects of the presently disclosed subject matter, each attribute can be associated with a given weight, and the sequence level distance can be equal to the weighted sum of the object level distances, and each object level distance can be equal to the weighted sum of the attribute level distances. The attribute weights can be learned by minimizing the error on object level distances for a given pair of matching sequences.
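A weighted variant of the distance computation might look like the Python sketch below. It assumes that weights are supplied per attribute, that an attribute level distance is 0 when two values agree and 1 otherwise, and that only attributes present in both objects and in the weight table are compared. None of these choices is specified in the text, the weights are given rather than learned, and all names are hypothetical.

```python
from typing import List, Mapping, Sequence

ObjectAttrs = Mapping[str, str]


def attribute_distance(a: str, b: str) -> float:
    """Assumed attribute level distance: 0.0 on agreement, 1.0 otherwise."""
    return 0.0 if a == b else 1.0


def object_distance(
    ref: ObjectAttrs, cand: ObjectAttrs, weights: Mapping[str, float]
) -> float:
    """Weighted sum of attribute level distances over the shared attributes."""
    shared = set(ref) & set(cand) & set(weights)
    return sum(weights[k] * attribute_distance(ref[k], cand[k]) for k in shared)


def sequence_distance(
    ref_seq: Sequence[ObjectAttrs],
    cand_seq: Sequence[ObjectAttrs],
    weights: Mapping[str, float],
) -> float:
    """Sum of object level distances for corresponding objects in two sequences."""
    if len(ref_seq) != len(cand_seq):
        raise ValueError("sequences must contain the same number of objects")
    return sum(object_distance(r, c, weights) for r, c in zip(ref_seq, cand_seq))


def best_candidate(
    ref_seq: Sequence[ObjectAttrs],
    cand_seqs: List[Sequence[ObjectAttrs]],
    weights: Mapping[str, float],
) -> int:
    """Index of the candidate sequence with the lowest sequence level distance."""
    return min(
        range(len(cand_seqs)),
        key=lambda i: sequence_distance(ref_seq, cand_seqs[i], weights),
    )


# Illustrative weights: vehicle type is trusted twice as much as color.
weights = {"type": 2.0, "color": 1.0}
ref_seq = [{"type": "truck", "color": "white"}, {"type": "sedan", "color": "red"}]
cand_a = [{"type": "truck", "color": "grey"}, {"type": "sedan", "color": "red"}]
cand_b = [{"type": "sedan", "color": "white"}, {"type": "sedan", "color": "red"}]
best = best_candidate(ref_seq, [cand_a, cand_b], weights)
```

With these illustrative weights a color disagreement costs less than a type disagreement, so the first candidate is preferred. Setting every weight to 1 recovers the unweighted sums.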
In accordance with further aspects and, optionally, in combination with other aspects of the presently disclosed subject matter, n can be selected from the group consisting of: a predetermined value determined according to the degree of similarity of the objects in proximity to the target object, a predetermined value determined according to the frequency of passing objects in proximity to the target object, and a configurable value selected by a user.
It should be noted that one benefit of having the two sets of classifiers (one for each type of camera) trained independently of each other is that if a new camera type is added to the system, it is sufficient to train the classifiers for the new camera type. There is no need to change or retrain the classifiers for an existing camera type.
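This independence can be illustrated with a small Python sketch in which each camera type owns its own set of classifiers. The registry class, the camera type names, and the toy classifiers (plain functions over a dictionary standing in for an image) are all hypothetical and serve only to show that adding a camera type leaves the existing sets untouched.

```python
from typing import Callable, Dict, Mapping

# A classifier maps an image (here a plain dict used as a stand-in)
# to an attribute value.
Classifier = Callable[[Mapping[str, object]], str]


class ClassifierRegistry:
    """Holds one independently trained set of classifiers per camera type."""

    def __init__(self) -> None:
        self._sets: Dict[str, Dict[str, Classifier]] = {}

    def register(self, camera_type: str, classifiers: Mapping[str, Classifier]) -> None:
        """Add the classifier set for a new camera type; existing sets are not modified."""
        if camera_type in self._sets:
            raise ValueError(f"classifiers already registered for {camera_type!r}")
        self._sets[camera_type] = dict(classifiers)

    def attributes(self, camera_type: str) -> set:
        """Names of the attributes the given camera type can classify."""
        return set(self._sets[camera_type])

    def describe(self, camera_type: str, image: Mapping[str, object]) -> Dict[str, str]:
        """Run every classifier of the given camera type over one image."""
        return {name: clf(image) for name, clf in self._sets[camera_type].items()}


registry = ClassifierRegistry()
# Toy classifiers: real ones would be trained models.
registry.register("lpr", {"size": lambda img: "large" if img["width"] > 200 else "small"})
registry.register(
    "video",
    {
        "size": lambda img: "large" if img["width"] > 400 else "small",
        "color": lambda img: str(img["dominant_color"]),
    },
)
lpr_before = registry.describe("lpr", {"width": 250})

# Adding a third camera type only requires registering its own classifiers.
registry.register("thermal", {"size": lambda img: "large" if img["width"] > 100 else "small"})
lpr_after = registry.describe("lpr", {"width": 250})
```

The attribute shared by the "lpr" and "video" sets ("size" in this toy setup) is what allows groups produced by different camera types to be compared.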