The following relates to the image labeling arts, to the camera-based object labeling arts, and to applications of same, such as vehicle labeling and so forth.
Camera-based vehicle labeling (or classification) using a still camera or video camera has diverse applications, such as in: automated or semi-automated toll assessment for toll roads, bridges, parking, or so forth (where, for example, the toll may depend on the number of wheel axles, or the vehicle type, e.g. trucks may pay a higher toll than cars); automated monitoring of a parking facility (e.g., detecting whether or not a vehicle is in a parking spot, which actually labels the parking spot, rather than the vehicle); camera-based enforcement of speed limits or other traffic regulations (where the vehicle is labeled as to its speed, or as to whether it has run a red light); monitoring of carpool lanes (where the vehicle is labeled by number of occupants); roadway usage studies (where vehicles may be classified as to their state or country of registration based on their license plates); and so forth. Depending upon the type of vehicle labeling to be performed, the vehicle image that is used for the automated vehicle labeling may be an image of the entire vehicle, or an image of a portion of the vehicle, such as the rear license plate.
In a common installation approach, the camera is mounted so as to have a suitable view of the toll booth entrance, roadway, parking lot entrance, or other location to be monitored, and a set of training vehicle images is acquired. A human installer manually labels each training image as to the vehicle type. These labeled vehicle images form a labeled training set for the camera installation, which is then used to train a vehicle classifier. The training process typically entails optional pre-processing of the image (for example, in the case of license plate labeling, the pre-processing may include identifying the video frame that optimally shows the rear license plate and then segmenting the frame image to isolate the license plate), generating a quantitative representation, e.g. a feature vector, representing the (optionally pre-processed) image, and training the classifier to assign labels to the feature vector representations that optimally match the manually assigned labels. Thereafter, during the labeling phase, when the camera acquires an image of a vehicle, it is analogously pre-processed and converted to a feature vector, which is then run through the trained classifier to label the vehicle.
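The per-camera training and labeling phases described above can be sketched in a few lines. The sketch below is illustrative only: the intensity-histogram feature, the synthetic "car" and "truck" images, and the logistic-regression classifier are all stand-in assumptions, not the method of any particular installation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def extract_features(image):
    # Stand-in for pre-processing plus quantitative representation:
    # a normalized intensity histogram serves as the feature vector.
    hist, _ = np.histogram(image, bins=16, range=(0.0, 1.0))
    return hist / hist.sum()

# Synthetic "training images" for two vehicle classes (labels assigned
# manually in a real installation): cars simulated as darker images,
# trucks as brighter ones, purely for illustration.
cars = [rng.uniform(0.0, 0.5, size=(32, 32)) for _ in range(20)]
trucks = [rng.uniform(0.5, 1.0, size=(32, 32)) for _ in range(20)]
X = np.array([extract_features(im) for im in cars + trucks])
y = np.array([0] * 20 + [1] * 20)  # 0 = car, 1 = truck

clf = LogisticRegression().fit(X, y)  # training phase

# Labeling phase: a newly acquired image follows the same path of
# pre-processing, feature extraction, and classification.
new_image = rng.uniform(0.5, 1.0, size=(32, 32))
label = int(clf.predict([extract_features(new_image)])[0])
```

In a real system the feature extraction and classifier would be far richer, but the two-phase structure (train on manually labeled images, then classify new images through the same feature pipeline) is the same.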
In a typical application, cameras are placed at strategic locations, for example at various toll booths, and each camera is independently trained and thereafter used to label vehicles at (or passing through) its location.
However, independently training each camera fails to leverage information that may have been collected from earlier-installed cameras. Independently training each camera can also lead to significant camera-to-camera variations in vehicle labeling performance. Still further, training each camera independently may fail to leverage prior information that may be available on the label distribution, for example from statistics generated by other similar camera installations, prior roadway studies, or from other sources.
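As an illustration of the last point, prior information on the label distribution can be folded into a classifier's per-class scores by a standard Bayes-style re-weighting. The numbers below are assumed for illustration, standing in for classifier outputs and for prior statistics from other installations.

```python
import numpy as np

# Hypothetical per-class scores emitted by a trained classifier for one
# vehicle image (classes: car, truck); values are illustrative only.
scores = np.array([0.45, 0.55])

# Assumed prior label distribution, e.g. compiled from statistics of
# other similar camera installations or prior roadway studies.
prior = np.array([0.9, 0.1])

# Bayes-style re-weighting: multiply by the prior and renormalize.
posterior = scores * prior
posterior /= posterior.sum()

label = int(np.argmax(posterior))  # prior outweighs the raw scores here
```

A classifier trained in isolation has no principled way to exploit such a prior; leveraging it requires some form of shared training or calibration across installations.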
While some form of combined training of multiple camera installations thus appears to be of value, there are impediments to this approach. Vehicle images acquired by different cameras may differ significantly from one another due to differences in the pose, camera angle, camera resolution or other camera characteristics, amount of motion blur, scene illumination, background clutter, or so forth. Vehicle statistics may also differ from one location to another: for example, a toll booth close to a residential area may observe mostly passenger cars, whereas a rural toll booth near a freeway interchange may observe a higher fraction of commercial trucks.
While the illustrative embodiments disclosed herein are directed to camera-based vehicle labeling tasks, it will be appreciated that similar difficulties arise in other camera-based object labeling tasks in which multiple still or video cameras are used to acquire images of objects to be labeled (or in which the same camera is used to acquire the images over different time intervals and/or at different locations). For example, in a retail or advertising setting it may be useful to employ camera-based customer labeling as to gender, age, or so forth in order to provide targeted advertising. In this case, the objects to be labeled are human beings. In an event attendance monitoring system, images may be labeled as to the number of people shown in the image. Objects may also be animals, or inanimate objects such as the illustrative vehicles. As further examples of camera-based labeling of inanimate objects of interest, on an assembly line, articles of manufacture may be labeled as to the presence or absence of a certain defect based on an imaging technology that is capable of observing the defect. In the case of a security scanner, the camera may be an x-ray imager or other specialized imaging device, and the object labeling seeks to identify inanimate objects of concern such as firearms, knives, fluids, or so forth. These are again merely illustrative examples.
Furthermore, the classifier training systems disclosed herein may be employed in classifying images for purposes other than labeling of the object shown in the image.
Disclosed in the following are improved image and object labeling techniques, which are described with illustrative reference to vehicle labeling tasks.