The present invention relates to utilizing computer vision applications for the automated searching of human image data for people as a function of visual appearance characteristics.
Video, still camera and other image data feeds may be searched to find targeted objects or individuals. For example, to search for a person, one may provide a manager of a video archive with description information indicating certain personal facial visual traits (for example, wearing glasses, a baseball hat, etc.), wherein the archive may then be manually scanned for one or more people with similar characteristics. Such a manual search consumes both time and human resources. Moreover, human visual attention may be ineffective, particularly for large volumes of image data. Due to many factors, illustratively including the infrequency of activities of interest, the fundamental tedium of the task, and poor reliability in object tracking in environments with visual clutter and other distractions, human analysis of input information may be both expensive and ineffective.
Automated input systems and methods are known wherein computers or other programmable devices directly analyze video data and attempt to recognize objects, people, events or activities of concern through computer vision applications. Some existing approaches learn a separate appearance model for each of a plurality of image attributes, for example for bald, mustache, beard, hat, sunglasses, light skin-tones, etc. When given a multi-attribute query, such systems may add up the confidence scores for each individual query attribute. Thus, a search for a (i) male (ii) wearing glasses and (iii) a beard may retrieve a plurality of results that each have confidence scores meeting all three of the attributes, or that each meet one or more of them. However, the former technique may miss results, for example where one of the attributes is indistinct in a given image, resulting in the exclusion of that image. The latter may return too many results, including results that are impossible or improbable matches for all three attributes (such as an image of a young girl wearing glasses). Thus, the returned results may miss a target, or include too many hits to be analyzed efficiently.
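By way of illustration only, the two retrieval strategies described above, and the failure mode of each, may be sketched as follows. The sketch assumes that each independently learned attribute model has already produced a confidence score between 0 and 1 for each image; the function names, the gallery entries, the scores and the 0.5 threshold are all hypothetical and are not taken from any particular prior art system.

```python
from typing import Dict, List

# Hypothetical per-image output of independent attribute detectors:
# attribute name -> confidence in [0, 1].
Scores = Dict[str, float]


def summed_score(scores: Scores, query: List[str]) -> float:
    """Add up the confidence scores of the individual query attributes."""
    return sum(scores.get(attr, 0.0) for attr in query)


def match_all(gallery: Dict[str, Scores], query: List[str],
              threshold: float = 0.5) -> List[str]:
    """Return images whose score meets the threshold for every attribute."""
    return [name for name, scores in gallery.items()
            if all(scores.get(attr, 0.0) >= threshold for attr in query)]


def match_any(gallery: Dict[str, Scores], query: List[str],
              threshold: float = 0.5) -> List[str]:
    """Return images whose score meets the threshold for at least one attribute."""
    return [name for name, scores in gallery.items()
            if any(scores.get(attr, 0.0) >= threshold for attr in query)]


# Illustrative (made-up) detector outputs for three images.
gallery = {
    # The true target, but the beard is indistinct in this image.
    "target_indistinct_beard": {"male": 0.9, "glasses": 0.8, "beard": 0.3},
    # An improbable match: only the glasses attribute is satisfied.
    "young_girl_with_glasses": {"male": 0.1, "glasses": 0.9, "beard": 0.0},
    # A clear match on all three attributes.
    "clear_match": {"male": 0.9, "glasses": 0.9, "beard": 0.9},
}
query = ["male", "glasses", "beard"]

all_hits = match_all(gallery, query)  # omits the target with the indistinct beard
any_hits = match_any(gallery, query)  # also returns the young girl with glasses
```

With these assumed scores, requiring every attribute excludes the target whose beard is indistinct, while accepting any single attribute also returns the image of the young girl wearing glasses.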