Prior art devices described in the relevant patent literature for capturing one or more objects in a scene typically include a camera device of known location or trajectory, a scene including one or more calibrated target objects, and at least one object of interest (see U.S. Pat. No. 5,699,444 to Synthonics Incorporated). Most prior art devices used to capture video data regarding an object operate in a controlled setting, oftentimes in studios or sound stages, and are articulated along a known or pre-selected path (circular or linear). Thus, the information recorded by the device can be more easily interpreted and displayed given the strong correlation between the perspective of the camera and the known objects in the scene.
A number of techniques have been successfully practiced to capture data regarding objects present in a scene. For example, U.S. Pat. No. 5,633,944 entitled “Method and Apparatus for Automatic Optical Recognition of Road Signs” issued May 27, 1997 to Guibert et al. and assigned to Automobiles Peugeot, discloses a system wherein a laser beam, or other source of coherent radiation, is used to scan the roadside in an attempt to recognize the presence of signs.
Additionally, U.S. Pat. No. 5,790,691 entitled “Method and Apparatus for Robust Shape Detection Using a Hit/Miss Transform” issued Aug. 4, 1998 to Narayanswamy et al. and assigned to the Regents of the University of Colorado (Boulder, Colo.), discloses a system for detecting abnormal cells in a cervical Pap-smear. In this system, a detection unit inspects a region of interest present in two-dimensional input images and morphologically detects structuring elements preset by a system user. By further including a thresholding feature, the shapes and/or features recorded in the input images can deviate from the structuring elements and still be detected as a region of interest. This reference clearly relies on extremely controlled conditions, the known presence of objects of interest, and continually fine-tuned filtering techniques to achieve reasonable performance. Similarly, U.S. Pat. No. 5,627,915 entitled “Pattern Recognition System Employing Unlike Templates to Detect Objects Having Distinctive Features in a Video Field,” issued May 6, 1997 to Rosser et al. and assigned to Princeton Video Image, Inc. of Princeton, N.J., discloses a method for rapidly and efficiently identifying landmarks and objects using a plurality of templates that are sequentially created, inserted into live video fields, and compared to one or more prior templates in order to successively identify possible distinctive feature candidates of a live video scene and to eliminate falsely identified features. The process disclosed by Rosser et al. is repeated in order to preliminarily identify two or three landmarks of the target object; the locations of these “landmarks” are then compared to a geometric model to further verify, by process of elimination, whether the object has been correctly identified. The methodology lends itself to laboratory verification against pre-recorded videotape to ascertain accuracy before the system is applied to actual targeting of live objects.
This system also requires specific templates of real world features and does not operate on unknown video data with its inherent variability of lighting, scene composition, weather effects, and placement variation from said templates to actual conditions in the field.
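By way of illustration only, the thresholded hit/miss detection attributed above to U.S. Pat. No. 5,790,691 can be sketched as follows. The sketch is not taken from that reference: the names `hit_miss_score`, `detect`, `CROSS_HIT`, and `CROSS_MISS`, the binary list-of-lists image representation, and the use of a fraction-of-constraints-satisfied score as the thresholding feature are all assumptions chosen for brevity. A threshold of 1.0 corresponds to a strict hit/miss transform; lower values allow a shape to deviate from the structuring elements and still be detected.

```python
# Hypothetical structuring elements for a 3x3 cross (illustrative only).
CROSS_HIT = [[0, 1, 0],
             [1, 1, 1],
             [0, 1, 0]]   # 1 = pixel must be foreground
CROSS_MISS = [[1, 0, 1],
              [0, 0, 0],
              [1, 0, 1]]  # 1 = pixel must be background


def hit_miss_score(image, top, left, hit, miss):
    """Return the fraction of hit/miss constraints satisfied at (top, left)."""
    satisfied = 0
    total = 0
    for r, (hit_row, miss_row) in enumerate(zip(hit, miss)):
        for c, (h, m) in enumerate(zip(hit_row, miss_row)):
            pixel = image[top + r][left + c]
            if h:            # constraint: foreground expected here
                total += 1
                satisfied += int(pixel == 1)
            elif m:          # constraint: background expected here
                total += 1
                satisfied += int(pixel == 0)
            # positions with h == m == 0 are "don't care"
    return satisfied / total if total else 0.0


def detect(image, hit, miss, threshold=1.0):
    """Return (top, left) of every window whose score meets the threshold."""
    rows, cols = len(hit), len(hit[0])
    matches = []
    for top in range(len(image) - rows + 1):
        for left in range(len(image[0]) - cols + 1):
            if hit_miss_score(image, top, left, hit, miss) >= threshold:
                matches.append((top, left))
    return matches
```

In this sketch the threshold must be selected for the expected degree of deviation, which reflects the continual tuning noted above: a value set too high rejects partially damaged shapes, while a value set too low admits unrelated shapes.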
Further prior art includes U.S. Pat. No. 5,465,308 entitled “Pattern Recognition System” issued Nov. 7, 1995 to Hutcheson et al. and assigned to Datron/Transco, Inc. of Simi Valley, Calif., which discloses a method and apparatus under software control that uses a neural network to recognize two-dimensional input images which are sufficiently similar to a database of previously stored two-dimensional images. The images are processed and subjected to a Fourier transform, which yields a power spectrum, and an in-class/out-of-class sort is then performed. A feature vector consisting of the most discriminatory magnitude information from the power spectrum is then created and is input to a neural network preferably having two hidden layers, an input dimensionality equal to the number of elements of the feature vector, and an output dimensionality equal to the number of data elements stored in the database. Unique identifier numbers are preferably stored along with the feature vector. Applying a query feature vector to the neural network results in an output vector which is subjected to statistical analysis to determine whether a threshold level of confidence exists before indicating that a successful identification has occurred. Where a successful identification has occurred, a unique identifier number for the identified object may be displayed to the end user to indicate the identification. However, Fourier-transform-based approaches are subject to large variations in the frequency domain, such as those brought on by shading or by other temporary or partial obscuring of objects by things like leaves and branches from nearby trees, scratches, and bullet holes (especially if used for recognizing road signs). Moreover, commercial signage, windshields, and other reflecting surfaces (e.g., windows) all have characteristics very similar to those of road signs in the frequency domain.
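By way of illustration only, the sensitivity of a power-spectrum feature vector to partial obscuring can be sketched as follows. The sketch does not reproduce the system of U.S. Pat. No. 5,465,308: the neural network and the in-class/out-of-class sort are omitted, the feature vector is simply assumed to be the lowest-frequency block of the power spectrum, and the names `power_spectrum`, `feature_vector`, and `distance` are hypothetical. A naive discrete Fourier transform is used so that the example needs only the Python standard library; it is practical only for very small images.

```python
import cmath


def power_spectrum(image):
    """Naive 2-D DFT; returns the squared magnitude of every frequency bin."""
    rows, cols = len(image), len(image[0])
    spectrum = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            total = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = -2 * cmath.pi * (u * r / rows + v * c / cols)
                    total += image[r][c] * cmath.exp(1j * angle)
            out_row.append(abs(total) ** 2)
        spectrum.append(out_row)
    return spectrum


def feature_vector(image, block=2):
    """Assumed feature choice: the block x block lowest-frequency bins."""
    spectrum = power_spectrum(image)
    return [spectrum[u][v] for u in range(block) for v in range(block)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# A small square, the same square circularly shifted down by one row,
# and the square with one pixel obscured.
clean = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
shifted = clean[-1:] + clean[:-1]
occluded = [row[:] for row in clean]
occluded[0][1] = 0

shift_change = distance(feature_vector(clean), feature_vector(shifted))
occlusion_change = distance(feature_vector(clean), feature_vector(occluded))
```

Under these assumptions, `shift_change` is zero up to rounding error, because the magnitude of a discrete Fourier transform is unchanged by a circular shift, whereas `occlusion_change` is clearly non-zero: obscuring even a single pixel alters the feature vector presented to any downstream classifier.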
In summary, the inventors have found that in the prior art related to the problem of accurately identifying and classifying objects appearing in video data, almost all efforts utilize complex processing, illuminated scenes, continual tuning of a single filter, and/or systematic comparison of aspects of an unknown object with a variety of shapes stored in memory. The inventors propose a system that efficiently and accurately retrieves and catalogs information distilled from vast amounts of video data so that object classification type(s), locations, and bitmaps depicting the actual condition of the objects (when originally recorded) are available to an operator for review, comparison, or further processing to reveal even more detail about each object and relationships among objects.
The present invention thus finds utility over this variety of prior art methods and devices and solves a long-standing need in the art for a simple apparatus for quickly and accurately recognizing, classifying, and locating each of a variety of objects of interest appearing in a video stream, including determining that an object appearing in a distinct image frame is the “same” object.
The present invention addresses an urgent need for virtually automatic processing of vast amounts of video data that possibly depict one or more desired objects, so as to precisely recognize and accurately locate each such object, extract its desired characteristics and, optionally, archive a bitmap image of each recognized object. Processing such video information via computer is preferred over all other forms of data interrogation, and the inventors suggest that such processing can accurately and efficiently complete a task such as identifying and cataloguing huge numbers of objects of interest to many public works departments and utilities; namely, traffic signs, traffic lights, manholes, power poles, and the like disposed in urban, suburban, residential, and commercial settings among various types of natural terrain and under changing lighting conditions (e.g., the position of the sun).