(1) Technical Field
The present invention relates to an object detection system and, more particularly, to a system, method, and computer program product for detecting objects-of-interest in stored or live dynamic imagery by fusing cognitive algorithms with a human analyst to improve the accuracy of detection.
(2) Description of Related Art
Most imagery is visually analyzed by humans to search for objects-of-interest (e.g., targets and suspicious activity in videos from drones, satellite imagery, etc.). Such manual analysis is slow and prone to human error, such as missing potential objects-of-interest. A large volume of imagery is also never reviewed because of a shortage of human analysts. To overcome these limitations, there has been a surge of interest in developing and using automated computer algorithms and software to aid and/or emulate human visual perception in imagery analysis. By way of example, Huber et al. previously described various cognitive algorithms for rapid threat search and detection (see the List of Cited Literature References, Literature Reference Nos. 1 and 2). However, while these algorithms help and perform reasonably well, they are still limited in what they can detect because it is difficult to model human search behavior.
It is well known that humans employ a combination of bottom-up and top-down cues when searching for objects-of-interest. Most work in the area of cognitive algorithms has still been focused on modeling bottom-up attention (see Literature Reference Nos. 1 through 3). There is some limited work on modeling top-down attention, i.e., capturing human top-down biases and knowledge to build algorithms that predict or emulate where humans would look in imagery (see Literature Reference Nos. 4 through 8). However, these methods are usually ad hoc and do not perform well. Such top-down methods also either use knowledge of prior imagery from fixed cameras (spatial context) or look for known objects after training on several examples of the same object (object context). In the latter case, a system can find these known objects but is not required to have the sensitivity to find new objects-of-interest. As a result, existing methods typically are not applicable to real-world imagery and lack the ability to detect new objects-of-interest in the imagery.
Attention models are usually compared against human eye tracking data on the same imagery to determine how well the models detect objects that a human fixates on (see Literature Reference Nos. 3, 7, 9, and 10). As expected, there is low correlation between human fixations and the predictions of typical attention models. No model or algorithm can capture the full intent of a human, nor will a human be completely replaced by an algorithm.
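By way of illustration only, one common way to quantify such a comparison is a Pearson correlation coefficient between a model's saliency map and a fixation density map derived from eye tracking data. The following is a minimal sketch under stated assumptions and is not taken from the cited references: the function name pearson_cc is hypothetical, both maps are assumed to be equally sized two-dimensional grids of numbers, and the fixation data is assumed to have already been converted into a density map.

```python
import math


def pearson_cc(saliency, fixation):
    """Pearson correlation between two equally sized 2-D maps (lists of rows).

    Hypothetical helper for illustration; 'saliency' is a model's attention
    map and 'fixation' is a human fixation density map over the same image.
    """
    # Flatten both maps so they can be compared location by location.
    s = [v for row in saliency for v in row]
    f = [v for row in fixation for v in row]
    if not s or len(s) != len(f):
        raise ValueError("maps must be non-empty and the same size")

    mean_s = sum(s) / len(s)
    mean_f = sum(f) / len(f)

    # Covariance and variances (unnormalized; the 1/N factors cancel).
    cov = sum((a - mean_s) * (b - mean_f) for a, b in zip(s, f))
    var_s = sum((a - mean_s) ** 2 for a in s)
    var_f = sum((b - mean_f) ** 2 for b in f)

    # Assumption: a constant map is treated as uncorrelated with anything.
    if var_s == 0 or var_f == 0:
        return 0.0
    return cov / math.sqrt(var_s * var_f)
```

A value near 1 would indicate that the model assigns high saliency where humans fixate, while a value near 0 corresponds to the low correlation described above.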
Thus, a continuing need exists for a system for detecting objects-of-interest in stored or live dynamic imagery by fusing cognitive algorithms and a human analyst to improve the accuracy of detection.