(1) Field of Invention
The present invention relates to searching visual imagery and, more specifically, to a system for intelligent goal-directed search in large volume visual imagery using a cognitive-neural subsystem.
(2) Description of Related Art
The present invention relates to video image analysis. Previous methods of video image analysis fall into two categories: human vision and computer vision. Human vision methods locate regions of interest by manually scanning the fovea (a narrow field of view) over the image area, either systematically or at random. In the case of a magnifying optical system such as binoculars, the human manually scans the optical system over a wider field of view (FOV). For an expert in surveillance and reconnaissance, this can take over 5 minutes for a 120-degree FOV region. Furthermore, this process is limited in range by the effective optical magnification of the system. Humans are also more likely to make errors during prolonged or difficult tasks due to fatigue.
Recently the field has seen the emergence of neural or “brain-in-the-loop” image analysis methods which analyze static, previously-acquired imagery using electroencephalography (EEG). These neural methods are limited to sequential presentation of pre-selected image chips followed by manual inspection, and are also limited by human fatigue during long presentation sessions.
Computer vision methods, on the other hand, have been developed to automatically detect objects of interest based on large numbers of training samples. These computer vision methods are prone to error and are typically useful only in previously known conditions and for a small, previously determined number of objects of interest.
A small number of previous methods have used human neural methods that are based on the Rapid Serial Visual Presentation (RSVP) paradigm. RSVP is often referred to as an oddball task, since a novel stimulus elicits a particular response from the brain. This response is called the P300 response and occurs approximately 300 milliseconds after the stimulus. In an RSVP method, images of a specified target object of interest and background distractor images not containing the target are both captured and stored. The target objects typically must comprise no more than 25% of the image area, be large enough to be visible, be placed near the center of fixation, and be interspersed at a pre-specified rate. Being interspersed at a pre-specified rate typically requires far fewer target images than distractor images (e.g., 2 per 100, called the target probability), as well as proper spacing between successive targets, called the target-to-target interval. RSVP then presents images to the operator at a rate of 5 to 10 images per second. Single-trial EEG recordings at specific scalp locations are made and processed using simple linear pattern classifiers or multiple classifier systems. These classifiers must be trained on large amounts of representative data similar to the data to be analyzed, and over varying time window intervals (e.g., 2500 sample images, with 50 target images and 2450 non-target images). In practical situations, RSVP approaches require the user to focus on flashing images for 20 to 30 seconds or more, depending on how many regions of interest exist in the image sequence. These existing RSVP approaches are not capable of inspecting live imagery, and do not support the use of cognitive algorithms in the system architecture.
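The stream constraints described above (target probability, target-to-target interval, and presentation rate) can be illustrated with a minimal Python sketch. It is not taken from any prior RSVP system. The function names, the label strings, and the `min_gap` parameter (the minimum number of distractors between successive targets) are hypothetical and chosen only for illustration. The text above gives no specific value for the target-to-target interval, so the value of 10 used in the usage note is an assumption.

```python
import random


def build_rsvp_sequence(n_images, target_probability, min_gap, seed=0):
    """Return a list of labels, "T" (target) or "D" (distractor).

    min_gap is an illustrative stand-in for the target-to-target interval:
    at least min_gap distractors separate any two successive targets.
    """
    rng = random.Random(seed)
    n_targets = round(n_images * target_probability)
    if n_targets == 0:
        return ["D"] * n_images

    # Shortest stream that can hold all targets with the required spacing.
    needed = n_targets + (n_targets - 1) * min_gap
    if needed > n_images:
        raise ValueError("stream too short for this probability and gap")

    # Pick distinct slots in a compressed stream, then stretch the i-th
    # target by i * min_gap so that the spacing constraint always holds.
    slack = n_images - needed
    picks = sorted(rng.sample(range(slack + n_targets), n_targets))
    positions = [p + i * min_gap for i, p in enumerate(picks)]

    labels = ["D"] * n_images
    for pos in positions:
        labels[pos] = "T"
    return labels


def presentation_seconds(n_images, images_per_second):
    """Time needed to show the whole stream at a fixed presentation rate."""
    return n_images / images_per_second
```

For the example figures quoted above, `build_rsvp_sequence(2500, 0.02, 10)` yields 50 targets among 2450 distractors, and `presentation_seconds(2500, 10)` gives 250 seconds at the upper rate of 10 images per second (500 seconds at 5 images per second).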
Thus, a continuing need exists for a cognitive-neural system for large volume image analysis that incorporates both human and computer components and is capable of analyzing both live and stored imagery.