This invention applies to the field of image understanding as opposed to the field of image processing.
As used herein, the term “image processing” applies to computer operations that have pixels as both input and output. Examples include smoothing, thresholding, dilation, and erosion. In each of those image processing operations, a buffer of pixels is the input to the operation and a buffer of pixels is the output of the operation. Each pixel in the input and output contains only brightness information.
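To make the distinction concrete, the following is a minimal sketch of two of the operations named above, thresholding and erosion, written so that both take a buffer of brightness values and return a buffer of brightness values. The buffer representation (a list of rows of 8-bit integers) and the function names are illustrative assumptions, not part of the Perceptrak disclosure.

```python
from typing import List

# Assumed representation: a raster as rows of 8-bit brightness values (0-255).
Buffer = List[List[int]]


def threshold(src: Buffer, level: int) -> Buffer:
    """Pixels in, pixels out: each output pixel is 255 if the input pixel
    is at or above `level`, otherwise 0. No symbolic data is produced."""
    return [[255 if p >= level else 0 for p in row] for row in src]


def erode(src: Buffer) -> Buffer:
    """3x3 grayscale erosion: each output pixel is the minimum of its
    3x3 neighborhood, with the neighborhood clipped at the image border."""
    h, w = len(src), len(src[0])
    out: Buffer = []
    for y in range(h):
        row = []
        for x in range(w):
            row.append(min(src[j][i]
                           for j in range(max(0, y - 1), min(h, y + 2))
                           for i in range(max(0, x - 1), min(w, x + 2))))
        out.append(row)
    return out


image = [[10, 200, 220],
         [15, 210, 230],
         [12, 205, 40]]
print(erode(threshold(image, 128)))  # [[0, 0, 255], [0, 0, 0], [0, 0, 0]]
```

Chaining the two operations still yields nothing but another brightness buffer; any description of what the bright region *is* must come from a later, image understanding step.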
The term “image understanding” as used herein applies to computer operations where the pixels in the image buffers are grouped into higher level constructs and described with symbolic data. Subsequent image understanding operations can be performed on the symbolic data without referring to the original pixels. This invention anticipates that there will be multiple levels of abstraction between the lowest (pixel) level and the ultimate understanding of objects in a context. An initial step in creating higher levels of abstraction for image understanding was the invention of what is termed the Terrain Map, an element discussed below.
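As an illustration of grouping pixels into higher level constructs described with symbolic data, the sketch below applies standard 4-connected component labeling (a generic technique swapped in here for illustration; it is not the Terrain Map) to a binary mask and emits one record per group. The `Blob` record and its fields are hypothetical. Once the records exist, a follow-on step such as filtering by area uses only the symbolic data and never revisits the pixels.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Blob:
    """Hypothetical symbolic description of one group of foreground pixels."""
    area: int
    bbox: Tuple[int, int, int, int]      # (min_x, min_y, max_x, max_y)
    centroid: Tuple[float, float]        # (mean_x, mean_y)


def find_blobs(mask: List[List[int]]) -> List[Blob]:
    """Group 4-connected nonzero pixels and describe each group symbolically."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs: List[Blob] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            pts = []
            while queue:
                cx, cy = queue.popleft()
                pts.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            xs = [p[0] for p in pts]
            ys = [p[1] for p in pts]
            n = len(pts)
            blobs.append(Blob(n, (min(xs), min(ys), max(xs), max(ys)),
                              (sum(xs) / n, sum(ys) / n)))
    return blobs


mask = [[1, 1, 0, 0],
        [1, 0, 0, 1],
        [0, 0, 0, 1]]
blobs = find_blobs(mask)
# Subsequent operation on symbolic data only: keep groups of 3 or more pixels.
large = [b for b in blobs if b.area >= 3]
```

Here `blobs` holds two records (areas 3 and 2), and `large` is computed without touching `mask` again.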
The present inventor has developed a system disclosed in a copending patent application owned by the present applicant's assignee/intended assignee, namely application Ser. No. 09/773,475, filed Feb. 1, 2001, published as Pub. No.: US 2001/0033330 A1, Pub. Date: Oct. 25, 2001, entitled System for Automated Screening of Security Cameras, also called a security system, and in corresponding International Patent Application PCT/US01/03639, of the same title, filed Feb. 5, 2001, both hereinafter referred to as the Perceptrak disclosure or system, and herein incorporated by reference. That system may be identified by the mark PERCEPTRAK (“Perceptrak” herein), which is a registered trademark (Regis. No. 2,863,225) of Cernium, Inc., applicant's assignee/intended assignee.
In the Perceptrak disclosure, video data is picked up by any of many possible video cameras. The video is processed under software control of the system, before any human intervention, to interpret the types of images and the activities of persons and objects in the images. That disclosure introduced the concept of an element called the Terrain Map as an image format for machine vision. In that original implementation, the Terrain Map has one element for each four pixels of the original image, with each Terrain Map element in turn having eight members or primitives describing a 4×4 pixel neighborhood adjacent to the four pixels of that map element.
In the Perceptrak system, real-time image analysis of video data is performed in which at least a single pass of a video frame produces a Terrain Map containing parameters (primitives or members) that indicate the content of the video. Based on the parameters of the Terrain Map, the Perceptrak system is able to make decisions and derive useful information about the image, such as discriminating vehicles from pedestrians and vehicle traffic from pedestrian traffic.
Terrain Map Derivation
Starting with the recognition that all existing raster diagrams are brightness maps arranged for efficient display for human perception, the Terrain Map was designed to provide additional symbolic data for subsequent analysis steps. Using the analogy of geographic maps, the concept of a Terrain Map was proposed as a means of providing additional data about an image.
In such a Terrain Map, each map member contains symbolic information describing the conditions of that part of the image, somewhat analogous to the way a geographic map represents the lay of the land. The Terrain Map members are:
AverageAltitude is an analog of altitude contour lines on a Terrain Map or, when used in the color space, the analog of how much light is falling on the surface.
DegreeOfSlope is an analog of the distance between contour lines on a Terrain Map. (Steeper slopes have contour lines closer together.)
DirectionOfSlope is an analog of the direction of contour lines on a map, such as a south-facing slope.
HorizontalSmoothness is an analog of the smoothness of terrain when traveling North or South.
VerticalSmoothness is an analog of the smoothness of terrain when traveling East or West.
Jaggyness is an analog of motion detection in the retina or of motion blur. The faster objects are moving, the higher the Jaggyness score will be.
DegreeOfColor is the analog of how much color there is in the scene, where both black and white are considered as no color and primary colors are full color.
DirectionOfColor is the analog of the hue of a color independent of how much light is falling on it. For example, a red shirt is the same red in full sun or shade.
The three members used for the color space, AverageAltitude, DegreeOfColor, and DirectionOfColor, represent only the pixels of the element, while the other members represent the conditions in the neighborhood of the element. In the current implementation, one Terrain Map element represents four pixels in the original raster diagram, and the neighborhood of a map element consists of an 8×8 matrix surrounding the four pixels. The same concept can be applied with other ratios of pixels to map elements and other neighborhood sizes.
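The geometry described above (one element per 2×2 block of pixels, with an 8×8 neighborhood around it) can be sketched as a data structure plus a single pass over a grayscale raster. This is an illustrative sketch only: the text does not give numeric formulas for any member, so the AverageAltitude computation (mean of the element's four pixels) and the slope computation (difference of half-neighborhood means, converted to magnitude and angle) are assumptions, border pixels are clamped by assumption, and the remaining members are left unset rather than guessed. Field names reuse the member names from the text.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

Gray = List[List[int]]  # assumed: 8-bit brightness raster, rows of pixels


@dataclass
class TerrainElement:
    # Members computed in this sketch (formulas are illustrative assumptions).
    AverageAltitude: float
    DegreeOfSlope: float
    DirectionOfSlope: float
    # Members whose formulas are not given in the text; left unset here.
    HorizontalSmoothness: Optional[float] = None
    VerticalSmoothness: Optional[float] = None
    Jaggyness: Optional[float] = None
    DegreeOfColor: Optional[float] = None
    DirectionOfColor: Optional[float] = None


def _px(img: Gray, y: int, x: int) -> int:
    """Read a pixel, clamping coordinates to the image border (an assumption)."""
    y = min(max(y, 0), len(img) - 1)
    x = min(max(x, 0), len(img[0]) - 1)
    return img[y][x]


def build_terrain_map(img: Gray) -> List[List[TerrainElement]]:
    """One pass: one element per 2x2 pixel block, 8x8 neighborhood around it."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    tmap: List[List[TerrainElement]] = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            y0, x0 = 2 * r, 2 * c
            # Element-only member: mean brightness of the four element pixels.
            avg = sum(_px(img, y0 + dy, x0 + dx)
                      for dy in (0, 1) for dx in (0, 1)) / 4.0
            # Offsets -3..+4 give an 8x8 block with the 2x2 element at its center.
            nb = [[_px(img, y0 + dy, x0 + dx) for dx in range(-3, 5)]
                  for dy in range(-3, 5)]
            left = sum(v for row in nb for v in row[:4]) / 32.0
            right = sum(v for row in nb for v in row[4:]) / 32.0
            top = sum(v for row in nb[:4] for v in row) / 32.0
            bottom = sum(v for row in nb[4:] for v in row) / 32.0
            gx, gy = right - left, bottom - top
            out_row.append(TerrainElement(
                AverageAltitude=avg,
                DegreeOfSlope=math.hypot(gx, gy),
                DirectionOfSlope=math.degrees(math.atan2(gy, gx)) % 360.0,
            ))
        tmap.append(out_row)
    return tmap


# An 8x8 test image: dark left half, bright right half (a vertical "cliff").
img = [[200 if x >= 4 else 0 for x in range(8)] for _ in range(8)]
tmap = build_terrain_map(img)
```

For this image the map is 4×4; the element at row 1, column 1 sits on dark pixels (AverageAltitude 0.0) but its neighborhood straddles the edge, giving DegreeOfSlope 150.0 pointing in direction 0 degrees (toward increasing x). Once built, the map can be consulted without rereading `img`.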
FIG. 1 illustrates the Terrain Map structure and depicts graphically how that structure is created, such that the Terrain Map provides eight parameters (primitive data) about the neighborhood of pixels in an image buffer. The Terrain Map allows symbolic comparison of different buffers based on the eight parameters, i.e., terrain data members, without additional computer passes through the pixels.
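A symbolic comparison of two buffers can be sketched as an element-by-element comparison of their Terrain Maps on a chosen member. In this minimal sketch, the map representation (a grid of dicts keyed by member name), the function name `changed_elements`, and the fixed tolerance test are all assumptions for illustration; only the member names come from the text. The point it shows is that the comparison reads map elements only, so no pass through the original pixels is needed.

```python
from typing import Dict, List, Tuple

# Assumed representation: a 2-D grid of elements, each a dict of member values.
TerrainMap = List[List[Dict[str, float]]]


def changed_elements(a: TerrainMap, b: TerrainMap,
                     member: str, tolerance: float) -> List[Tuple[int, int]]:
    """Return (row, col) of elements whose `member` differs by more than
    `tolerance` between two Terrain Maps. Only map data is read."""
    hits: List[Tuple[int, int]] = []
    for r, (row_a, row_b) in enumerate(zip(a, b)):
        for c, (ea, eb) in enumerate(zip(row_a, row_b)):
            if abs(ea[member] - eb[member]) > tolerance:
                hits.append((r, c))
    return hits


background = [[{"AverageAltitude": 100.0}, {"AverageAltitude": 100.0}]]
current = [[{"AverageAltitude": 102.0}, {"AverageAltitude": 180.0}]]
print(changed_elements(background, current, "AverageAltitude", 10.0))  # [(0, 1)]
```

The same call can be repeated for any other member, for example comparing Jaggyness between frames, at a cost proportional to the number of map elements rather than the number of pixels.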
Accordingly there is realized in the Perceptrak disclosure a computer system for automated screening of video cameras, such as security cameras, said computer system in communication with a plurality of video cameras and comprising real-time image analysis components wherein video image data from said video cameras is analyzed by said image analysis components and said video image data is then selectively presented to an operator for security monitoring, said system providing real-time analysis of said video image data for subject content and including:
(a) provision for performing at least one pass through a frame of said video image data; and
(b) provision for generating a Terrain Map from said pass through said frame of said video image data, said Terrain Map comprising a plurality of parameters wherein said parameters indicate the content of said video image data;
said Terrain Map containing in said plurality of parameters characteristic information regarding the content of the video, the characteristic information being based on each of kernels of pixels in an input buffer, the characteristic information comprising at least a number of bytes of data describing the relationship of each of a plurality of pixels in a larger kernel surrounding the first-said kernel.
Other aspects of the Perceptrak disclosure are important and should be understood preliminary to a more complete understanding of the present invention.