1. Field of the Invention
The invention relates to video monitoring and interpretation, as by software-aided methodology, and more particularly, to a system and method for improving the utility of video images in systems handling video, such as for system-interpreted analysis of video images for security purposes, and for many other purposes.
2. Description of the Known Art
There has been developed a system in accordance with U.S. patent application Ser. No. 09/773,475, filed Feb. 1, 2001, entitled SYSTEM FOR AUTOMATED SCREENING OF SECURITY CAMERAS, which was issued as U.S. Pat. No. 6,940,998, on Sep. 6, 2005, and corresponding International Patent Application PCT/US01/03639, of the same title, filed Feb. 5, 2001, both assigned to the same entity as the owner of the present application, and both herein incorporated by reference. That system, also called a security system, may be identified by the trademark PERCEPTRAK (“Perceptrak” herein). In the Perceptrak security system, video data is picked up by any of many possible video cameras. It is processed by software control of the system before human intervention for an interpretation of types of images and activities of persons and objects in the images. As the video may be taken by video cameras in any of many possible locations and under conditions subject to variation beyond the control of the system, the captured video can include useless information such as visible “noise” which, upon segmentation of images together with such noise, interferes with usable information or detracts from or degrades video data useful to the system. More specifically, the Perceptrak system provides automatic screening of closed circuit television (CCTV) cameras (“video cameras”) for large and small scale security systems, as used for example in parking garages. The Perceptrak system includes six primary software elements, each of which performs a unique function within the operation of such system to provide intelligent camera selection for operators, resulting in a marked decrease of operator fatigue in a CCTV system. Real-time video analysis of video data is performed, where a single pass of a video frame produces a terrain map, which contains elements that are termed image primitives, which are low level features of the video.
Based on the primitives of the terrain map, the system is able to make decisions about which camera an operator should view based on the presence and activity of vehicles and pedestrians and furthermore, discriminates vehicle traffic from pedestrian traffic. The system is compatible with existing CCTV (closed circuit television) systems and is comprised of modular elements to facilitate integration and upgrades.
The Perceptrak system is capable of automatically carrying out decisions about which video camera should be watched, and which to ignore, based on video content of each such camera, as by use of video motion detectors, in combination with other features of the presently inventive electronic subsystem, constituting a processor-controlled selection and control system (“PCS system”), which serves as a key part of the overall security system, for controlling selection of the CCTV cameras. The PCS system is implemented to enable automatic decisions to be made about which camera view should be displayed on a display monitor of the CCTV system, and thus watched by supervisory personnel, and which video camera views are ignored, all based on processor-implemented interpretation of the content of the video available from each of at least a group of video cameras within the CCTV system.
Included in the PCS system are video analysis techniques which allow the system to make decisions about which camera an operator should view based on the presence and activity of vehicles and pedestrians. Events are associated with both vehicles and pedestrians and include, but are not limited to, a single pedestrian, multiple pedestrians, a fast pedestrian, a fallen pedestrian, a lurking pedestrian, an erratic pedestrian, converging pedestrians, a single vehicle, multiple vehicles, fast vehicles, and a vehicle that stops suddenly.
The video analysis techniques of the Perceptrak system can discriminate vehicular traffic from pedestrian traffic by maintaining an adaptive background and segmenting (separating from the background) moving targets. Vehicles are distinguished from pedestrians based on multiple factors, including the characteristic movement of pedestrians compared with vehicles, i.e. pedestrians move their arms and legs when moving and vehicles maintain the same shape when moving. Other factors include the aspect ratio and object smoothness. For example, pedestrians are taller than vehicles and vehicles are smoother than pedestrians.
The analysis is performed on the terrain map primitives, in accordance with the disclosure of said U.S. Pat. No. 6,940,998, to which reference should be had. Generally, the Terrain Map is generated from a single pass of a video frame, resulting in characteristic information regarding the content of the video. The Terrain Map creates a file with characteristic information.
The informational content of the video generated by Terrain Map is the basis for video analysis techniques of the Perceptrak system and results in the generation of several parameters for further video analysis. These parameters include at least:
(1) Average Altitude; (2) Degree of Slope; (3) Direction of Slope; (4) Horizontal Smoothness; (5) Vertical Smoothness; (6) Jaggyness; (7) Color Degree; and (8) Color Direction.
The PCS system of the Perceptrak system disclosed in said U.S. Pat. No. 6,940,998 comprises a number of primary software-driven system components, as shown therein, including those termed: (1) Analysis Worker(s); (2) Video Supervisor(s); (3) Video Worker(s); (4) Node Manager(s); (5) PsAdministrator (formerly called Set Rules GUI (Graphical User Interface)); and (6) Arbitrator.
In the Perceptrak system, as here described generally, video input from security cameras is first sent to a Video Worker, which captures frames of video (frame grabber) and has various properties, methods, and events that facilitate communication with the Video Supervisor. There is one Video Supervisor for each frame grabber. The Analysis Workers perform video analysis on the video frames captured by the Video Worker and subsequently report activity to the Video Supervisor. Similarly, the Analysis Workers have various properties, methods, and events that facilitate communication with the Video Supervisor. The Video Supervisor keeps track of when frames are available from the Video Worker and when the Analysis Worker is prepared for another frame, and directs data flow accordingly. The Video Supervisor then sends data to the Node Manager, which in turn concentrates the communications from multiple Video Supervisors to the Arbitrator, thereby managing and decreasing the overall data flow to the Arbitrator.
The general term “software” is herein used and intended simply for convenience to mean programs, program instructions, code or pseudo code, process or instruction sets, source code and/or object code processing hardware, firmware, drivers and/or utilities, and/or other digital processing devices and means, as well as software per se.
Adaptive Segmentation Gain
An area-specific adaptive segmentation threshold is employed in areas of video to be segmented, in accordance with the invention. It is herein preferred to use the alternative terminology “adaptive segmentation gain”. Such adaptive segmentation gain is used to advantage in the Perceptrak system (sometimes hereinafter simply referred to as “the system”), as described in said patent application, and said system is here representative of possible systems which could employ the present invention for the present or comparable purposes. There, as just one exemplary technique which can be used, a PID control loop can be used at each Analysis Worker, and such a “PID loop” attempts to maintain a fixed amount of segmentation noise. Heretofore, in the Perceptrak system, segmentation noise was measured screen wide for each video frame. An average of many frames was used to make segmentation gain changes. In scenes where there is an excessive amount of motion in a part of the screen, the prior approach caused the overall segmentation gain to be reduced screen wide in order to maintain the fixed amount of total noise in the scene. Sometimes the noise was seen only in a small part of the scene that had continual noise.
A difficulty has been realized in that segmentation gain can be very low in “quiet” (relatively noise-free) areas of the scene, but noise elsewhere in the image may have the result that subjects (such as people) within the image field are only partially segmented.
For example, in a scene to be captured by video, a tree (or other vegetation) may exist. Light streams through or is reflected off the foliage of the tree or vegetation in an indefinite pattern, which may shift upon limb or leaf motion resulting from air movement. Such shifting or sporadic light produces small areas of relative difference in light intensity, recognizable as, or considered to be, small bits of noise, over a period of time. As a result, the moving limbs of vegetation are segmented because they are different from the background. In simplest terms, there may be segmented “blobs” which are segmented because they contain noise, which now show in the illustrations as, e.g., rectangles, in the segmented portions of image. White spots in such a representative rectangle example signify “noise blobs” resulting from the segmentation, and the term “noise blobs” is used to connote herein the segmented areas resulting from noise, and thus noise blobs are tangible image artifacts of noise captured by segmentation of subjects in scenes. For example, FIG. 3 shows noise blobs appearing as white spots.
According to an example mode of operation, an area of 5 pixels by 5 pixels may be used as the cut-off, or minimum size, of a blob to be segmented.
The noise blobs can be counted. The count usefully indicates the extent to which noise blobs are being segmented, and the count can be used to control the threshold (“gain”) of segmentation.
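By way of illustration only, the counting of blobs against a minimum-size cut-off may be sketched as follows. The function name count_blobs, the use of 4-connectivity, and the reading of the 5 pixel by 5 pixel cut-off as a bounding-box test are assumptions made for this sketch and are not taken from the Perceptrak system. The input is assumed to be a non-empty binary segmentation mask.

```python
from collections import deque


def count_blobs(mask, min_side=5):
    """Count 4-connected foreground regions in a binary mask.

    A region is counted only if its bounding box is at least
    min_side x min_side pixels (an assumed reading of the cut-off).
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill one region while tracking its bounding box.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, bottom, left, right = r, r, c, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if bottom - top + 1 >= min_side and right - left + 1 >= min_side:
                count += 1
    return count
```

A count obtained in this manner for each frame, or averaged over a group of frames, could serve as the measured quantity from which the segmentation gain is adjusted.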
Heretofore, to get useful data, given the noise within a video frame (where video is captured frame by frame), it has been a practice to measure noise over an average of many frames. In the Perceptrak system, for example, the average can then be used to make overall segmentation gain changes. In other words, segmentation intensity levels (gain) can be continuously controlled over a group of 100 (for example) frames at a time. A difficulty exists in that changes in gain are controlled not only in response to objects in the captured video, which objects it is desired that the system segment, but also in response to light “noise.” Such visual noise degrades segmented images.
The term “PID” has been used to refer to a protocol typically employed for control loops, being a proportional integral derivative control algorithm often used in industrial and process control, as in single loop controllers, distributed control systems (DCS) and programmable logic controllers (PLC's) or programmable logic devices (PLD's). A PID control algorithm may comprise three elements, where the acronym PID derives from these terms:
Proportional—also known as gain
Integral—also known as automatic reset or simply reset
Derivative—also known as rate or pre-act
Such PID algorithm control may be employed in the segmentation of images in video processing.
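As a minimal sketch only, a discrete PID loop that adjusts segmentation gain so as to hold a measured noise count near a fixed target may take the following form. The class name PID, the coefficient values, the target of 20 noise blobs, and the linear scene response (noise equal to four times gain) are all hypothetical and chosen merely to make the example self-contained; they do not describe the actual Perceptrak control loop.

```python
class PID:
    """Textbook discrete PID controller (positional form)."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement, dt=1.0):
        error = self.setpoint - measurement
        self.integral += error * dt
        # No derivative contribution on the first sample.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)


def simulate(steps=200):
    """Drive a hypothetical scene toward 20 noise blobs per measurement."""
    pid = PID(kp=0.05, ki=0.1, kd=0.01, setpoint=20.0)
    gain = 1.0
    for _ in range(steps):
        noise = 4.0 * gain  # hypothetical response: more gain, more noise
        gain = max(0.0, pid.update(noise))
    return gain, 4.0 * gain
```

In this toy arrangement, excess noise yields a negative error and so lowers the gain, while too little noise raises it. The measurement supplied to the loop could equally be an average taken over a group of frames rather than a single frame.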
Segmentation gain is controllable according to visual “noise”, as the term is used herein to connote image-degrading light or light changes (such as that produced by light shining through or reflected by foliage such as that of trees) or, as stated otherwise, to connote image-degrading light or light changes or extraneous or spurious light sources which degrade the capability of the system of using video to “segment” people or other preferred subjects in the scene, that is to discriminate or separate such subjects in the scene. The noise thus interferes with getting segmented images of subject people within the scene. Activities of subject people in a captured video scene (e.g., as running, walking, loitering, aggregating in groups, or falling down) are desired to be monitored by the Perceptrak system for security purposes. So also, as a further example without limiting the possibilities of subjects which can be segmented for the present purposes, it may be desired that activities of moving vehicles be monitored.
Heretofore, approaches have been taken to determine motion of subjects within video fields, typically by pixel analysis. For example, it has been proposed to detect motion from differences between recent scenes using a so-called reference frame, or by background statistics, or by taking into consideration adjacent frames. In such technological approaches, only motion detection is typically possible. In such a prior technological approach, if a subject person stops moving in a scene monitored by motion detection alone, the person will no longer be segmented. The technique of object segmentation in accordance with the present application makes use of an adaptive background of some sort. With such an adaptive background, an object can remain motionless for an indefinite time and yet can be segmented.
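The distinction drawn above may be illustrated by a simplified sketch in which each frame is a one-dimensional row of pixel intensities. The function names, the threshold, the blending factor alpha, and in particular the selective update rule (blending into the background only those pixels not currently segmented) are assumptions made for illustration; the text above specifies only that an adaptive background of some sort is used.

```python
def frame_difference(prev, curr, threshold):
    """Motion detection only: flag pixels that changed between two frames."""
    return [abs(c - p) > threshold for p, c in zip(prev, curr)]


def segment_with_background(background, frame, threshold, alpha=0.1):
    """Flag pixels that differ from the background, then adapt the background.

    Assumed selective update: pixels currently segmented are left out of the
    background blend, so a motionless subject is not absorbed into it.
    """
    mask = [abs(f - b) > threshold for b, f in zip(background, frame)]
    new_background = [
        b if m else (1 - alpha) * b + alpha * f
        for b, f, m in zip(background, frame, mask)
    ]
    return mask, new_background


def frame_with_subject(position, width=8, scene=10.0, subject=200.0):
    """Build a synthetic frame containing a bright subject at one pixel."""
    frame = [scene] * width
    frame[position] = subject
    return frame
```

If a subject moves from one pixel to the next and then stands still, frame_difference applied to two consecutive frames of the motionless subject flags nothing, whereas segment_with_background continues to flag the pixel occupied by the subject.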
Approaches of the prior art typically have made use of a fixed segmentation threshold for an entire scene. The above-identified Perceptrak system disclosed varying segmentation gain for an entire scene based on segmentation noise in the scene. This is referred to as adaptive segmentation gain when applied to an entire scene. It has been elsewhere proposed to adjust segmentation gain for different groups of pixels, but such adjustment has heretofore been pixel-based and so also based upon intensity only.
Such problems and video analysis limitations are intended to be overcome by the presently inventive system approach and methodology.