1. Field of the Invention
The present disclosure concerns a processing system for efficiently performing video analytics operations. More specifically, this disclosure describes a system, computer program product, and associated methodology for gathering individual image pixels, selected for video analytics processing, and arranging the gathered pixels in a single pixel matrix on which Single Instruction, Multiple Data (SIMD) operations are performed. This significantly reduces processing demands placed on the Central Processing Unit (CPU) processing the image.
The present disclosure also describes a network camera that performs video analytics, such as motion detection, and reduces the amount of video data transferred over a network.
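As a rough illustration of the gathering step described above, the following Python sketch collects the pixels flagged by a selection mask, arranges them in a fixed-width matrix, and then processes the matrix one row at a time. The function names, the row width of four, the zero padding, and the threshold operation are all assumptions made for this example, and the row-wise list comprehension merely stands in for a SIMD instruction.

```python
def gather_pixels(image, mask):
    """Collect the pixels flagged in `mask`, remembering where each came from."""
    coords, values = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if mask[y][x]:
                coords.append((y, x))
                values.append(value)
    return coords, values


def to_matrix(values, width=4, pad=0):
    """Arrange gathered values in rows of `width` (hypothetical SIMD width).

    The last row is padded so every row fills a whole "register".
    """
    padded = values + [pad] * (-len(values) % width)
    return [padded[i:i + width] for i in range(0, len(padded), width)]


def threshold_rows(matrix, threshold):
    """Apply an illustrative threshold to one full row at a time.

    Each row plays the role of one SIMD register; real hardware would
    process the whole row with a single instruction.
    """
    return [[255 if v >= threshold else 0 for v in row] for row in matrix]


image = [[10, 200, 30],
         [130, 40, 250]]
mask = [[1, 0, 1],
        [1, 1, 1]]
coords, values = gather_pixels(image, mask)
matrix = to_matrix(values)
result = threshold_rows(matrix, 128)
```

Because selection is finished before any processing starts, every row handed to the processing step is full of pixels of interest (apart from padding in the final row).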
2. Discussion of the Background
Video analytics, or video content analysis, ranges from video motion detection and audio detection to more advanced systems including camera tampering detection, people counting, detection of objects crossing lines or areas of interest, vehicle license plate recognition, segmentation of video into foreground and background portions, tracking objects, traffic analysis, metadata extraction, biometric applications, and facial recognition. Video analytics also makes surveillance systems more intelligent, reducing vast amounts of image data to manageable levels. Intelligent video surveillance systems can, for example, automatically analyze and tag surveillance video in real time, detect suspicious activities, initiate video recording, and activate alarms or take other actions to alert operators or other personnel.
In surveillance applications, video analytics is often used to detect motion. Motion detection is a way of defining activity in a scene by analyzing image data, and may be performed on a surveillance camera's entire field of view or on a user-defined area of interest. Furthermore, a video surveillance system with motion detection capabilities is able to detect motion more reliably than a human operator, and is therefore able to free human operators from staring at multiple video monitors for long hours. Instead, the video surveillance system with motion detection capabilities is able to alert the operator using a visual indicator, an audio indicator or both when motion is detected. Such a surveillance system may also automatically focus a surveillance camera on the area where motion was detected to obtain a more detailed image.
As recognized by the present inventor, a network camera capable of performing video analytics would reduce the workload of a centralized image processing system, and conserve valuable network bandwidth. Such a network camera would allow true event-driven surveillance systems where detection of motion by the camera could trigger predefined automatic processes, such as adjusting temperature, activating alarms, locking/unlocking doors, etc.
However, because video analytics frequently entails performing several relatively simple operations on large amounts of pixel data, current methods do not lend themselves to mobile or embedded applications, such as a network camera. Conventional methods have therefore been developed to quickly reduce the amount of data (number of pixels) processed during video analytics and thereby lessen the CPU processing burden.
One such conventional method sequentially steps through all of the pixels in an image to identify pixels that are of interest. If the pixel is not of interest, the method moves to the next pixel for analysis. In this context, “of interest” signifies that the pixel contains information relevant to the analysis being conducted, for example motion information. If the pixel is of interest, the method performs the relevant operations on the pixel before moving on to the next pixel. Thus, this method nests the video analytics processing of a pixel within the routine that identifies pixels of interest. In the case of a filter, for example, while the filter is selectively applied only to pixels of interest, the filter is still applied to only one pixel of interest at a time.
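The conventional method described above can be sketched as follows. This is a minimal Python illustration only: the function name, the mask representation, and the threshold filter are assumptions chosen for the example and are not taken from any particular implementation.

```python
def threshold_nested(image, mask, threshold):
    """Step through every pixel; process a pixel only if it is of interest.

    The processing (an illustrative threshold filter) is nested inside the
    selection loop, so exactly one pixel is handled per iteration.
    """
    out = [row[:] for row in image]  # copy so the input image is untouched
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if not mask[y][x]:
                continue  # not of interest: move on to the next pixel
            # Of interest: perform the operation on this single pixel.
            out[y][x] = 255 if value >= threshold else 0
    return out


image = [[10, 200],
         [130, 40]]
mask = [[1, 1],
        [0, 1]]
filtered = threshold_nested(image, mask, 128)
```

Note that the filter is invoked once per selected pixel, which is the property the discussion below identifies as a poor fit for SIMD processing.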
Many modern processors are capable of executing Single Instruction, Multiple Data (SIMD) instructions, which process multiple data fields in parallel and thereby increase performance. In processors with SIMD instruction capability, each register is divided into at least two fields. Each field represents data that is independent of data in other fields. For example, in a video analytics context, each field may represent an individual pixel. As the processor is able to execute a SIMD instruction on an entire register, the pixels contained in the fields of the register are processed simultaneously. Thus, performance of a SIMD-capable processor may be significantly better than the performance of a general-purpose processor.
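The idea of a register divided into independent fields can be imitated in plain Python by packing four 8-bit pixels into one 32-bit integer and operating on the whole word at once. The sketch below is an assumption-laden illustration, not real SIMD hardware: the helper names are hypothetical, Python integers only simulate a 32-bit register, and the per-pixel averaging operation was chosen simply because it can be done with a well-known bit trick that keeps each field independent.

```python
def pack4(pixels):
    """Pack four 8-bit pixel values into one 32-bit word (first pixel lowest)."""
    if len(pixels) != 4 or any(not 0 <= p <= 255 for p in pixels):
        raise ValueError("expected four values in the range 0..255")
    word = 0
    for i, p in enumerate(pixels):
        word |= p << (8 * i)
    return word


def unpack4(word):
    """Split a 32-bit word back into its four 8-bit fields."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def average4(a, b):
    """Floor-average all four fields of `a` and `b` in one pass.

    Uses a + b == 2 * (a & b) + (a ^ b) per field. Masking with 0xFEFEFEFE
    clears the low bit of every field before the shift, so no bit leaks
    from one field into its neighbour.
    """
    return (a & b) + (((a ^ b) & 0xFEFEFEFE) >> 1)


frame_a = pack4([0, 255, 100, 7])
frame_b = pack4([255, 255, 50, 8])
averaged = unpack4(average4(frame_a, frame_b))
```

A single call to `average4` updates four pixels, whereas a scalar loop would need four separate additions and shifts.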
However, as recognized by the present inventor, the above-described conventional method of selecting pixels to be processed is not well suited for SIMD instruction processing. Because the method nests the video analytics processing within the pixel selection routine, a SIMD-capable processor is forced to process selected pixels one at a time in much the same way as a general-purpose processor, thereby negating the advantages gained by employing SIMD instructions.
A result of the above-described inefficiencies of conventional video analytics is that powerful computer systems having high processing capacities are still preferred for performing video analytics functions, such as filtering and motion detection. Therefore, these methods are not well suited for local implementation of video analytics in network cameras.
A typical video surveillance system includes multiple video surveillance cameras connected to a central processing unit by a network, such as an IP-based network. Often the IP-based network is not exclusively devoted to the video surveillance system, but is shared with other network-based applications, such as email, web browsing, database systems, and the like. In the case where the video surveillance system employs conventional video analytics performed by the central processing unit, each camera must provide a raw video image stream to the central processing unit. This places an enormous amount of video data on the network, consuming bandwidth that might otherwise be used by other network applications.
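To give a sense of scale, the short Python calculation below estimates the network load if "raw" is read as uncompressed video. The resolution, color depth, frame rate, and camera count are purely hypothetical values picked for illustration; they do not come from the disclosure.

```python
def raw_bitrate_bps(width, height, bits_per_pixel, frames_per_second):
    """Bits per second needed to carry an uncompressed video stream."""
    return width * height * bits_per_pixel * frames_per_second


# Hypothetical camera: 640x480 pixels, 24-bit color, 30 frames per second.
per_camera = raw_bitrate_bps(640, 480, 24, 30)

# Hypothetical installation with eight such cameras on a shared network.
total = 8 * per_camera

print(f"one camera:    {per_camera / 1e6:.1f} Mbit/s")
print(f"eight cameras: {total / 1e6:.1f} Mbit/s")
```

Under these assumed figures a single camera already needs roughly 221 Mbit/s, which suggests why analyzing video at the camera and sending only results or selected footage can relieve a shared network.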