1. Field of the Invention
The present disclosure relates to video and image processing and, more specifically, to a device and methodology for processing videos and images more efficiently.
2. Description of the Related Art
Video analytics, or video content analysis, ranges from video motion detection and audio detection to more advanced systems including camera tampering detection, people counting, detection of objects crossing lines or areas of interest, vehicle license plate recognition, segmentation of video into foreground and background portions, tracking objects, traffic analysis, metadata extraction, biometric applications, and facial recognition. Video analytics also makes surveillance systems more intelligent by reducing vast amounts of image data to manageable levels. Intelligent video surveillance systems can, for example, automatically analyze and tag surveillance video in real time, detect suspicious activities, initiate video recording, and activate alarms or take other actions to alert operators or other personnel.
In surveillance applications, video analytics is often used to detect motion. Motion detection is a way of defining activity in a scene by analyzing image data, and may be performed on a surveillance camera's entire field of view or on a user-defined area of interest. Furthermore, a video surveillance system with motion detection capabilities is able to detect motion more reliably than a human operator, and is therefore able to free human operators from staring at multiple video monitors for long hours. Instead, the video surveillance system with motion detection capabilities is able to alert the operator using a visual indicator, an audio indicator or both when motion is detected. Such a surveillance system may also automatically focus a surveillance camera on the area where motion was detected to obtain a more detailed image.
As recognized by the present inventor, a network camera capable of performing video analytics would reduce the workload of a centralized image processing system and conserve valuable network bandwidth. Such a network camera would allow true event-driven surveillance systems where detection of motion by the camera may trigger predefined automatic processes, such as adjusting temperature, activating alarms, locking/unlocking doors, etc.
However, to extract information from images, video analytics methods frequently employ several relatively simple operations that operate on large amounts of pixel data. Filtering is among the most common of these operations, and also among the most processing-intensive. Conventional filters and conventional video analytics use twice as many bits as the number of bits in an input pixel to arrive at an accurate result. For example, the filtering of an 8-bit pixel uses 16 bits and demands large amounts of processing resources, effectively halving a processor's processing capacity. Thus, as recognized by the present inventor, conventional video analytics methods are ill-suited for embedded applications, such as video analytics processing by a network camera.
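The growth in intermediate width can be seen in a small numeric sketch. The 3-tap kernel (1, 2, 1) and the function name filter_3tap below are assumptions chosen only for illustration; they are not taken from the present disclosure. With saturated 8-bit inputs, the unnormalized sum no longer fits in 8 bits, which is why a conventional implementation would typically widen each pixel to the next standard width of 16 bits.

```python
def filter_3tap(pixels, kernel=(1, 2, 1)):
    """Apply a 3-tap filter and return the unnormalized sums for interior pixels.

    The kernel is a hypothetical example, not one specified by the source.
    """
    sums = []
    for i in range(1, len(pixels) - 1):
        window = pixels[i - 1:i + 2]
        sums.append(sum(k * p for k, p in zip(kernel, window)))
    return sums


# Four saturated 8-bit pixels (all bits set to 1).
sums = filter_3tap([255, 255, 255, 255])

# Number of bits needed to hold the largest intermediate result.
max_bits = max(s.bit_length() for s in sums)
```

Here each interior sum is 255 * (1 + 2 + 1), so max_bits exceeds the 8 bits of the input pixel even before any normalization step is applied.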
Further, general-purpose processors employ a Single Instruction, Single Data (SISD) architecture, and only process one pixel at a time. As recognized by the present inventor, this results in inherently poor performance when general-purpose processors are used for video analytics. FIG. 1 illustrates a typical register found in a general-purpose processor. In this example the SISD register 32 includes thirty-two bits (0-31), all of which represent a single number (i.e., a pixel value). For example, an 8-bit pixel whose bit values are all 1's is represented in FIG. 1 at bit positions 7-0. Bit positions 31-8 are padded with zeros. Though the pixel is only 8 bits, it uses a full 32-bit register, which is an inefficient use of the “work” potential of the processor.
Some modern processors have evolved Single Instruction, Multiple Data (SIMD) instructions to improve performance. FIG. 2 illustrates a SIMD register 40 including 32 bits (0-31). However, the SIMD register 40 is divided into four 8-bit fields (41-42). Each of the 8-bit fields (41-42) can be packed with a pixel such that the 32-bit SIMD register 40 can be loaded with four pixels, allowing all four pixels in the SIMD register 40 to be processed simultaneously. SIMD processing instructions exist, for example, in Intel MMX, SSE and SSE2, PowerPC's AltiVec, ARM Media Extensions and MIPS DSP ASE.
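The difference between the two register layouts can be sketched with plain integer arithmetic. The helper names pack4 and unpack4 are hypothetical, and the choice to place the first pixel in the least significant field is an assumption made for illustration; the figures may order the fields differently.

```python
def pack4(pixels):
    """Pack four 8-bit pixels into one 32-bit word, first pixel in bits 7-0."""
    if len(pixels) != 4 or any(not 0 <= p <= 0xFF for p in pixels):
        raise ValueError("expected four values in the range 0..255")
    word = 0
    for i, p in enumerate(pixels):
        word |= p << (8 * i)
    return word


def unpack4(word):
    """Recover the four 8-bit fields of a 32-bit word."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


# SISD-style use: one saturated pixel, bits 31-8 padded with zeros.
sisd_word = 0xFF

# SIMD-style use: the same 32 bits carry four pixels.
simd_word = pack4([0x11, 0x22, 0x33, 0x44])
```

In the SISD-style word, 24 of the 32 bits carry no pixel information, whereas the packed word uses every bit position.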
The performance increase brought about by conventional SIMD processors is, however, directly related to the number of pixels that can be packed into each of its SIMD registers. As described above, filtering operations can place large demands on a processor, and even SIMD processors will not achieve their computational capacity if the pixel values are not efficiently packed in an instruction register.
FIG. 3 illustrates a 16-bit SIMD register 51 containing pre-filtered pixel data 50 corresponding to a first 8-bit pixel 53 and a second 8-bit pixel 52. As shown, the 16-bit SIMD register is able to accommodate two 8-bit pixels. However, if the contents of register 51 are filtered with filter 55, the result is two 16-bit filtered pixels (56 and 59). After filtering according to conventional processes, each pixel occupies a full register because the amount of storage space allocated to allow for overflow is conventionally double the number of bits used to represent a pixel. Even if the resulting pixels (56, 59) are truncated to 8 bits, processing capacity is still halved because the pixels are conventionally expanded to 16 bits to perform the filtering. Once expanded, the pixels are processed one at a time since the registers are only 16 bits wide, thus causing a processor bottleneck.
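The halving described above can be expressed as simple arithmetic on field widths. The sketch below is only an illustration under stated assumptions: the names pixels_per_register and widen are hypothetical, and which pixel sits in the upper or lower half of the register is assumed rather than taken from FIG. 3.

```python
REGISTER_BITS = 16  # register width used in the FIG. 3 discussion


def pixels_per_register(field_bits, register_bits=REGISTER_BITS):
    """Number of pixel fields of the given width that fit in one register."""
    return register_bits // field_bits


def widen(register):
    """Split a 16-bit register holding two 8-bit pixels into two values,
    each zero-extended so that it needs a 16-bit register of its own."""
    low = register & 0xFF
    high = (register >> 8) & 0xFF
    return [low, high]


packed = (0xAB << 8) | 0xCD          # two 8-bit pixels in one register
widened = widen(packed)              # two registers after expansion

before = pixels_per_register(8)      # packing density before filtering
after = pixels_per_register(16)      # packing density once widened
```

The ratio of before to after is the factor by which throughput drops when every pixel must be expanded ahead of the filtering step.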
The increased processor throughput demand is more pronounced in processors used for embedded applications, such as video analytics locally performed in a network camera, where processing capacity is often traded for low power consumption. Moreover, as recognized by the present inventor, SIMD instruction sets are CPU-specific, and conventional SIMD processors lack a full array of SIMD functions for video analytics.
SIMD instructions may also be emulated in software. A general-purpose SISD register 32 can be made to function as a SIMD register 40 with additional software instructions to account for carry bits. For example, the SISD register 32 illustrated in FIG. 1 can be packed with four 8-bit pixels. However, any computation performed on SISD register 32 will generate carry bits that will “bleed” from one pixel to the next, leading to possible errors. Conventionally, this is remedied with additional processing steps that account for any carry bits generated through computation, thus maintaining the partitioning of the pixels. Though this approach is not as efficient as a native SIMD processor, the benefits of processing multiple pixels simultaneously outweigh the burden of the additional computation.
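The carry problem and one common remedy can be sketched as follows. This is a widely known software technique for lane-wise addition in an ordinary register, shown here only as an example of the kind of additional processing steps referred to above; it is not asserted to be the specific method contemplated by the source. The mask names and function names are illustrative, and each 8-bit lane wraps modulo 256.

```python
MASK32 = 0xFFFFFFFF   # emulate a 32-bit register with Python integers
HIGH = 0x80808080     # top bit of each 8-bit lane
LOW = 0x7F7F7F7F      # lower seven bits of each 8-bit lane


def naive_add(a, b):
    """Ordinary 32-bit add: carries bleed from one lane into the next."""
    return (a + b) & MASK32


def swar_add(a, b):
    """Lane-wise add of four 8-bit fields, each wrapping modulo 256.

    The lower seven bits of each lane are added first, so a carry can
    reach at most the top bit of its own lane. The top bits are then
    combined with XOR, which discards any carry out of the lane.
    """
    low_sum = (a & LOW) + (b & LOW)
    top_bits = (a ^ b) & HIGH
    return (low_sum ^ top_bits) & MASK32


a = 0x000001FF   # lanes, from bits 7-0 upward: 0xFF, 0x01, 0x00, 0x00
b = 0x00000101   # lanes, from bits 7-0 upward: 0x01, 0x01, 0x00, 0x00

bled = naive_add(a, b)    # the second lane is corrupted by the carry
clean = swar_add(a, b)    # each lane is computed independently
```

In this example the lowest lane overflows. With naive_add the overflow increments the neighboring lane, while swar_add keeps the lanes partitioned at the cost of a few extra mask and XOR operations.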
However, as in the case of a native SIMD processor, software-emulated SIMD instructions, often referred to as SIMD Within A Register (SWAR), are also limited by the number of pixels that can be packed into a register. SWAR-implemented SIMD instruction sets are therefore also susceptible to a reduction in performance caused by processing-intensive video analytics functions, such as filtering operations.