There is growing interest in using high-resolution digital cameras, or “mega-pixel imagers,” in security and surveillance systems. Greater resolution improves accuracy in identifying people and objects and offers digital zoom for more detail. However, there is a practical limit to the volume of video data that can be transmitted from an imager chip, which limits access to video details and keeps mega-pixel cameras from delivering their true potential, i.e., all the video data they can capture. For example, the AV-3100, which is manufactured and sold by Arecont Vision of Altadena, Calif., is one of the fastest 3 mega-pixel cameras currently on the market, but the AV-3100 cannot operate at its full video transfer rate of 30 frames per second. Standard security surveillance cameras deliver images of about 0.4 mega-pixel resolution at a rate of about 30 frames per second. Even 2 mega-pixel resolution at 22 frames per second requires a data transmission rate of more than 40 megabytes per second (MB/s) from an imager chip. Most users have difficulty processing streams of data at 40 MB/s. New communications and storage systems lack the bandwidth to handle data rates 5 to 10 times faster than previous-generation data rates. Where faster speeds are possible, surveillance systems become prohibitively expensive.
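The data rates quoted above can be checked with a short calculation. The following is a minimal sketch that assumes one byte per pixel of uncompressed imager output; the text does not state this figure, and the function name and the `bytes_per_pixel` parameter are illustrative only.

```python
def raw_data_rate_mb_per_s(megapixels, frames_per_second, bytes_per_pixel=1.0):
    """Return the uncompressed imager output rate in megabytes per second.

    bytes_per_pixel=1.0 is an assumption (e.g., 8-bit raw sensor data);
    the source text does not specify the pixel depth.
    """
    return megapixels * frames_per_second * bytes_per_pixel


# Resolutions and frame rates taken from the paragraph above.
standard_camera = raw_data_rate_mb_per_s(0.4, 30)   # standard surveillance camera
two_megapixel = raw_data_rate_mb_per_s(2.0, 22)     # 2 mega-pixel at 22 frames per second
three_megapixel = raw_data_rate_mb_per_s(3.0, 30)   # 3 mega-pixel at full frame rate

for name, rate in [("0.4 MP @ 30 fps", standard_camera),
                   ("2 MP @ 22 fps", two_megapixel),
                   ("3 MP @ 30 fps", three_megapixel)]:
    print(f"{name}: about {rate:.0f} MB/s")
```

Under the one-byte-per-pixel assumption, the 2 mega-pixel case works out to about 44 MB/s, which is consistent with the "more than 40 megabytes per second" figure, and the 3 mega-pixel case at 30 frames per second is several times the rate of a standard 0.4 mega-pixel camera.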
One technique for managing large volumes of data entails transferring from the imager chip a portion (e.g., one-eighth of the number of pixels or a VGA quantity of pixels) of the video information to a central location for analysis. This approach carries the risk that important video data representing an event or behavior of interest will be lost upon transfer and therefore go undetected by post-transfer analysis. To overcome bandwidth limitations, some companies, such as Arecont Vision and CoVi Technologies, post-process video information from an imager chip to compress the volume of data transmitted to the rest of a surveillance system. Some modern data compression schemes that conform to industry standards, such as MPEG-1, MPEG-2, and H.263, offer only content-agnostic, lossy data compression. Although most new compression standards (e.g., MPEG-4 and MPEG-4 AVC) allow for object-level prioritization of bandwidth allocation and multiple alternatives for scalability (in the case of MPEG-4 SVC), they fail to consider the relative importance of the video content, i.e., different areas within a field of view, such as an area containing human figures versus an area containing a moving tree branch.
While traditional data compression schemes treat all bits of data as though they were of equal value, there is a need for an intelligent spatial and temporal resolution allocation mechanism that selectively assigns value to spatio-temporal portions of video content. The resolution allocation mechanism should automatically assign a high value if the data contain relevant and useful subjects within a scene, or a low value if the data contain subjects that provide no useful information. Some tools are currently available for use in intelligently allocating spatio-temporal resolution. For instance, one such value assessment tool offered by video analytics manufacturers is capable of detecting the presence of human beings, vehicles, license plates, and other items of interest.
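The kind of value-based allocation described above can be illustrated with a minimal sketch. It is not a description of any particular product or of a specific implementation: the `Region` type, the label names, the `allocate` function, and the reduction factors are all hypothetical, and the labels are assumed to come from a video analytics detector of the sort mentioned in the text.

```python
from dataclasses import dataclass

# Hypothetical labels for subjects treated as high value; the text names
# human beings, vehicles, and license plates as items of interest.
HIGH_VALUE_LABELS = {"person", "vehicle", "license_plate"}


@dataclass
class Region:
    label: str   # label reported by an assumed video analytics detector
    width: int   # region width in pixels at full imager resolution
    height: int  # region height in pixels at full imager resolution


def allocate(region, full_fps=30, low_scale=4, low_fps=5):
    """Return (width, height, fps) to transmit for one region.

    High-value regions keep full spatial and temporal resolution.
    Low-value regions are reduced by low_scale in each dimension and
    sent at low_fps. All three default values are arbitrary placeholders.
    """
    if region.label in HIGH_VALUE_LABELS:
        return region.width, region.height, full_fps
    return (max(1, region.width // low_scale),
            max(1, region.height // low_scale),
            low_fps)


# Example: a human figure keeps full detail, a moving tree branch does not.
print(allocate(Region("person", 640, 480)))
print(allocate(Region("tree_branch", 640, 480)))
```

In this sketch the value assignment is a simple two-level rule; a practical mechanism would presumably use finer gradations of value and would update the allocation as subjects enter and leave the field of view.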