The present invention relates generally to a system and method for intelligent processing of visual data for low bandwidth communication. More specifically, the present invention relates to a client-server system for remote and distributed processing of image data over a communication network using an adaptive resolution protocol that enables image data to be transmitted with high resolution at a virtually constant low bit rate over the network.
The autonomous processing of visual information for efficient description and transmission of the main events and data of interest represents a new challenge for next-generation video surveillance systems. Advances in intelligent cameras having local processing capabilities (supporting Java applications, based on DSP chips, or both) will make it possible to customize such devices for applications requiring specific video understanding and summarization tasks. Such applications may require mechanisms for efficient encoding and transmission of video data in a distributed environment. Indeed, protocols for low bit rate transmission of image data will allow, for example, wireless transmission of data of interest to a central processing unit for further processing and/or retransmission, in system implementations such as video surveillance, videoconferencing and industrial monitoring.
Accordingly, systems and methods that enable bandwidth reduction for efficient transmission of video data over a communication network are highly desirable.
The present invention is directed to a client-server architecture and protocol that enable efficient transmission of visual information in real-time from a network of image servers (e.g., active cameras) to a central processing unit (client application). The client-server system may advantageously be employed in any application requiring efficient real-time processing and transmission of video data at a very low bit rate. For example, the invention may be used for remote and distributed surveillance applications, videoconferencing, or industrial inspection systems.
In one aspect of the invention, a method for encoding image data for transmission over a communication channel comprises the steps of:
receiving image data;
encoding the image data using an adaptive log-polar mapping protocol that generates a log-polar representation of the image data comprising a fovea region and periphery region, wherein the encoding comprises selecting encoding parameters for the log-polar mapping based on either the size of the fovea, the channel bandwidth, or both, to modify the resolution of the image data within the periphery region; and
transmitting the image data within the fovea region at full resolution and the image data within the periphery region at the modified resolution.
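The encoding steps above can be illustrated with a minimal sketch. It assumes a grayscale NumPy image, a square crop as the full-resolution fovea, and nearest-neighbor sampling of the periphery on geometrically spaced rings. The function name `logpolar_encode` and its parameters are hypothetical, and the filtering stage mentioned elsewhere in this summary is omitted for brevity.

```python
import numpy as np

def logpolar_encode(image, center, fovea_radius, n_rings, n_angles):
    """Split a grayscale image into a full-resolution fovea patch and a
    coarsely sampled log-polar periphery (illustrative sketch only)."""
    h, w = image.shape
    cy, cx = center

    # Fovea: full-resolution square crop around the center, clipped to the image.
    y0, y1 = max(cy - fovea_radius, 0), min(cy + fovea_radius + 1, h)
    x0, x1 = max(cx - fovea_radius, 0), min(cx + fovea_radius + 1, w)
    fovea = image[y0:y1, x0:x1].copy()

    # Periphery: ring radii grow geometrically from the fovea edge
    # out to the farthest image corner.
    max_radius = np.hypot(max(cx, w - 1 - cx), max(cy, h - 1 - cy))
    radii = np.geomspace(fovea_radius, max_radius, n_rings + 1)[1:]
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)

    rr, aa = np.meshgrid(radii, angles, indexing="ij")
    ys = np.rint(cy + rr * np.sin(aa)).astype(int)
    xs = np.rint(cx + rr * np.cos(aa)).astype(int)

    # Sample points that fall outside the image are left at zero.
    inside = (ys >= 0) & (ys < h) & (xs >= 0) & (xs < w)
    periphery = np.zeros((n_rings, n_angles), dtype=image.dtype)
    periphery[inside] = image[ys[inside], xs[inside]]
    return fovea, periphery
```

In this sketch, reducing `n_rings` or `n_angles` lowers the periphery resolution while the fovea patch is left untouched, which is the degree of freedom the encoding parameters are described as controlling.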
In another aspect, the log-polar mapping parameters are dynamically modified, in real-time, to adjust the transmission resolution of the image data within the periphery region, if necessary, to compensate for a bit rate variation due to a change in the size of the fovea region, a change in the bandwidth of the communication channel, or both, so as to maintain the transmission bandwidth of the encoded image data at a substantially constant rate.
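One way to picture this adaptation is as a per-frame budget problem: the fovea consumes a number of samples that grows with its radius, and the periphery grid is chosen to fit whatever remains. The sketch below is an assumption-laden illustration, not the disclosed procedure. The function `choose_periphery_grid`, the use of a sample count as a stand-in for bit rate, and the near-square-cell rule relating ring spacing to angular spacing are all hypothetical choices.

```python
import math

def choose_periphery_grid(fovea_radius, max_radius, sample_budget, min_angles=8):
    """Pick the finest (n_rings, n_angles) periphery grid whose sample count
    fits the budget left over after the full-resolution fovea.
    Returns None if even the coarsest grid does not fit."""
    if max_radius <= fovea_radius:
        raise ValueError("max_radius must exceed fovea_radius")

    # Assumed cost model: one sample per pixel inside the fovea disc.
    fovea_cost = math.ceil(math.pi * fovea_radius ** 2)
    remaining = sample_budget - fovea_cost

    best = None
    n_angles = min_angles
    while True:
        # Near-square cells: ring-to-ring radius ratio of exp(2*pi / n_angles).
        growth = math.exp(2.0 * math.pi / n_angles)
        n_rings = math.ceil(math.log(max_radius / fovea_radius) / math.log(growth))
        if n_rings * n_angles > remaining:
            break
        best = (n_rings, n_angles)
        n_angles += 1
    return best
```

Called once per frame with the current fovea radius and the current budget, such a routine yields a coarser periphery when the fovea grows or the channel narrows, and a finer one in the opposite case, so that the total stays within the budget.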
Preferably, the encoding process utilizes a log-polar sampling grid comprising a hexagonal lattice framework.
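As an illustration of what a hexagonal log-polar sampling grid could look like, the sketch below builds a hexagonal lattice in the (log r, theta) plane: alternate rings are shifted by half an angular step, and ring spacing in log r is sqrt(3)/2 times the angular step. This construction is an assumption made for the example; the text above specifies only that a hexagonal lattice framework is used. The name `hex_logpolar_grid` is hypothetical.

```python
import numpy as np

def hex_logpolar_grid(fovea_radius, max_radius, n_angles):
    """Return (radii, angles) for a log-polar grid whose sample points form a
    hexagonal lattice in (log r, theta). angles has shape (n_rings, n_angles)."""
    dtheta = 2.0 * np.pi / n_angles
    # Row spacing of a hexagonal lattice is sqrt(3)/2 times the in-row spacing.
    dlogr = np.sqrt(3.0) / 2.0 * dtheta
    n_rings = int(np.floor(np.log(max_radius / fovea_radius) / dlogr))

    ring_idx = np.arange(1, n_rings + 1)
    radii = fovea_radius * np.exp(ring_idx * dlogr)

    # Odd-numbered rings are rotated by half an angular step.
    base = np.arange(n_angles) * dtheta
    offsets = (ring_idx % 2) * (dtheta / 2.0)
    angles = base[None, :] + offsets[:, None]
    return radii, angles
```

The sample position for ring `i` and angular index `j` is then `(radii[i] * cos(angles[i, j]), radii[i] * sin(angles[i, j]))` relative to the fovea center.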
In another aspect of the invention, the step of selecting encoding parameters comprises accessing predetermined encoding parameters stored in a LUT (look-up table) based on a radius measure of the fovea region.
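A look-up table of this kind might be organized as in the following sketch, in which parameters are precomputed offline for a range of fovea radii and fetched at run time by nearest radius. The table contents (a ring count and a geometric growth factor), the fixed angular resolution, and the function names `build_lut` and `lookup` are assumptions for illustration, not the stored parameters of the disclosed system.

```python
import math

def build_lut(radii, max_radius, sample_budget, n_angles=64):
    """Precompute periphery parameters for each candidate fovea radius.
    Maps radius -> (n_rings, growth), where growth is the ring-to-ring
    radius ratio (None when no periphery ring fits the budget)."""
    lut = {}
    for r in radii:
        # Assumed cost model: one sample per pixel inside the fovea disc.
        remaining = sample_budget - math.ceil(math.pi * r * r)
        n_rings = max(remaining // n_angles, 0)
        # Growth factor such that n_rings geometric rings span r..max_radius.
        growth = (max_radius / r) ** (1.0 / n_rings) if n_rings else None
        lut[r] = (n_rings, growth)
    return lut

def lookup(lut, fovea_radius):
    """Fetch the entry whose radius key is closest to the measured radius."""
    key = min(lut, key=lambda r: abs(r - fovea_radius))
    return lut[key]
```

Because the table is indexed by a single radius measure, the per-frame cost of adapting the mapping reduces to one dictionary access, which suits a camera-side processor with limited compute.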
In yet another aspect of the invention, a method for providing distributed surveillance over a communications network comprises the steps of:
detecting the presence of an individual in a predetermined field of view;
tracking the face of the individual within the field of view;
generating image data, the image data comprising two-dimensional coordinates and an estimated scale of the individual's face being tracked;
filtering and sampling the image data using a log-polar mapping to generate encoded image data comprising a fovea region and periphery region, the fovea region being centered on the face of the individual; and
transmitting the encoded image data over a communication channel at a predetermined transmission bit rate, the fovea region being transmitted at full resolution;
wherein the log-polar mapping of the periphery region is adapted based on scale and location changes of the fovea region to substantially maintain the predetermined transmission bit rate.
In another aspect, a client/server system comprises an image server which is operatively interfaced to a camera. The image server comprises an encoder for filtering and sampling image data received from the camera using a log-polar mapping to generate encoded image data comprising a fovea region and periphery region, the fovea region being centered on a target of interest in the image. The image server further comprises a communication stack for transmitting the encoded image data over a communication channel at a predetermined bit rate, wherein the fovea region is transmitted at full resolution, and wherein the log-polar mapping of the periphery region is adapted based on scale and location changes of the fovea region to substantially maintain the predetermined transmission bit rate. The system further comprises a client for receiving the encoded image data transmitted from the image server and decoding the encoded image data for identification of the target of interest.
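The exchange between image server and client can be pictured with a small framing sketch. The wire format below is entirely hypothetical: a fixed header carrying the fovea center, fovea radius and array shapes, followed by the fovea and periphery samples as 8-bit values. It is meant only to show that the client needs the mapping parameters of each frame in order to decode it, since those parameters change as the fovea moves and rescales.

```python
import struct
import numpy as np

# Hypothetical header: cx, cy, fovea_radius, fovea_h, fovea_w, n_rings, n_angles
HEADER = struct.Struct("<7H")

def pack_frame(center, fovea_radius, fovea, periphery):
    """Server side: serialize one encoded frame into a byte string."""
    cy, cx = center
    header = HEADER.pack(cx, cy, fovea_radius, *fovea.shape, *periphery.shape)
    return (header
            + fovea.astype(np.uint8).tobytes()
            + periphery.astype(np.uint8).tobytes())

def unpack_frame(packet):
    """Client side: recover the mapping parameters and both sample arrays."""
    cx, cy, fovea_radius, fh, fw, n_rings, n_angles = HEADER.unpack_from(packet, 0)
    body = np.frombuffer(packet, dtype=np.uint8, offset=HEADER.size)
    fovea = body[: fh * fw].reshape(fh, fw)
    periphery = body[fh * fw : fh * fw + n_rings * n_angles].reshape(n_rings, n_angles)
    return (cy, cx), fovea_radius, fovea, periphery
```

With this layout the packet length is the header size plus one byte per fovea and periphery sample, so holding the total sample count steady holds the per-frame payload steady as well.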
These and other objects, features and advantages of the present invention will be described or become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings.