Video surveillance systems have evolved considerably in recent years. The digitization of content and the increase in the computational capabilities of computers enable the real-time processing of video sequences for their interpretation. New systems have appeared in the last few years and are customarily referred to as intelligent video surveillance systems. These systems rely on image and/or video processing techniques that make it possible, for example, to compare images, to detect motion, to detect a face or to recognize an object.
A conventional video surveillance system generally comprises the following elements:
        at least one analog or digital camera for capturing a video sequence;
        at least one remote server capable of processing the video sequences transmitted by the camera or cameras of the system;
        at least one terminal making it possible to view and/or store the video sequences;
        at least one memory area making it possible to store the video sequences.
Until recently, the main task of the video camera or cameras was to capture and to compress the digital video stream before transmission to the remote server via a telecommunications network. The desire to make ever more effective use of the bandwidth of the transmission media on which these sequences travel, together with the objective of reducing the cost of their storage, very soon raised the question of video compression. Conventional compression algorithms reduce the spatial redundancy and the temporal redundancy specific to a video sequence, and thereby reduce the bit rate required for the transmission of a video stream across, for example, a telecommunications network. In existing video encoding systems, it is necessary to select a compression rate suited to the application and therefore to the service considered. Indeed, the more compressed the video stream, and therefore the lower the bit rate, the more degraded the quality of the video may be as perceived by the user of the service. It is consequently important to choose the transmission bit rate for these streams correctly. Numerous schemes for carrying out this bit rate allocation exist. The existing techniques make it possible to adapt the bit rate of the video streams to the bandwidth constraints of telecommunications networks.
As stressed previously, one of the key constituent elements of a video surveillance system is the remote server. Its role is customarily to carry out analyses on the video stream after decompression. These analyses, for example the identification of the mobile objects of a video stream, are traditionally carried out at the level of the remote server and not of the cameras, since they require algorithmic tools capable of analyzing an uncompressed video stream. Indeed, the video stream is analyzed at the image pixel level, thereby requiring considerable resources in terms of computations and memory. It is for this reason that, until recently, the analysis of the video streams was not conducted by the cameras but remotely, on a server possessing sufficient resources to decompress the streams and analyze them.
Today it is possible to conduct video sequence analyses in the compressed domain and therefore to reduce the computational and memory loads required for the analysis of a video stream. The benefit of this scheme is that it reuses part of the work performed by the video encoder and thus utilizes information available in the compressed domain such as, for example, the coefficients computed by applying the Discrete Cosine Transform (DCT) and the motion estimation vectors. This information must thereafter be analyzed. Indeed, the motion estimation vectors do not necessarily correspond to a real motion of an object in the video sequence but may be akin to noise. By using this scheme it is then possible, for example, to identify the areas of the image comprising mobile objects. Since the computational load becomes reasonable, the video cameras can take charge of the analysis of the video streams. Various steps are necessary for using this information to identify the mobile objects. A review of the work described in the patent application “Optical flow estimation method” (US2006/0188013A1) has made it possible to delimit five functions, identified in the article “Statistical motion vector analysis for tracking in compressed video stream” by Marc Leny, Françoise Prêteux and Didier Nicholson. These modules are illustrated in FIG. 1 and described hereinbelow:
        the Low Resolution Decoder (LRD) makes it possible to reconstruct the entirety of a sequence at the resolution of the block, deleting on this scale the motion prediction;
        the Motion Estimation vectors Generator (MEG) determines, for its part, vectors for the set of blocks that the coder has coded in “Intra” mode (within Intra or predicted images);
        the Low Resolution Object Segmentation (LROS) relies, for its part, on an estimation of the background in the compressed domain by virtue of the sequences reconstructed by the LRD and therefore gives a first estimation of the mobile objects;
        the Object Motion Filtering (OMF) uses the vectors output by the MEG to determine the mobile areas on the basis of the motion estimation;
        a Cooperative Decision (CD) is established on the basis of these two segmentations, taking into account the specifics of each module depending on the type of image analyzed (Intra or predicted).
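The remark that motion estimation vectors may be akin to noise rather than real motion can be illustrated with a minimal sketch. It is not the method of the cited article or patent application: the function name `moving_block_mask`, the magnitude threshold and the neighbor-count rule are all assumptions chosen for illustration. The sketch keeps a block as "mobile" only if its vector is large enough and enough adjacent blocks also move, so that isolated vectors are discarded.

```python
from typing import List, Tuple

Vector = Tuple[float, float]  # (dx, dy) motion vector of one block


def moving_block_mask(
    vectors: List[List[Vector]],
    min_magnitude: float = 1.0,  # hypothetical threshold, in pixels
    min_neighbors: int = 2,      # hypothetical spatial-coherence rule
) -> List[List[bool]]:
    """Mark blocks whose motion vector is unlikely to be noise.

    A block is kept when its vector magnitude reaches `min_magnitude`
    and at least `min_neighbors` of its 8 adjacent blocks do as well.
    """
    rows, cols = len(vectors), len(vectors[0])

    def is_active(r: int, c: int) -> bool:
        dx, dy = vectors[r][c]
        return (dx * dx + dy * dy) ** 0.5 >= min_magnitude

    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not is_active(r, c):
                continue
            # Count active blocks in the 3x3 neighborhood, excluding (r, c).
            neighbors = sum(
                1
                for nr in range(max(0, r - 1), min(rows, r + 2))
                for nc in range(max(0, c - 1), min(cols, c + 2))
                if (nr, nc) != (r, c) and is_active(nr, nc)
            )
            mask[r][c] = neighbors >= min_neighbors
    return mask


# Example: a 2x2 cluster of moving blocks plus one isolated vector.
grid = [[(0.0, 0.0)] * 5 for _ in range(4)]
for r, c in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    grid[r][c] = (3.0, 0.0)
grid[3][4] = (5.0, 5.0)  # isolated, treated as noise by this rule
mask = moving_block_mask(grid)
```

In this example the four clustered blocks are retained and the isolated block is rejected. A real filter would likely also use temporal consistency and the low resolution segmentation, which this sketch leaves out.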
The results of the analysis in the compressed domain allow the identification of areas containing mobile objects (FIG. 2), the generation of motion maps established on the basis of the motion estimation vectors (FIG. 3) and of confidence maps corresponding to the edges of the low resolution image (FIG. 4).
The main benefit of the analysis in the compressed domain pertains to the computation times, which are considerably reduced relative to conventional analysis tools. By relying on the work performed during video compression, analysis today runs ten to twenty times faster than real time (250 to 500 images processed per second) for 720×576 4:2:0 images.
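The relationship between the processing rates and real time quoted above can be checked with a short calculation. The source frame rate of 25 images per second is an assumption: it is not stated in the text, but it is the usual rate for 720×576 video and it is the value for which 250 and 500 images per second correspond to factors of ten and twenty. The function name `realtime_factor` is hypothetical.

```python
def realtime_factor(images_per_second: float, source_fps: float = 25.0) -> float:
    """Ratio between the analysis rate and the capture rate.

    `source_fps` defaults to 25, an assumed frame rate for 720x576 video.
    """
    if source_fps <= 0:
        raise ValueError("source_fps must be positive")
    return images_per_second / source_fps


low = realtime_factor(250)   # lower bound quoted in the text
high = realtime_factor(500)  # upper bound quoted in the text
```

Under the same assumption, a factor of ten means that one analysis unit could in principle keep up with about ten such streams, ignoring any other overhead.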
In a video surveillance system comprising a significant number of video cameras communicating with a remote server by virtue of a telecommunications network, the available bandwidth, which is fixed by the dimensioning of the telecommunications network, must be shared. A conventional video surveillance network architecture relies on an initial network dimensioning that makes it possible either to transport the streams coming from the whole set of video sensors or cameras simultaneously, or to switch periodically from one stream to another. It is then at the level of the surveillance room that the operator, or powerful computational and analysis servers, may request the visualization of a specific stream depending on the importance accorded to it.
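The two dimensioning options described above can be compared with a rough sketch. It assumes constant bit rates per stream and, for the switching case, that only one stream is carried at a time; the function name `required_capacity_kbps` and the example bit rates are hypothetical and do not come from the text.

```python
from typing import Sequence


def required_capacity_kbps(
    stream_bitrates_kbps: Sequence[float], simultaneous: bool
) -> float:
    """Network capacity needed under the two conventional dimensionings.

    simultaneous=True : all streams are transported at once (sum of rates).
    simultaneous=False: periodic switching, one stream at a time (assumed),
                        so the largest single stream sets the requirement.
    """
    if not stream_bitrates_kbps:
        return 0.0
    if simultaneous:
        return float(sum(stream_bitrates_kbps))
    return float(max(stream_bitrates_kbps))


# Hypothetical example: three cameras at 2000, 1500 and 3000 kbit/s.
rates = [2000.0, 1500.0, 3000.0]
all_at_once = required_capacity_kbps(rates, simultaneous=True)
switched = required_capacity_kbps(rates, simultaneous=False)
```

With these example values, simultaneous transport needs 6500 kbit/s while periodic switching needs 3000 kbit/s, at the cost of seeing each camera only part of the time.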
In these conventional systems, video streams that do not contain any relevant information are frequently transmitted from the sensors to the processing servers. In this case, the resources of the telecommunications network are not used optimally.