One of the problems to be solved is the transmission of high-resolution video sequences over bitrate-constrained networks. Indeed, a high-resolution video sequence, even after compression by a suitable source coding device, has a useful bitrate that often exceeds the capacity of the transmission channel, notably that of wireless networks. One solution consists in selecting only certain images within the sequence and transmitting them at a lower resolution. The problem then arises of selecting the relevant images in the sequence so as to transmit almost all of the useful information contained in the video. Another problem to be solved relates to the transmission procedure to be implemented in order to transmit the high-resolution images and recover them on the receiver side. Moreover, implementing a form of interactivity between the remote operator and the sender, so that only a part of the video stream is selected for transmission, offers appreciable advantages: for example, the transmission can be adapted to the operator's requirements so that only the information deemed relevant is transmitted. Finally, implementation complexity is an important point to take into account in achieving a global solution that satisfies, notably, the real-time constraints inherent in interactive multimedia applications.
Hereinafter, the expressions “relevant images” and “key images” refer to a subset of images selected within a video sequence that have a higher priority from the end user's point of view. In the context of transmitting said video sequence over a low-bitrate network, the relevant images are, for example, those that differ significantly from one another in content. In the context of compressing said video sequence with a suitable video coder, the key images are also those that are compressed most effectively, so as to guarantee their final visual quality once decompressed. Accordingly, a summary of a video sequence corresponds to the set of “relevant images” or “key images” of said sequence.
Selecting Relevant Images in a Video Stream
In the prior art, the selection of relevant images within a video sequence is often handled by solutions that create a summary of the sequence by taking its global content into account. For example, patent application US2008/0232687 describes a procedure for selecting key images within a video sequence. This procedure also performs a temporal segmentation of the sequence into several scenes. This type of method is not suited to real-time broadcasting of a video stream, since the entire sequence must be processed before the set of associated key images can be produced. Video transmission constraints, on the contrary, require the images to be processed on the fly: the key-image selection procedure then has as input only the current image and, optionally, its temporal neighbors, in particular the previous images if the transmission delay is to be minimized.
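To illustrate the on-the-fly constraint, the following Python sketch selects key images causally: each decision looks only at the current image and at the last image already selected. The relevance criterion (mean absolute difference between grey levels), the `threshold` parameter and the flattened-frame representation are assumptions chosen for illustration. This is neither the selection method of the invention nor that of US2008/0232687.

```python
from typing import Iterable, Iterator, Optional, Sequence

# Hypothetical representation: a frame is a flattened sequence of grey levels.
Frame = Sequence[int]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute difference between two frames of the same size."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_key_frames(frames: Iterable[Frame], threshold: float) -> Iterator[int]:
    """Yield the indices of key frames, deciding on the fly.

    Each decision uses only the current frame and the last selected key
    frame, so no future frame is needed and no look-ahead delay is added.
    The criterion and threshold are illustrative assumptions.
    """
    last_key: Optional[Frame] = None
    for index, frame in enumerate(frames):
        if last_key is None or mean_abs_diff(frame, last_key) > threshold:
            last_key = frame
            yield index


if __name__ == "__main__":
    stream = [[0] * 4, [1] * 4, [10] * 4, [11] * 4, [30] * 4]
    print(list(select_key_frames(stream, threshold=5.0)))  # [0, 2, 4]
```

Because `select_key_frames` is a generator, it can consume a live stream image by image without buffering future images.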
Video Stream Compression Techniques
A video sequence, by its very nature, contains considerable statistical redundancy in both the temporal and the spatial domains. The desire to make ever more effective use of the bandwidth of the transmission media over which these sequences travel, and to reduce their storage cost, very soon raised the question of video compression. Conventional video compression techniques can generally be divided into two steps. The first aims to reduce spatial redundancy, and therefore to compress a still image. The image is first divided into blocks of pixels (for example 8×8 in MPEG-1/2 and MPEG-4 Part 2, and 4×4 or 8×8 in H.264/MPEG-4 AVC); a transform to the frequency domain followed by quantization makes it possible to approximate or remove the high frequencies, to which the eye is less sensitive; finally, the quantized data are entropy coded. The second step aims to reduce temporal redundancy. It predicts an image from one or more reference images previously processed within the same sequence (motion prediction). For each block to be predicted, the reference images are searched for the best-matching block, and only a motion vector, corresponding to the displacement of the block between the two images, and a residual error, which refines the visual rendition, are retained.
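The temporal step can be illustrated with exhaustive (full-search) block matching, one common way of performing motion estimation. In this Python sketch, images are lists of rows of grey levels of identical size, the matching cost is the sum of absolute differences (SAD), and the block size `n` and search `radius` are illustrative parameters. Practical encoders typically use faster search strategies and sub-pixel precision.

```python
from typing import List, Tuple

# Hypothetical representation: image[row][col] is a grey level.
Image = List[List[int]]


def sad(cur: Image, ref: Image, by: int, bx: int, dy: int, dx: int, n: int) -> int:
    """SAD between the n x n block of `cur` at (by, bx) and the block of `ref`
    displaced by (dy, dx)."""
    return sum(
        abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
        for i in range(n)
        for j in range(n)
    )


def full_search(cur: Image, ref: Image, by: int, bx: int, n: int,
                radius: int) -> Tuple[Tuple[int, int], Image]:
    """Return the motion vector (dy, dx) minimizing the SAD and the residual block.

    Assumes `cur` and `ref` have the same size and that the block lies inside
    `cur`, so the zero displacement is always a valid candidate.
    """
    height, width = len(ref), len(ref[0])
    best_cost, best_vec = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidate blocks that fall outside the reference image.
            if not (0 <= by + dy and by + dy + n <= height
                    and 0 <= bx + dx and bx + dx + n <= width):
                continue
            cost = sad(cur, ref, by, bx, dy, dx, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_vec = cost, (dy, dx)
    dy, dx = best_vec
    # Residual error: what remains after subtracting the matched block.
    residual = [[cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j]
                 for j in range(n)] for i in range(n)]
    return best_vec, residual


if __name__ == "__main__":
    ref = [[0] * 8 for _ in range(8)]
    cur = [[0] * 8 for _ in range(8)]
    for i in (2, 3):
        for j in (2, 3):
            ref[i][j] = 200          # bright square in the reference image
            cur[i + 1][j + 2] = 200  # same square, moved 1 row down, 2 columns right
    vector, residual = full_search(cur, ref, by=3, bx=4, n=2, radius=2)
    print(vector, residual)  # (-1, -2) [[0, 0], [0, 0]]
```

The returned vector `(dy, dx)` points from the current block to its best match in the reference image. Here the residual is all zeros, so only the vector would need to be coded for this block.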
Temporal Granularity in a Video Standard
A stream of data compressed according to a procedure allowing temporal granularity, or “temporally scalable compressed bit-stream”, follows a hierarchical coding scheme. This hierarchy defines sets of images that are accessible by grade, or temporal resolution. The first grade, called the “base resolution”, is the minimum sequence allowing degradation-free reconstruction of the frames of which it is composed. The other grades correspond to refinements of this base sequence. Generally, the refinement grades have frame rates that are multiples of the base frame rate; the ratio between successive frame rates is then called the scale factor. For example, for a sequence at 30 frames per second coded with a temporal granularity of scale factor two over three levels, the first level (base resolution) corresponds to video content at 7.5 frames per second. If the base subset and the first refinement level are both accessible, video content at 15 frames per second can be obtained. If the last refinement level is added, video content at the original temporal resolution (30 frames per second) is obtained. Each of these subsets is assumed to be an effective compression of the information that it contains. FIG. 1 shows diagrammatically an example of temporal granularity. The base level (0) corresponds to the minimum temporal resolution that is transmitted. In the context of video transmission, the code stream corresponding to the base level represents the minimum information that the recipient must receive, and it must therefore be coded so as to suffer the fewest possible losses during transmission. Typically, the images contained in this base temporal resolution are encoded independently.
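The frame-rate arithmetic of this example can be checked with a short Python sketch. It assumes a regular assignment of frames to levels by index (a frame belongs to the base level when its index is a multiple of `scale_factor ** (num_levels - 1)`), which is a common convention for hierarchical prediction structures but is not specified in the text; the function names are hypothetical.

```python
def temporal_level(frame_index: int, num_levels: int, scale_factor: int = 2) -> int:
    """Return the temporal level (0 = base) of a frame, assuming a regular hierarchy."""
    for level in range(num_levels - 1):
        # Frames of this level (and below) recur every `period` frames.
        period = scale_factor ** (num_levels - 1 - level)
        if frame_index % period == 0:
            return level
    # Every remaining frame belongs to the highest refinement level.
    return num_levels - 1


def cumulative_frame_rate(max_level: int, full_rate: float, num_levels: int,
                          scale_factor: int = 2) -> float:
    """Frame rate obtained when levels 0..max_level are all received."""
    return full_rate / scale_factor ** (num_levels - 1 - max_level)


if __name__ == "__main__":
    print([temporal_level(i, 3) for i in range(8)])               # [0, 2, 1, 2, 0, 2, 1, 2]
    print([cumulative_frame_rate(k, 30.0, 3) for k in range(3)])  # [7.5, 15.0, 30.0]
```

With three levels and a scale factor of two, one frame in four is in the base level, which gives the 7.5, 15 and 30 frames per second of the example.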
The higher temporal resolutions (refinement levels 1 and 2) may be encoded using prediction from the images of the base resolution (0). Prediction between images belonging to the base resolution is possible; on the other hand, the images of the base temporal resolution may not be predicted from an image belonging to a refinement level.
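The consequence of this rule is that dropping refinement levels never removes an image on which a kept image depends. The Python sketch below checks this on a small group of images. It assumes the general rule used by hierarchical schemes such as H.264 temporal scalability, namely that an image may only be predicted from images of the same or a lower level (the text states this explicitly only for the base level); the example prediction structure and function names are hypothetical.

```python
from typing import Dict, List


def is_allowed_reference(current_level: int, reference_level: int) -> bool:
    """Assumed rule: predict only from the same or a lower temporal level."""
    return reference_level <= current_level


def structure_is_valid(levels: Dict[int, int], refs: Dict[int, List[int]]) -> bool:
    """Check every prediction link of a group of images against the rule."""
    return all(
        is_allowed_reference(levels[frame], levels[ref])
        for frame, frame_refs in refs.items()
        for ref in frame_refs
    )


def extract_levels(levels: Dict[int, int], refs: Dict[int, List[int]],
                   max_level: int) -> List[int]:
    """Keep frames of levels 0..max_level; fail if a kept frame loses a reference."""
    kept = sorted(f for f, lvl in levels.items() if lvl <= max_level)
    kept_set = set(kept)
    for frame in kept:
        for ref in refs.get(frame, []):
            if ref not in kept_set:
                raise ValueError(f"frame {frame} depends on dropped frame {ref}")
    return kept


if __name__ == "__main__":
    # Hypothetical group of five images over three temporal levels.
    levels = {0: 0, 1: 2, 2: 1, 3: 2, 4: 0}
    refs = {4: [0], 2: [0, 4], 1: [0, 2], 3: [2, 4]}
    print(structure_is_valid(levels, refs))  # True
    print(extract_levels(levels, refs, 0))   # [0, 4]
    print(extract_levels(levels, refs, 1))   # [0, 2, 4]
```

If image 4 (base level) were instead predicted from image 2 (level 1), `structure_is_valid` would return `False` and extracting only the base level would raise `ValueError`, which is exactly the situation the rule forbids.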
JPIP Standard
The JPIP standard (JPEG 2000 Interactive Protocol) defines a protocol dedicated to the progressive transmission of images coded according to the JPEG 2000 standard. It makes it possible to exploit the various granularity levels offered by JPEG 2000 (spatial granularity, granularity in resolution and granularity in quality). Following a request made by the operator, only the information needed to satisfy that request is transmitted, progressively in terms of quality. Combining the JPIP protocol with the JPEG 2000 standard avoids retransmitting information that has already been sent, which reduces both the transmitted bitrate and the processing load on both sides of the transmission chain. Moreover, because the information is sent in hierarchical order, a part of the image can be viewed rapidly at low quality, the quality then increasing progressively as new information is received.
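The request-driven, incremental behavior described above can be simulated with a simplified server-side model of the client cache. In this Python sketch, the image is divided into data units identified by a hypothetical `(tile, resolution, layer)` key, and the server remembers what it has already sent during the session. Actual JPIP organizes the code stream into data-bins and defines its own request syntax, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple

# Hypothetical identifier of a coded data unit: (tile, resolution level, quality layer).
DataUnit = Tuple[int, int, int]


@dataclass(frozen=True)
class ViewRequest:
    tiles: FrozenSet[int]  # spatial region of interest, as a set of tile indices
    resolution: int        # highest resolution level wanted (0 = coarsest)
    layers: int            # number of quality layers wanted


class SessionServer:
    """Sends only the data units a request needs that the client does not yet hold."""

    def __init__(self) -> None:
        self.client_cache: Set[DataUnit] = set()  # server-side model of the client cache

    def respond(self, request: ViewRequest) -> List[DataUnit]:
        # Layer-major order: every tile and resolution first receives layer 0,
        # then layer 1, and so on, so the displayed quality improves progressively.
        needed = [
            (tile, res, layer)
            for layer in range(request.layers)
            for res in range(request.resolution + 1)
            for tile in sorted(request.tiles)
        ]
        new_units = [unit for unit in needed if unit not in self.client_cache]
        self.client_cache.update(new_units)
        return new_units


if __name__ == "__main__":
    server = SessionServer()
    first = server.respond(ViewRequest(frozenset({0, 1}), resolution=1, layers=1))
    second = server.respond(ViewRequest(frozenset({1, 2}), resolution=1, layers=2))
    print(len(first), len(second))  # 4 6
```

The second request overlaps the first on tile 1, so the two layer-0 units of that tile are not sent again: only 6 of the 8 units it needs are transmitted.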
This standard can be used to perform interactive, bitrate-optimized transmission of JPEG 2000 images, but it does not allow the same type of method to be applied to video transmission based on a different standard. In particular, this protocol does not cover the selection, by the operator, of an image within a transmitted video stream.
The prior art described above therefore leaves a number of problems unsolved, notably the transmission of high-resolution information over a bitrate-constrained network. Nor does the state of the art cover interactive access to an image, or to a zone of an image, within a video stream transmitted in real time.
To overcome these limitations of the prior art, the invention proposes a new approach that works only on images of reduced spatial resolution, subsampled temporally in an intelligent manner, so as to minimize redundancy and adapt to the required bandwidth. The proposed solution also allows this reduced-resolution sequence to be analyzed interactively through requests issued by a remote operator. The present invention is compatible with the following standards: H.264, defined by ISO/IEC standard 14496-10; JPEG 2000, defined by ISO/IEC standard 15444-1; and JPIP, defined by ISO/IEC standard 15444-9.
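As a rough order of magnitude of what reducing spatial and temporal resolution saves, the following Python sketch computes raw (uncompressed) pixel rates. It assumes uniform downscaling by `spatial_factor` in each dimension and uniform frame decimation by `temporal_factor`, both hypothetical parameters. The intelligent temporal subsampling described above is not uniform, and compressed bitrates do not scale exactly with the pixel rate.

```python
def pixel_rate(width: int, height: int, fps: float) -> float:
    """Raw pixels per second of a video sequence (chroma and compression ignored)."""
    return width * height * fps


def reduction_factor(width: int, height: int, fps: float,
                     spatial_factor: int, temporal_factor: int) -> float:
    """Ratio between the original and the reduced raw pixel rates."""
    original = pixel_rate(width, height, fps)
    reduced = pixel_rate(width // spatial_factor, height // spatial_factor,
                         fps / temporal_factor)
    return original / reduced


if __name__ == "__main__":
    # 1920x1080 at 30 frames/s reduced to 960x540 at 7.5 frames/s.
    print(pixel_rate(1920, 1080, 30))              # 62208000
    print(reduction_factor(1920, 1080, 30, 2, 4))  # 16.0
```

Halving each dimension divides the pixel count by four, and keeping one frame in four divides the frame rate by four, hence the overall factor of 16 on raw data.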