Image synthesis makes it possible to create so-called virtual images with the aid of computer tools. These images emanate from an abstract description and from digital calculations. This involves using a collection of methods via 2D and 3D graphics libraries, possibly accelerated by specific hardware circuits such as graphics accelerator cards, and accessed via suitable interfaces of the API type (standing for “Application Program Interface”).
The process for creating these images can be split up into various steps.
It comprises firstly a phase of modelling, that is to say of computing or acquiring objects using a description model, the aim of which is to describe the constituent objects and assemble them so as to construct a viewable scene therefrom.
Let us cite for example the model of polygonal type in which the objects are split up into multitudes of elementary polygons or facets. Graphical primitives are utilized for defining, assembling or modifying these elementary geometrical entities.
These models are interpretable: they may be associated with kinds of graphics engines, for example colouring (or “shading”) of triangles, anti-aliasing of texture, etc. They have capacities, that is to say properties or capabilities, which are both behavioural, such as motion or explosion, and visual, such as texture, colour, mirror effects, etc. They will be able to interact with their environment when constructing a scenario, for example with lights or with the other objects. There is therefore, secondly, a construction of a moving scene which governs the global organization of these models over time (here taken in the sense of the time to effect a given application), that is to say the definition of a scenario or animation.
Lastly, depending on the applications (CAD, production of images, simulation, etc.), the final step consists in creating digital images from this scenario. This last step is called rendering or the method of image “rendition”, the purpose of which is to render the scene as realistically as possible. It may be very expensive in terms of calculation time, and require large memory capacities both in respect of the models used and in respect of the data related to the programs involved. For example, rendition methods such as radiosity or ray tracing make it possible to obtain quality images, but at an even higher cost, the calculation algorithms implemented being very complex.
The volume of information represented by digital images has given rise to the development of various compression standards such as JPEG, H.263, MPEG-1, MPEG-2, and soon MPEG-4, making it possible to manipulate, whether for storage or transmission, volumes of information which are compatible with the present technology. The MPEG-2 standard, currently the most generic, makes it possible to compress all existing formats of images with the aid of various profiles and levels defined in the MPEG standard, the best known of which is MP@ML (“Main Profile at Main Level”), for images in the conventional television format. The structure of the coders which carry out such video image compressions, according to the prior art, relies on various types of images: Intra, Predicted or Bidirectional (I, P and B, respectively), the main difference being the temporal mode of prediction. The coding kernel is conventional, with a frequency splitting based on the DCT (“Discrete Cosine Transform”), followed by quantization and entropy coding, so as to obtain, at the output of the coder, a bit stream which must comply with the standard, that is to say with a specific syntax.
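By way of illustration only, the frequency splitting and quantization stages of such a coding kernel can be sketched as follows. This is a minimal sketch, not the circuitry of any particular coder: the function names `dct_2d` and `quantize` are hypothetical, the transform is written in its direct, unoptimised form, and a single uniform quantization step is assumed where an actual MPEG-2 coder would use per-coefficient quantization matrices. Entropy coding is omitted.

```python
import math

N = 8  # block size assumed for the DCT stage


def dct_2d(block):
    """Orthonormal 2D DCT-II of an N x N block (direct, unoptimised form)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = (2 / N) * c(u) * c(v) * s
    return out


def quantize(coeffs, step=16):
    """Uniform quantization with one step size (an assumption made for brevity)."""
    return [[round(value / step) for value in row] for row in coeffs]
```

For a block of constant value, all the energy ends up in the single DC coefficient, which is what makes the subsequent entropy coding of the quantized coefficients efficient.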
Temporal prediction is performed by estimating the motion between images separated in time, on the basis of image blocks of size 16×16 pixels for example. The motion is deduced from a correlation between the block of the current image and a block of a search window of a previous or following image. Next, each block of size 8×8 pixels of the image is predicted with the calculated displacement vector, and only the error between the estimate and the original is coded.
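The conventional estimation described above can be sketched as an exhaustive search. The following is a minimal sketch under stated assumptions: the correlation measure is taken to be the sum of absolute differences (SAD) on luminance samples, the search window is a square of ±`search` pixels, and the function names (`sad`, `block_match`, `residual`) are hypothetical. Images are plain lists of rows, and a vector (dx, dy) points from the current block to its match in the reference image.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def block_match(cur, ref, bx, by, size=16, search=4):
    """Full search: return the (dx, dy) within +/- search minimising the SAD."""
    height, width = len(ref), len(ref[0])
    best_vec, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference image.
            if bx + dx < 0 or bx + dx + size > width:
                continue
            if by + dy < 0 or by + dy + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if cost < best_cost:
                best_vec, best_cost = (dx, dy), cost
    return best_vec, best_cost


def residual(cur, ref, bx, by, vec, size):
    """Prediction error: current block minus the motion-compensated block."""
    dx, dy = vec
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
             for x in range(size)] for y in range(size)]
```

The two nested loops over the search window, each evaluating a full block difference, give an idea of why this stage dominates the computational cost of the coder.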
The compression of data, whether of conventional images or synthesis images, therefore utilizes the conventional processes such as motion estimation. The circuits which carry out such calculations and the associated circuits are complex and the cost of such a setup is high. For example, the motion estimation and motion-compensated interpolation circuits account for perhaps half the complexity of an MPEG-2 type coder.
The motion information, still according to conventional processes, does not always correspond to the actual motion. It simply involves correlations generally with regard to luminance information. The fact that the field of vectors consisting of the motion vectors of an image does not reflect the actual motion precludes optimal compression of data, in particular in the case of differential coding of vectors. This is because, for macroblocks corresponding to zones of uniform motion, the cost of transmitting identical or slightly different vectors, in differential coding, is smaller than the cost of transmitting random vectors.
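The effect of vector field coherence on the coding cost can be illustrated with a small numerical sketch. The cost model below is an assumption chosen for simplicity: each vector component is coded as a difference from the previous vector in scan order, and the bit count is that of a signed Exp-Golomb code, which stands in for the variable-length tables of an actual standard. The function names are hypothetical.

```python
def se_golomb_bits(v):
    """Bit length of the signed Exp-Golomb code for the integer v
    (a stand-in cost model, not the VLC table of a specific standard)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v  # map signed to unsigned
    return 2 * (code_num + 1).bit_length() - 1


def differential_cost(vectors):
    """Bits needed to code each vector as a difference from the previous one."""
    bits, prev = 0, (0, 0)
    for dx, dy in vectors:
        bits += se_golomb_bits(dx - prev[0]) + se_golomb_bits(dy - prev[1])
        prev = (dx, dy)
    return bits


uniform = [(3, -2)] * 8                               # zone of uniform motion
erratic = [(3, -2), (-4, 4), (4, -3), (-2, 3)] * 2    # incoherent vectors
print(differential_cost(uniform), differential_cost(erratic))
```

Under this model, a zone of uniform motion costs one bit per component after the first vector, whereas an incoherent field pays a long code word for nearly every component.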
Moreover, the fact that the motion vectors obtained according to the conventional “block matching” process do not necessarily reflect the actual motion precludes utilization of the vector field to carry out interpolations or extrapolations of images of good quality during, for example, conversions of frequency, digital video recorder slow motion modes, etc.
An incorrect motion vector field also precludes the utilization of new techniques of coding using the contour information for an image rather than the macroblocks. This is because the compression of data according to these new techniques is based on image segmentation and the actual displacement of these “segments” defining the uniform zones.
Thus, the lack of reliability of the motion estimation precludes optimization of the performance of the coder in terms of degree of compression or of image quality for a given bit rate or the effective utilization of this motion information at the decoder.
The purpose of the invention is to alleviate the aforesaid drawbacks during coding of synthesis images.
To this end, its subject is a process for compressing digital data of a sequence of synthesis images describing a scene which is the subject of a script, comprising a processing step for modelling the scene on the basis of mathematical data, a step of image rendering for creating a synthesis image from this modelling and a partitioning of this synthesis image into image blocks, a differential coding of the current image block on the basis of a block of at least one synthesis image, this block being defined on the basis of at least one motion vector, so as to provide a residual block, characterized in that the motion vector is calculated from mathematical data emanating from the synthesis script and defining the apparent motion of the various objects constituting the scene which is the subject of the sequence.
Its subject is also a device for compressing digital data of a sequence of synthesis images describing a scene which is the subject of a script, comprising a processing circuit for modelling the scene, the images of which are to be synthesized on the basis of mathematical data, a circuit for image rendering and for partitioning the image into blocks which receives the cues from the processing circuit for effecting a synthesis image and partitioning the image obtained into image blocks, an image blocks motion compensation circuit receiving the cues from the processing circuit so as to provide predicted blocks, a subtractor for taking the difference between the current block originating from the circuit for image rendering and for partitioning into image blocks and the predicted block originating from the motion compensation circuit so as to provide a residual block, a discrete cosine transformation circuit for the image blocks originating from the circuit for image rendering and for partitioning into image blocks or residual blocks originating from the subtractor, the choice being made by a mode selection circuit as a function of energy criteria, a circuit for quantizing the transformed coefficients, characterized in that the motion compensation circuit utilizes the mathematical data provided by the processing circuit and representing the displacement of the modelled objects constituting the scene so as to calculate the motion vectors associated with the current block and defining the predicted block.
According to another embodiment, its subject is a device for compressing digital data of a sequence of synthesis images describing a scene which is the subject of a script, comprising a processing circuit for modelling the scene, the images of which are to be synthesized on the basis of mathematical data, a circuit for image rendering and for partitioning the image into blocks which receives the cues from the processing circuit, for effecting a synthesis image and partitioning the image obtained into image blocks, an image blocks motion compensation circuit receiving the cues from the processing circuit, characterized in that it transmits in intra mode one image from among N images of the sequence, N being a predetermined number, this image N being that which is the subject of the rendition calculation by the circuit for rendition calculation and for partitioning into image blocks, in that the other images are transmitted in inter mode by way of residual blocks representing the difference between a current block and a predicted block and in that residual blocks are null and defined by the single motion vector calculated from the mathematical data.
In general, the techniques of image rendering amount to representing an “object”-oriented scenario as images. Now, the script, that is to say the scenario, comprises all the possible information with regard to the objects in the scene and also their various properties. In the case of image synthesis, a 2D or 3D script gives the exact displacement of the objects over time. This script then serves to generate the final digital video images (rendering). Thus, instead of using the information consisting of the pixels making up a visual image, that is to say one which is not modelled, to estimate the motion, modelling tools are used to calculate the actual motion in the image sequence.
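The principle of deriving a motion vector from the script rather than from the pixels can be sketched as follows. This is a hedged illustration, not the calculation performed by the processing circuit: it assumes a pinhole camera with hypothetical parameters (`focal`, `cx`, `cy`), a script exposed as a hypothetical callback `position_at` returning the 3D position of a point of an object at a given instant, and vectors rounded to whole pixels. The vector points from the position in the current image to the position in the previous image.

```python
def project(point, focal=256.0, cx=160.0, cy=120.0):
    """Pinhole projection of a 3D point (x, y, z), with z > 0, onto the image plane.
    The camera parameters are illustrative assumptions."""
    x, y, z = point
    return (cx + focal * x / z, cy + focal * y / z)


def apparent_motion(position_at, t_prev, t_cur):
    """Apparent displacement, in whole pixels, of a scripted point between two
    instants. `position_at` is a hypothetical script callback t -> (x, y, z)."""
    u0, v0 = project(position_at(t_prev))
    u1, v1 = project(position_at(t_cur))
    return (round(u0 - u1), round(v0 - v1))


def script(t):
    """Example script: a point translating along x at constant depth."""
    return (0.5 * t, 0.0, 4.0)


print(apparent_motion(script, 0, 1))
```

No search over pixel data is involved: the vector follows directly from the positions given by the script and from the projection.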
Apart from the reduction in complexity, by using the actual motion rather than the estimated motion it is possible to improve the quality of the prediction and the global performance of the coder.