Temporal sequences of images acquired and represented in numerical format are commonly called digital video signals. In today's information society, the compression of video signals is a necessary operation to enable and/or facilitate the recording of such signals or their transmission over a distance. Compression is in fact applied to reduce the number of recording units (commonly bits) needed for the digital representation of the video signal, which in turn leads to a reduction of the bit rate necessary for the transmission of the same signal over a digital communication channel.
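The magnitude of this bit-rate reduction can be illustrated with a short arithmetic sketch. The CIF resolution, frame rate and compressed bit rate used below are taken from the example discussed later in connection with FIG. 1; the 4:2:0 sampling assumption is illustrative and not stated in the text.

```python
# Illustrative arithmetic: raw bit rate of an uncompressed CIF video stream
# versus a hypothetical compressed stream (figures assumed for illustration).
width, height = 352, 288        # CIF spatial resolution
fps = 30                        # frames per second
bits_per_pixel = 12             # assuming 4:2:0 YUV sampling (8 + 2 + 2 bits)

raw_bitrate = width * height * bits_per_pixel * fps   # bits per second
compressed_bitrate = 2_000_000                        # e.g. 2 Mbps after coding

ratio = raw_bitrate / compressed_bitrate
print(f"raw: {raw_bitrate / 1e6:.1f} Mbps, compression ratio: {ratio:.1f}:1")
```

Under these assumptions the uncompressed signal requires roughly 36.5 Mbps, so coding at 2 Mbps corresponds to a compression ratio of about 18:1.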
The compression or coding system receives an incoming video signal and returns a bit stream of a smaller size than the size (expressed in bits) of the original signal. It is typically possible to "trade" a smaller compressed bit stream at the expense of a lower reconstruction quality of the data after the inverse operation, called decompression or, better, decoding.
The coding is carried out by the party holding the original signal by means of a system called encoder, while the decoding is performed by the receiver by means of a system called decoder. Normally the original video signal has spatial (coordinates x and y) and temporal (number of images per second, or frame rate) characteristics that are not changed by the coding and decoding process, which operates at a so-called working point defined by an appropriate quality of the decoded data. The most widespread standards for video compression work according to this paradigm, for example those belonging to the MPEG or H.26x families.
Such a paradigm becomes obsolete in the context of Scalable Video Coding (SVC). In such a context, the data coded with respect to a certain working point can be decoded according to a large number of working points, not necessarily defined a priori, called inferior working points in the sense that they can be decoded from only a fraction of the originally coded bit stream. Such working points allow reconstructing (decoding) the video signal with a scaled, i.e. reduced, quality, spatial dimension (or resolution) and frame rate with respect to the signal that can be decoded from the whole compressed bit stream.
FIG. 1 shows a typical SVC system. The example refers to the coding of a video signal at an original CIF spatial resolution and a rate of 30 fps (images, or "frames", per second). The scalable encoder typically produces a bit stream in which it is possible to identify one or more portions referring to texture information, typically corresponding to static image coding, and optionally one or more portions referring to motion information (typically represented by a motion vector field) used in motion-compensated temporal prediction operations. In the example, the originally coded stream is generated at a bit rate of 2 megabits per second (2 Mbps), such rate being linked to a maximum quality layer chosen for the original spatial and temporal resolutions. For a decoding scaled in terms of spatial and/or temporal and/or quality resolution, the decoder works only on a portion of the original coded bit stream, according to the indication of the desired working point. Such a stream portion is extracted from the originally coded stream by a block called "extractor", which in FIG. 1 is arranged between the encoder and the decoder and which in general, according to the application field, can represent an independent block of the whole chain or be an integral part of the encoder or decoder. The extractor receives the information referring to the desired working point (in the example of FIG. 1, a lower spatial resolution (QCIF), a lower frame rate (15 fps) and a lower bit rate, i.e. quality (150 kilobits per second)) and extracts a decodable bit stream matching, or almost matching, the specifications of the indicated working point. The difference between an SVC system and a transcoding system lies in the low complexity of the extraction block, which does not require coding/decoding operations and typically consists of simple "cut and paste" operations.
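The low-complexity "cut and paste" behaviour of the extractor can be sketched as a simple packet-filtering operation. The packet structure, layer labelling and `extract` function below are hypothetical illustrations, not part of any standardized SVC syntax: they only show that extraction selects a subset of the coded stream without any coding or decoding.

```python
from dataclasses import dataclass

# Hypothetical sketch of the "extractor" block: each packet of the scalable
# bit stream is tagged with the spatial, temporal and quality layer it refers
# to, and extraction reduces to filtering ("cut and paste") of whole packets.
@dataclass
class Packet:
    spatial_layer: int    # assumed labelling: 0 = QCIF, 1 = CIF
    temporal_layer: int   # assumed labelling: 0 = 15 fps, 1 = 30 fps
    quality_layer: int    # increasing quality refinement layers
    payload: bytes

def extract(stream, max_spatial, max_temporal, max_quality):
    """Keep only the packets needed for the requested working point."""
    return [p for p in stream
            if p.spatial_layer <= max_spatial
            and p.temporal_layer <= max_temporal
            and p.quality_layer <= max_quality]

# Example: a full stream with two layers per dimension; extracting the
# QCIF / 15 fps / base-quality working point keeps only the base packets.
stream = [Packet(s, t, q, b"...") for s in (0, 1) for t in (0, 1) for q in (0, 1)]
substream = extract(stream, max_spatial=0, max_temporal=0, max_quality=0)
print(len(substream))  # -> 1 (only the base-layer packet remains)
```

No transcoding is involved: the surviving packets are copied verbatim into the extracted stream, which is what keeps the extractor computationally trivial compared with a full decode/re-encode chain.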
The application scenarios that can benefit from SVC are numerous: for example, the production and distribution of video over communication channels of diverse capacity and for receiving terminal devices with different spatial or temporal resolution capabilities (televisions, cellular videophones, palm-desktop . . . ); video streaming over heterogeneous IP (Internet Protocol) networks; advanced tele-surveillance systems; videoconferencing applications with non-guaranteed bandwidth; video streaming over mobile networks; fast video archive query and retrieval; and others.
Recently, strong interest has been focused on SVC coding solutions, thanks also to important technological advances that enable spatial as well as temporal scalability, the most important likely being the spatial and temporal wavelet transforms, the latter in its motion-compensated version.
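A minimal sketch can show why a temporal wavelet transform yields temporal scalability. The one-level Haar transform below (without motion compensation, purely for illustration; real systems apply it along motion trajectories) splits pairs of consecutive frames into a low-pass band, which alone reconstructs the video at half the frame rate, and a high-pass detail band.

```python
# Illustrative one-level temporal Haar wavelet transform on a group of frames
# (no motion compensation; frames are modelled as 1-D pixel lists for brevity).
def temporal_haar(frames):
    low, high = [], []
    for a, b in zip(frames[0::2], frames[1::2]):
        low.append([(x + y) / 2 for x, y in zip(a, b)])   # temporal average
        high.append([(x - y) / 2 for x, y in zip(a, b)])  # temporal detail
    return low, high

# Four tiny "frames" of 4 pixels each; decoding only the low band yields a
# half-frame-rate version of the sequence, i.e. temporal scalability.
frames = [[10, 10, 10, 10], [12, 12, 12, 12],
          [20, 20, 20, 20], [16, 16, 16, 16]]
low, high = temporal_haar(frames)
print(low)   # -> [[11.0, 11.0, 11.0, 11.0], [18.0, 18.0, 18.0, 18.0]]
```

Discarding the high-pass band halves the frame rate; repeating the decomposition on the low band produces further dyadic temporal layers, which is the structure the extractor exploits.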
Within the scientific community, and above all as a result of exploratory work carried out within working groups operating in the ISO-MPEG standardization organization, it has been possible to reach a classification of SVC systems according to the order in which the aforesaid transformations are applied to the original data. The scope of the present invention is to propose a method of coding and decoding video signals that allows overcoming some limits of state-of-the-art SVC architectures and at the same time gives competitive, if not improved, coding performance with respect to the current state of the art in video coding.