Many techniques are known in the art for encoding and decoding digital signals, such as audio signals, pictures, video signals and other multidimensional signals (e.g., volumetric signals used in scientific and medical applications). To achieve high compression ratios, these techniques exploit the spatial and temporal correlation within the signal.
Conventional methods identify a reference and try to determine the difference between the signal at a current location and the given reference. This is done both in the spatial domain, where the reference is a portion of the spatial plane that has already been received and decoded, and in the time domain, where a single instance in time of the signal (e.g., a video frame in a sequence of frames) is taken as a reference for a certain duration. This is the case, for example, of MPEG-family (Moving Picture Experts Group) compression algorithms, where previously decoded macro blocks are taken as a reference in the spatial domain and I-frames and P-frames are used as references in the time domain.
Known methods exploit spatial and temporal correlation in many ways, adopting a variety of techniques to identify, simplify, encode and transmit differences (“residual data”).
In accordance with conventional methods, in order to leverage spatial correlation, a domain transformation is performed (for example, into a frequency domain), followed by lossy deletion and quantization of information. In the time domain, instead, conventional methods transmit the quantized difference between the current sample and a reference sample. To maximize the similarity between samples, encoders try to estimate the changes that have occurred over time relative to the reference signal. In conventional encoding methods (e.g., MPEG-family technologies, VP8, etc.) this is called motion estimation. Motion information is transmitted to a corresponding decoder to enable reconstruction of the current sample by leveraging information already available at the decoder for the reference sample (in MPEG this is done using motion vectors on a macro block basis).
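The motion estimation and residual computation described above can be illustrated with a minimal sketch. It assumes full-search block matching with a sum-of-absolute-differences (SAD) cost on small integer frames stored as nested lists; the function names `sad`, `estimate_motion`, `residual` and `reconstruct` are hypothetical and do not come from MPEG or VP8, whose actual motion search and residual coding (including transform and quantization) are considerably more elaborate.

```python
def sad(cur, ref, by, bx, dy, dx, bs):
    """Sum of absolute differences between the bs x bs block of `cur` at (by, bx)
    and the block of `ref` displaced by the candidate motion vector (dy, dx)."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def estimate_motion(cur, ref, by, bx, bs, search):
    """Full search: try every displacement within +/-search and keep the one with
    the lowest SAD, skipping displacements that fall outside the reference frame."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + bs <= h
                      and 0 <= bx + dx and bx + dx + bs <= w)
            if not inside:
                continue
            cost = sad(cur, ref, by, bx, dy, dx, bs)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]


def residual(cur, ref, by, bx, bs, mv):
    """Residual data: current block minus the motion-compensated reference block."""
    dy, dx = mv
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(bs)]
            for y in range(bs)]


def reconstruct(ref, by, bx, bs, mv, res):
    """Decoder side: motion-compensated reference block plus the residual."""
    dy, dx = mv
    return [[ref[by + dy + y][bx + dx + x] + res[y][x] for x in range(bs)]
            for y in range(bs)]


# Reference frame with distinct values; the current frame is the same content
# shifted one pixel to the right.
ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y][x - 1] if x >= 1 else 0 for x in range(8)] for y in range(8)]

mv = estimate_motion(cur, ref, 4, 4, 4, 2)
res = residual(cur, ref, 4, 4, 4, mv)
print(mv)  # (0, -1): the block content came from one pixel to the left
```

Because the shifted block matches its reference exactly, the residual here is all zeros, so only the motion vector needs to be transmitted. With real content the residual is non-zero and is what the lossy transform and quantization stages then act on.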
A drawback of conventional reference-based encoding techniques is that errors in a sample accumulate with errors in the subsequent samples, each of which is reconstructed based on a preceding sample, creating visible artifacts after very few sequentially predicted samples as soon as lossy compression techniques are adopted.
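The accumulation of errors along a prediction chain can be shown with a deliberately simplified, hypothetical model. It assumes one-dimensional samples, a uniform scalar quantizer as the only lossy step, and open-loop prediction, in which the encoder computes each difference against the original previous sample while the decoder can only add it to its own reconstructed previous sample. Real encoders usually predict from reconstructed references to keep encoder and decoder in step, so this sketch exaggerates the effect, but it shows why an error in one reconstructed sample propagates into every sample predicted from it. The names `quantize`, `encode_open_loop` and `decode` are illustrative only.

```python
import math


def quantize(value, step):
    """Uniform scalar quantizer (the lossy step): nearest multiple of step, ties rounded up."""
    return step * math.floor(value / step + 0.5)


def encode_open_loop(samples, step):
    """First sample sent as-is (intra); every later sample sent as the quantized
    difference from the ORIGINAL previous sample (open-loop prediction)."""
    residuals = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        residuals.append(quantize(cur - prev, step))
    return residuals


def decode(residuals):
    """Each sample = previously RECONSTRUCTED sample + transmitted residual."""
    out = [residuals[0]]
    for r in residuals[1:]:
        out.append(out[-1] + r)
    return out


samples = [0, 3, 6, 9, 12, 15]
step = 4

predicted = decode(encode_open_loop(samples, step))
intra_only = [quantize(s, step) for s in samples]  # each sample coded independently

pred_err = [abs(a - b) for a, b in zip(predicted, samples)]
intra_err = [abs(a - b) for a, b in zip(intra_only, samples)]
print(pred_err)   # [0, 1, 2, 3, 4, 5]  error grows along the chain
print(intra_err)  # [0, 1, 2, 1, 0, 1]  error stays within step / 2
```

The sequentially predicted chain drifts by one unit per sample, whereas coding each sample independently keeps the error bounded by half the quantization step at the cost of transmitting full values instead of small differences.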
A common approach is to partition the signal to reduce the complexity of the encoding and decoding operations. This is done both in the spatial domain, for example using the concept of macro blocks or slices, and in the time domain, where the current state-of-the-art approach is to divide the signal along time into Groups of Pictures (GOPs).
The partition process is usually abrupt and prone to artifacts. An example is the macro block segmentation performed in MPEG methods. Although compression efficiency is indeed achieved, visible artifacts are also introduced. These artifacts are, in many conditions, very evident to human observers because they are unrelated to the original signal (two notable examples are “block” type artifacts and ringing artifacts). Many attempts to reduce the visibility of such artifacts have been implemented (e.g., de-blocking and de-ringing filters on both the encoder and the decoder side), albeit with disadvantages such as reduced overall perceived quality and increased complexity.
Along the time dimension, conventional methods divide the samples into chunks (e.g., GOPs for video signals, where each sample along time is a picture). A reference sample is chosen (the reference image), normally on the basis of sequential order, and the samples in the chunk are transmitted differentially with respect to the reference (or with respect to two references, in the case of bi-predicted frames). This introduces artifacts in the time evolution of the signal (e.g., for significant movements, perceived quality often suffers from evident discontinuities at the GOP boundaries).
One requirement addressed by methods in the known art is compression efficiency. Computational complexity has always been considered a secondary priority: essentially, algorithms just had to be computationally feasible, rather than being designed for low computational complexity. This forced hardware manufacturers to continuously adapt to evolving techniques, designing specific processors and dedicated hardware solutions capable of implementing the chosen algorithms. An example is the evolution of hardware support for MPEG2, MPEG4, H.264/AVC, H.265/HEVC, etc. No encoding technology has so far been designed to be optimally executed on massively parallel hardware, with computational performance that scales automatically with the number of computing cores available (i.e., without having to adapt the code, or even knowing in advance how many computing cores will be available). This feature, infeasible with current methods, is especially important because hardware technology is now reaching the asymptotic limits of silicon in terms of computing clock rates and transfer speeds: the current trend for increasing available computing power is to increase the number of distinct processing units (“computing cores”) hosted in a single chip or system.
Another aspect neglected in the known art, aside from a few attempts, is the quality scalability requirement. A scalable encoding method would encode a single version of the compressed signal and enable its delivery at different levels of quality, for instance according to bandwidth availability, display resolution and decoder complexity. Scalability has been taken into consideration in known methods like MPEG-SVC and JPEG2000, which have seen relatively poor adoption so far due to their computational complexity and, generally speaking, their use of approaches essentially designed for non-scalable techniques.
Another aspect not addressed by known methods is symmetry. With conventional methods, compression efficiency is achieved at the expense of giving up useful functionality such as bidirectional (e.g., time-reversed) playback and, more generally, random access to any sample in the signal (e.g., frame-by-frame editing for video signals). Prediction techniques, especially along the time dimension, prevent the decoder from receiving, decoding and presenting the signal in time-reversed order. The prediction techniques adopted also affect behaviour in highly compressed or error-prone transmissions, due to the accumulation of artifacts. Artifacts introduced by errors are particularly visible because they persist over time.
The prediction-driven techniques adopted in the known art also introduce strong constraints on random access into a compressed stream. Operations such as a “seek” to a sample at an arbitrary point, or random access when “zapping” to a different signal bitstream (without having to wait for the start of the next time chunk/GOP), are currently infeasible. The time a user has to wait when trying to access an arbitrary point is currently in a strict trade-off with compression efficiency. An example of this phenomenon is the GOP constraint in MPEG-family methods: to allow for minimum time delay and for random access along time, a GOP of one sample (i.e., intra-only encoding) must be used.
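The trade-off between GOP length and access delay can be made concrete with simple arithmetic. The sketch below assumes an I-frame at every multiple of `gop_size`, each P-frame predicted only from the frame immediately before it (no B-frames), and a decoder that joins a stream by waiting for the next I-frame; the helper names `frames_to_decode` and `zapping_wait` are hypothetical.

```python
def frames_to_decode(k, gop_size):
    """Frames that must be decoded to present frame k after a seek: the preceding
    I-frame plus every P-frame in the chain up to and including frame k."""
    last_intra = (k // gop_size) * gop_size
    return k - last_intra + 1


def zapping_wait(k, gop_size):
    """Frames a decoder joining the stream at frame k must skip before reaching
    the next I-frame (zero if frame k is itself an I-frame)."""
    return (-k) % gop_size


for gop in (1, 12):
    worst_seek = max(frames_to_decode(k, gop) for k in range(gop))
    worst_wait = max(zapping_wait(k, gop) for k in range(gop))
    print(gop, worst_seek, worst_wait)
# 1 1 0    intra-only: every frame is directly accessible
# 12 12 11 a 12-frame GOP: up to 12 decodes per seek, up to 11 frames of waiting
```

Only a GOP of one sample brings both costs down to their minimum, which is exactly the intra-only case mentioned above, and intra-only coding gives up the temporal redundancy that prediction exploits.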
Lastly, current methods are unsuitable for very high sample rates (e.g., very high frame rates for video signals), due to the amount of computational power and bandwidth that would be required. Several studies, for instance, have shown that humans can easily appreciate quality differences in video signals up to 300 frames per second, but computational and bandwidth constraints currently make it extremely expensive to encode and transmit high-quality video signals at more than 25 to 60 frames per second.