CPU (Central Processing Unit) efficiency matters during both encoding and decoding of a signal. Latest-generation processors are becoming increasingly parallel, with up to hundreds of simple cores on a single chip.
Unfortunately, traditional MPEG (Moving Picture Experts Group) family codecs are by nature structurally non-parallel. This stems from the fact that they are block-based, and each image block must be encoded and decoded sequentially, since to achieve efficient compression all blocks must be made to depend in some way on each other.
Via the introduction of so-called “slices” (basically, pieces of the image that are treated independently of one another, as if they were separate videos put one next to the other) into MPEG coding, the H.264 standard allows for processing of a few threads in parallel (typically 2 or 3 threads). Important algorithm elements such as de-blocking (i.e., a filter that “smooths” the transitions among blocks to create a more uniform image) are typically global operations full of conditional instructions, which makes them poorly suited to parallel CPUs.
Today's CPUs and GPUs (Graphics Processing Units) are typically very powerful; a single GPU can include several hundred computing cores to perform parallel processing of information. With current technology, larger portions of an image can be stored in a processor cache for processing. The need to fragment images into a multitude of small blocks was a driving factor when MPEG was created, as processors from that era could only deal with very small chunks of video data at a time, and then only sequentially. That need no longer applies to modern CPUs and GPUs. Thus, a large portion of available processing power may go unused when implementing MPEG-like types of encoding/decoding, with blocking artifacts needlessly introduced into the signal.
Also, compared to what was current when MPEG was developed, modern-day applications typically require much higher definition video encoding and much higher overall playback quality. In high-definition (HD), high-quality videos, there is a much larger difference between areas with low detail (potentially even out of focus) and areas with very fine detail. This makes frequency-domain transforms such as those used in MPEG even less suitable for image processing and playback, since the range of relevant frequencies becomes much broader.
In addition, higher resolution images include a higher amount of camera noise and/or film grain, i.e., very detailed high-frequency pixel transitions that can be quite irrelevant for viewing and require many bits to encode.
Lastly, traditional codecs are ill-suited to perform efficiently with 3D or volumetric imaging, which is becoming more and more important in fields such as medical imaging, scientific imaging, etc.
Most target devices today support different playback resolutions and quality. So-called SVC (Scalable Video Coding), the current MPEG standard for scalability, has not been received favorably by the industry and shows little to no adoption, because it is considered far too complex and somewhat bandwidth inefficient.
Moreover, encoded videos are plentiful; that is, a content provider typically doesn't have the time to customize encoder parameters and experiment with each specific video stream. Currently, content providers dislike that many encoding parameters must be manually tweaked (each time performing an encoding and checking the quality of the results) in order to successfully encode a video.
As an alternative to MPEG standards for encoding/decoding, so-called image pyramids have been used for encoding/decoding purposes. For example, using Laplacian pyramids, conventional systems have created lower resolution images using Gaussian filters and then built the pyramid from the differences between each level and the image obtained by upsampling the next lower resolution level, with a rigidly programmed decoder, back toward the original level.
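The conventional Laplacian pyramid scheme described above can be illustrated with a minimal sketch. It works on a one-dimensional signal for brevity rather than on an image, and it assumes a 5-tap binomial kernel as the Gaussian approximation and sample repetition followed by smoothing as the fixed upsampler. The function names (blur, downsample, upsample, build_pyramid, reconstruct) are hypothetical and are not taken from any particular conventional system.

```python
# Binomial approximation of a Gaussian low-pass filter (assumed kernel).
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)


def blur(signal):
    """Low-pass filter a 1-D signal, replicating the edge samples."""
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(KERNEL):
            j = min(max(i + k - 2, 0), n - 1)  # clamp the index at the borders
            acc += weight * signal[j]
        out.append(acc)
    return out


def downsample(signal):
    """Blur, then keep every second sample."""
    return blur(signal)[::2]


def upsample(signal, length):
    """Fixed upsampler: repeat each sample, crop to `length`, then smooth."""
    repeated = [s for s in signal for _ in range(2)][:length]
    return blur(repeated)


def build_pyramid(signal, levels):
    """Return the lowest-resolution level plus one residual list per level."""
    current = [float(s) for s in signal]
    residuals = []
    for _ in range(levels):
        low = downsample(current)
        predicted = upsample(low, len(current))
        residuals.append([c - p for c, p in zip(current, predicted)])
        current = low
    return current, residuals


def reconstruct(base, residuals):
    """Decoder side: upsample each level and add back its residuals."""
    current = base
    for res in reversed(residuals):
        predicted = upsample(current, len(res))
        current = [p + r for p, r in zip(predicted, res)]
    return current


signal = [3, 7, 1, 8, 2, 9, 4, 6, 5]
base, residuals = build_pyramid(signal, levels=2)
restored = reconstruct(base, residuals)
```

In this sketch the decoder recovers the signal (up to floating-point rounding) only because every residual is transmitted; the residuals are the data whose size determines how efficient the scheme is.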
Use of conventional Laplacian pyramid encoding has been abandoned. One deficiency of such transforms is that their designers were trying to avoid distortions/artifacts in the downsampled image, so they typically used Gaussian filtering, as it is the only type of filter that doesn't add any information of its own. However, the insurmountable problem with Gaussian filtering is that it introduces a blurring effect, such that when upscaling back to higher resolutions, an inordinate amount of image correction information is needed to reproduce the original image. In other words, upsampling with conventional filters results in jagged or blurry edges in a reconstructed image. The jagged or blurry edges need to be corrected using a substantial amount of residual data, making such an encoding technique undesirable for use in higher resolution applications.
One of the important components of any signal encoder is the operation currently referred to as “entropy coding”. In practice, once the encoding operations and transforms are performed with either lossless or lossy methods, the residuals (i.e., new information that couldn't be derived from data, such as a previous frame in a video signal, which is already available at the decoder) are essentially strings of numbers that must be transmitted, if possible, without any further loss or approximation and with the fewest possible bits. The lossless data compression schemes through which strings of numbers can be transmitted with the fewest possible bits are typically referred to as entropy coding. The concept of entropy in a string of numbers/symbols has to do with the intrinsic amount of information that the string contains: since not all of the numbers/symbols in the string are different, the more the string is dominated by a few frequently repeated symbols (ideally, just one), the fewer bits are necessary to encode it.
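By way of a non-limiting illustration, the notion of entropy described above can be computed with the standard Shannon formula, which gives the average number of bits per symbol that an ideal lossless coder would need. The function name entropy_bits_per_symbol and the sample strings below are hypothetical.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(symbols):
    """Shannon entropy of the empirical symbol distribution, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    # Sum of p * log2(1 / p) over all distinct symbols.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


single_symbol = [0] * 16             # one symbol repeated: nothing to transmit
mostly_zeros = [0] * 12 + [1] * 4    # a few symbols dominate the string
all_equally_likely = [0, 1, 2, 3] * 4

h_single = entropy_bits_per_symbol(single_symbol)
h_skewed = entropy_bits_per_symbol(mostly_zeros)
h_uniform = entropy_bits_per_symbol(all_equally_likely)
```

The three strings have the same length, yet the one dominated by a single symbol has the lowest entropy and the one with four equally likely symbols has the highest, namely 2 bits per symbol.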
Several methodologies for entropy encoding exist in the literature. Sophisticated entropy coders (such as CABAC, the context-adaptive entropy coder introduced with H.264) can reach excellent results at the expense of great computational complexity, while others, such as the technique known as range encoding, can reach similar results only when used with appropriate parameters. In general, entropy coders are only as efficient as their estimate of the symbol frequencies in the strings to encode (i.e., of the probability distribution of the symbols, which the decoder must get from the encoder in some way).
Since MPEG-family codecs are block-based (i.e., they divide the signal into a number of blocks and essentially analyze/encode each block separately), ideally they would need a separate probability distribution for the residuals of each single block. This of course wouldn't be practical given the very high number of blocks, so they either use standard distributions of probabilities (not custom made for a specific frame, and consequently less efficient in terms of data compression) or adaptive schemes like CABAC (more efficient, but very complex).
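The dependence of coding efficiency on the probability estimate can be sketched as follows. The example computes the ideal code length, i.e., the sum of -log2 of the modeled probability of each symbol, for one block of residuals under two models: a fixed “standard” distribution and the block's own empirical distribution. The residual values, the standard distribution, and the function names are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def ideal_code_length(symbols, model):
    """Bits an ideal entropy coder would spend on `symbols` under `model`."""
    return sum(-math.log2(model[s]) for s in symbols)


def empirical_model(symbols):
    """Probability distribution measured on the symbols themselves."""
    counts = Counter(symbols)
    total = len(symbols)
    return {s: c / total for s, c in counts.items()}


# Hypothetical residuals of a single block: mostly zeros.
block = [0] * 12 + [1] * 3 + [-1]

# Hypothetical "standard" distribution shared by every block.
standard = {-1: 0.25, 0: 0.5, 1: 0.25}

bits_standard = ideal_code_length(block, standard)
bits_custom = ideal_code_length(block, empirical_model(block))
```

The custom model never needs more bits than the standard one for the residuals themselves, but the figure above leaves out the cost of conveying that model to the decoder. With one model per block, that side information is the overhead that makes per-block distributions impractical.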
Methods and embodiments herein represent an innovative approach to achieve efficient entropy coding results with low computational complexity.