CPU (Central Processing Unit) efficiency matters both during encoding and decoding of a signal. Latest generation processors are becoming more and more parallel, with up to hundreds of simple cores on each single chip.
Unfortunately, by nature, traditional MPEG (Moving Picture Experts Group) family codecs are structurally non-parallel. That stems from the fact that they are block-based, and each image block must be encoded and decoded sequentially, since to achieve efficient compression all blocks must be made to depend in some way on each other.
Via the introduction of so-called “slices” (basically, pieces of the image that are treated independently of one another, as if they were separate videos put one next to the other) into MPEG coding, the H.264 standard allows a few threads to be processed in parallel (typically 2 or 3 threads). Important algorithm elements such as de-blocking (i.e., a filter that “smooths” the transitions among blocks to create a more uniform image) are typically global operations full of conditional instructions, which makes them poorly suited to parallel CPUs.
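The idea of slices as independently coded pieces of an image can be illustrated with a minimal sketch. It is not the H.264 slice syntax: the general-purpose zlib compressor merely stands in for a real slice encoder, the frame is assumed to be a non-empty row-major buffer of 8-bit samples, and the function names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_slices(frame: bytes, width: int, num_slices: int) -> List[bytes]:
    """Cut a row-major 8-bit frame into up to num_slices bands of whole rows."""
    height = len(frame) // width
    rows_per_slice = max(1, -(-height // num_slices))  # ceiling division
    step = rows_per_slice * width
    return [frame[i:i + step] for i in range(0, len(frame), step)]


def encode_frame(frame: bytes, width: int, num_slices: int) -> List[bytes]:
    """Compress every slice on its own; no slice refers to any other slice."""
    slices = split_into_slices(frame, width, num_slices)
    with ThreadPoolExecutor(max_workers=max(1, len(slices))) as pool:
        # zlib.compress is only a placeholder for a real slice encoder.
        return list(pool.map(zlib.compress, slices))


def decode_frame(encoded: List[bytes]) -> bytes:
    """Decompress the slices independently and stitch them back in order."""
    with ThreadPoolExecutor(max_workers=max(1, len(encoded))) as pool:
        return b"".join(pool.map(zlib.decompress, encoded))
```

Because each slice is self-contained, the work can be distributed across threads. The trade-off is that no slice can exploit redundancy with its neighbors.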
Today's CPUs and GPUs (Graphics Processing Units) are typically very powerful; a single GPU can include several hundred computing cores to perform parallel processing of information. With current technology, larger portions of an image can be stored in a processor cache for processing. When MPEG was created, processors could deal only with very small chunks of video data at a time, and then only sequentially, which was a driving factor behind fragmenting images into a multitude of small blocks. That need no longer applies to modern CPUs and GPUs. Thus, a large portion of available processing power may go unused when implementing MPEG-like types of encoding/decoding, with blocking artifacts needlessly introduced into the signal.
Also, compared to what was current when MPEG was developed, modern day applications typically require much higher definition video encoding and much higher overall playback quality. In high-definition (HD), high-quality videos, there is a much larger difference between areas with low detail (potentially even out of focus) and areas with very fine detail. This makes the use of frequency-domain transforms such as those used in MPEG even more unsuitable for image processing and playback, since the range of relevant frequencies is getting much broader.
In addition, higher resolution images include a higher amount of camera noise and/or film grain, i.e., very detailed high-frequency pixel transitions that can be quite irrelevant for viewing and require many bits to encode.
Lastly, traditional codecs are ill-suited to perform efficiently with 3D or volumetric imaging, which is becoming more and more important in fields such as medical imaging, scientific imaging, etc.
Most target devices today support different playback resolutions and quality. So-called SVC (Scalable Video Coding), the current MPEG standard for scalability, has not been received favorably by the industry and shows little to no adoption, because it is considered far too complex and somewhat bandwidth inefficient.
Moreover, encoded videos are plentiful; that is, a content provider typically doesn't have the time to customize encoder parameters and experiment with each specific video stream. Currently, content providers dislike that many encoding parameters must be manually tweaked (every time performing an encoding and checking the quality of results) in order to successfully encode a video.
As an alternative to MPEG standards for encoding/decoding, so-called image pyramids have been used for encoding/decoding purposes. For example, using Laplacian pyramids, conventional systems have created lower resolution images using Gaussian filters and then built a pyramid of differences, where each difference is taken between an image at a given level and the image obtained by upsampling, with a rigidly programmed decoder, from the next lower resolution level back toward the original level.
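The Laplacian pyramid scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not any particular system's implementation: a 3-tap binomial kernel stands in for the Gaussian filter, the fixed upsampler (pixel repetition followed by the same blur) plays the role of the rigidly programmed decoder, and all function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]

# 3-tap binomial kernel, used here as a small approximation of a Gaussian.
KERNEL = (0.25, 0.5, 0.25)
OFFSETS = (-1, 0, 1)


def _clamp(i: int, n: int) -> int:
    """Clamp an index into [0, n - 1] so that edges repeat the border pixel."""
    return max(0, min(n - 1, i))


def blur(img: Image) -> Image:
    """Separable blur: a horizontal pass followed by a vertical pass."""
    h, w = len(img), len(img[0])
    rows = [[sum(k * img[y][_clamp(x + d, w)] for d, k in zip(OFFSETS, KERNEL))
             for x in range(w)] for y in range(h)]
    return [[sum(k * rows[_clamp(y + d, h)][x] for d, k in zip(OFFSETS, KERNEL))
             for x in range(w)] for y in range(h)]


def downsample(img: Image) -> Image:
    """Blur, then keep every second row and column."""
    return [row[::2] for row in blur(img)[::2]]


def upsample(img: Image, h: int, w: int) -> Image:
    """Fixed upsampler: repeat each pixel to reach h x w, then blur."""
    return blur([[img[y // 2][x // 2] for x in range(w)] for y in range(h)])


def build_pyramid(img: Image, levels: int) -> Tuple[Image, List[Image]]:
    """Return the lowest-resolution image and the residuals, finest first."""
    residuals = []
    current = img
    for _ in range(levels):
        small = downsample(current)
        predicted = upsample(small, len(current), len(current[0]))
        residuals.append([[a - b for a, b in zip(row_c, row_p)]
                          for row_c, row_p in zip(current, predicted)])
        current = small
    return current, residuals


def reconstruct(base: Image, residuals: List[Image]) -> Image:
    """Upsample level by level, adding back each stored residual."""
    current = base
    for res in reversed(residuals):
        predicted = upsample(current, len(res), len(res[0]))
        current = [[p + r for p, r in zip(row_p, row_r)]
                   for row_p, row_r in zip(predicted, res)]
    return current
```

In this sketch the residuals at each level carry whatever the fixed upsampler fails to predict, so the amount of data they hold depends directly on how much detail the downsampling filter removed.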
Use of conventional Laplacian pyramid encoding has been abandoned. One deficiency of such transforms is that their designers always tried to avoid distortions/artifacts in the downsampled image, so they always used Gaussian filtering, as it is the only type of filter that doesn't add any information of its own. However, the insurmountable problem with Gaussian filtering is that it introduces a blurring effect, so that upscaling back to larger resolutions requires an inordinate amount of image correction information to reproduce the original image.