High Efficiency Video Coding (HEVC) is a video compression standard. In HEVC, the basic processing unit is the coding tree unit (CTU), which can be as large as 64×64 luma samples. A CTU can be split into multiple coding units (CUs) in a quad-tree fashion. The CU is the basic unit for forming a motion-compensated prediction. A CU can be coded using either intra-picture prediction (intra) or inter-picture prediction (inter). In the intra case, one or more prediction angles are coded along with each CU. In the inter case, one or more motion vectors are coded along with each CU.
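The quad-tree split described above can be illustrated with a minimal sketch. It is not taken from any particular encoder: the function name split_ctu, the should_split callback, and the (x, y, size) tuple layout are hypothetical, and the 8×8 minimum CU size is an assumption made for the example.

```python
from typing import Callable, List, Tuple

# A coding unit is described here as (x, y, size) in luma samples.
# This tuple layout is a hypothetical convention for the sketch.
CU = Tuple[int, int, int]


def split_ctu(
    x: int,
    y: int,
    size: int,
    should_split: Callable[[int, int, int], bool],
    min_size: int = 8,  # assumed minimum CU size for this example
) -> List[CU]:
    """Recursively split a square block into four equal quadrants.

    `should_split` is a caller-supplied (hypothetical) decision function.
    Returns the leaf CUs in z-scan order.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[CU] = []
        # Visit quadrants top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    split_ctu(x + dx, y + dy, half, should_split, min_size)
                )
        return leaves
    # Either the minimum size was reached or the caller chose not to split.
    return [(x, y, size)]


if __name__ == "__main__":
    # Example decision rule: keep splitting only the top-left block.
    cus = split_ctu(0, 0, 64, lambda x, y, size: x == 0 and y == 0)
    for cu in cus:
        print(cu)
```

With the example rule, only the top-left block is refined at each level, so the CTU ends up covered by a mix of 32×32, 16×16, and 8×8 CUs. A real encoder would replace the rule with a cost-based decision.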
The majority of the coding efficiency gains of HEVC can be attributed to its flexible block partitioning structure. Selecting block sizes well is therefore very important for good coding efficiency. Exhaustively checking every possible block size requires a very large number of computations, so such an approach is not practical for encoders with limited computational resources, such as real-time encoders. It is therefore desirable to develop a block partitioning method with low complexity and high coding efficiency.
Some of the coding efficiency in HEVC comes from exploiting data correlations. Doing so introduces serial dependencies and is ill suited to massively parallel processing. Accordingly, it would be desirable to develop a method that is suited to massively parallel processing, in which only limited sections of the overall encoding algorithm need to be processed serially.
Efficient encoders rely on rate-distortion metrics to make decisions such as block size determination. Given a number of choices for a block size, an encoder estimates the rate and the distortion for each choice. The rate is generally expressed as the number of bits needed to encode the choice. The distortion can be expressed as the sum of squared differences between a block to be coded and its reconstructed version after compression. While it is possible to compute rate and distortion exactly, such an approach is impractical in most scenarios, in particular for real-time encoders, because of the high computational complexity involved. In practice, computationally efficient approximations are used. A rate estimate R and a distortion estimate D are typically combined into a single rate-distortion cost estimate C using a linear combination such as C = λR + D, where λ is a weighting factor called a Lagrange multiplier. This weighting factor reflects the desired balance between rate and distortion and may be adjusted according to a target bit rate. Once costs are determined for each of the choices available to the encoder, the one with the lowest cost is picked. This process is commonly referred to as rate-distortion optimization (RDO). There is a need for improved approaches to RDO.
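As a minimal illustration of the cost comparison described above, the following sketch computes a sum of squared differences and picks the choice with the lowest cost λR + D. The function names (ssd, rd_cost, choose_best) and the rate and distortion figures are hypothetical and chosen only for the example. They are not measurements from any encoder.

```python
from typing import Dict, List, Tuple


def ssd(block: List[List[int]], recon: List[List[int]]) -> int:
    """Sum of squared differences between a block and its reconstruction."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(block, recon)
        for a, b in zip(row_a, row_b)
    )


def rd_cost(rate: float, distortion: float, lam: float) -> float:
    """Rate-distortion cost C = lam * R + D."""
    return lam * rate + distortion


def choose_best(candidates: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Return the label of the candidate with the lowest cost.

    `candidates` maps a label to a (rate_estimate, distortion_estimate) pair.
    """
    return min(candidates, key=lambda k: rd_cost(*candidates[k], lam))


if __name__ == "__main__":
    # Hypothetical (rate in bits, distortion as SSD) estimates per block size.
    candidates = {
        "64x64": (100, 5000),
        "32x32": (250, 3000),
        "16x16": (600, 2500),
    }
    for lam in (1, 10, 30):
        print(lam, choose_best(candidates, lam))
```

In this example a larger λ penalizes rate more heavily and steers the choice toward the larger block, while a smaller λ favors the lower-distortion, higher-rate choice.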