The H.265/MPEG-H High Efficiency Video Coding (HEVC) standard achieves bit-rate savings of about 50% at essentially the same subjective quality compared with its predecessor, the H.264/MPEG-4 Advanced Video Coding (AVC) standard. It thereby helps address the challenges that rapidly growing video bandwidth demands pose to modern communication networks and storage media. H.265/MPEG-H HEVC was designed to be applicable to almost all existing H.264/MPEG-4 AVC applications, with a special emphasis on High Definition (HD) and Ultra High Definition (UHD) video content, since demand for these formats is expected to increase significantly in the coming years. However, this coding performance gain comes at the cost of significantly increased computational complexity, mainly due to the relatively large number of supported coding modes [1], [2].
During HEVC encoding, in addition to determining the quadtree block partitioning [3], [4], the mode decision process determines whether a coding unit (CU) should be encoded using intra- or inter-picture prediction, thereby exploiting spatial or temporal redundancy, respectively. This decision is commonly implemented by means of rate-distortion optimization (RDO): the cost function in Equation (1) is minimized, where the overall cost J combines the distortion D and the bit rate R, the latter weighted by the Lagrange multiplier λ [5]. To determine the best coding mode, the CU is therefore usually encoded in a number of candidate coding modes, which imposes a high computational burden on the encoder.

$$J = D + \lambda R \qquad (1)$$
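As a minimal sketch of the RDO mode decision in Equation (1), the following Python snippet selects, among candidate modes that have already been trial-encoded, the one with the lowest cost J = D + λR. The `ModeTrial` type, the helper names, and the distortion and rate numbers in the usage example are hypothetical; a real encoder obtains D and R by actually encoding the CU in each mode.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ModeTrial:
    """Result of trial-encoding one CU in one candidate mode (hypothetical type)."""
    name: str
    distortion: float  # D, e.g. sum of squared errors
    rate_bits: float   # R, bits needed to code the CU in this mode


def rd_cost(trial: ModeTrial, lam: float) -> float:
    # Equation (1): J = D + lambda * R
    return trial.distortion + lam * trial.rate_bits


def choose_mode(trials: Sequence[ModeTrial], lam: float) -> ModeTrial:
    # The mode with the smallest Lagrangian cost J is selected.
    return min(trials, key=lambda t: rd_cost(t, lam))


if __name__ == "__main__":
    trials = [ModeTrial("intra", 1200.0, 40.0), ModeTrial("inter", 900.0, 60.0)]
    print(choose_mode(trials, lam=10.0).name)  # inter (J = 1500 vs. 1600)
    print(choose_mode(trials, lam=20.0).name)  # intra (J = 2000 vs. 2100)
```

A larger λ penalizes rate more heavily, so the mode that is cheaper to signal wins even if its distortion is higher.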
Inter-picture prediction plays a crucial role in modern video coding because of its high potential to improve coding efficiency. Temporal redundancy is removed by encoding a block in terms of a displacement relative to one or more reference blocks located in previously encoded reference frames. The displacement is encoded as a so-called motion vector (MV), which is determined by the motion estimation (ME) process. In recent video encoders, ME is usually regarded as a three-step process comprising motion vector prediction (MVP), integer-pel accuracy search, and sub-pel refinement. Regarding the latter, similarly to H.264/MPEG-4 AVC, the HEVC standard allows motion information to be specified at quarter-pel precision. The sub-sample positions are typically obtained by applying computationally costly interpolation methods, followed by a search around the previously determined integer-pel MV. However, such methods are very challenging to implement in real-time encoders, particularly those that must meet a predefined ultra-low processing latency.
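To illustrate how a sub-sample position is interpolated, the sketch below computes horizontal half-pel luma samples for one row using the 8-tap half-sample filter (-1, 4, -11, 40, 40, -11, 4, -1)/64 used for HEVC luma. It is deliberately simplified and is not the normative process: it assumes 8-bit samples, performs a single 1-D pass with direct rounding and clipping (the standard's separable 2-D filtering with intermediate precision is omitted), and replicates edge samples by index clamping. The function name `half_pel_row` is hypothetical.

```python
from typing import List, Sequence

# HEVC luma half-sample interpolation taps; they sum to 64.
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def half_pel_row(row: Sequence[int]) -> List[int]:
    """Simplified horizontal half-pel interpolation for one row of 8-bit samples.

    Output element i lies halfway between row[i] and row[i + 1].
    """
    n = len(row)
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(HALF_PEL_TAPS):
            # Taps cover integer positions i-3 .. i+4; clamp indices to pad edges.
            j = min(max(i - 3 + k, 0), n - 1)
            acc += tap * row[j]
        # Normalize by 64 with rounding, then clip to the 8-bit range.
        out.append(min(max((acc + 32) >> 6, 0), 255))
    return out


if __name__ == "__main__":
    print(half_pel_row([0, 10, 20, 30, 40, 50, 60, 70]))
```

On the linear ramp in the usage example, the sample interpolated between 30 and 40 comes out as 35. Each interpolated sample costs 8 multiply-accumulates per filtering direction, which hints at why exhaustive sub-pel interpolation is expensive.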
A comprehensive overview of the HEVC standard is provided in [6], and a detailed analysis of its coding efficiency can be found in [7]. In addition, a comparison with other recent video coding schemes is presented in [8], with a particular focus on low-delay applications in [9]. Further studies of HEVC decoding performance and complexity are reported in [10], with a special emphasis on 4K video (i.e., a resolution of 3840×2160 luma samples).
Further, for video coding standards that employ inter-picture prediction, intensive research has been conducted on sub-optimal (fast) strategies for integer-pel accuracy ME that target a good trade-off between computational complexity and coding efficiency. Most of the proposed algorithms sub-sample the search space while trying to avoid getting trapped in a local minimum that does not correspond to the global minimum of the given cost function. A recent survey of fast block-matching algorithms can be found in [11].
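As one illustration of a fast integer-pel search that evaluates only a sub-sampled set of positions, the sketch below implements a greedy small-diamond search: it repeatedly checks the 4 horizontal and vertical neighbors of the current best MV and moves to the best one until the center wins. This particular pattern is chosen for illustration only and is not taken from the cited works; the `cost` callable (e.g. SAD or an RD cost for a given MV) and the search-window handling are assumptions. Because the search only follows strictly decreasing costs, it can stop in a local minimum, which is exactly the risk described above.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]  # integer-pel motion vector (x, y)

# Small diamond: the 4 horizontal/vertical neighbors of the current center.
SMALL_DIAMOND = ((1, 0), (-1, 0), (0, 1), (0, -1))


def small_diamond_search(cost: Callable[[MV], float],
                         start: MV = (0, 0),
                         search_range: int = 16) -> MV:
    """Greedy fast integer-pel search (illustrative); may stop in a local minimum."""
    best, best_cost = start, cost(start)
    while True:
        center = best
        for dx, dy in SMALL_DIAMOND:
            cand = (center[0] + dx, center[1] + dy)
            # Stay inside the square search window around (0, 0).
            if max(abs(cand[0]), abs(cand[1])) > search_range:
                continue
            c = cost(cand)
            if c < best_cost:
                best, best_cost = cand, c
        if best == center:  # center beats all its neighbors: stop
            return best


if __name__ == "__main__":
    # Smooth, convex cost: the search reaches the global minimum at (3, -2).
    print(small_diamond_search(lambda mv: (mv[0] - 3) ** 2 + (mv[1] + 2) ** 2))
```

With a cost surface that has an isolated global minimum away from the start (e.g. a single low-cost point at (5, 5) surrounded by high costs), the same search stays at the local minimum around the start position.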
In the following, several existing sub-pel ME search techniques are discussed: first, the conventional interpolation-and-search method is described, followed by recent pattern-based approaches, and finally several error-surface approximation techniques.
The conventional, so-called interpolation-and-search method for sub-pel refinement is presented in FIG. 8a,b. It relies on the common assumption that the optimal sub-pel position is adjacent to the optimal full-pel position, which yields 48 candidate quarter-pel positions (the 7×7 grid of quarter-pel offsets within ±3/4 pel, excluding the full-pel position itself). In practice, the search is typically divided into two steps. First, interpolation and search are performed at half-pel accuracy, as depicted in FIG. 8a. Second, this procedure is repeated at quarter-pel accuracy, with the search space reduced to the neighborhood of the previously selected half-pel position, as shown in FIG. 8b. Consequently, only 8 half-pel and 8 quarter-pel positions (16 in total) are checked instead of all 48. If no better sub-pel position is found, the MV remains at the full-pel position.
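The two-step procedure can be sketched as follows, assuming MVs expressed in quarter-pel units and a hypothetical `cost` callable that interpolates and evaluates a candidate position. The cost at the full-pel start position is assumed to be known from the integer-pel search, so it is not counted as a new evaluation.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]  # motion vector in quarter-pel units (x, y)

# The 8 neighbors of a position on a unit grid.
NEIGHBORS_8 = ((-1, -1), (0, -1), (1, -1), (-1, 0),
               (1, 0), (-1, 1), (0, 1), (1, 1))


def interpolation_and_search(cost: Callable[[MV], float],
                             full_pel_mv: MV) -> Tuple[MV, int]:
    """Two-step sub-pel refinement: 8 half-pel, then 8 quarter-pel candidates."""
    best = (full_pel_mv[0] * 4, full_pel_mv[1] * 4)
    best_cost = cost(best)  # already known from the integer-pel search
    evaluations = 0
    for step in (2, 1):  # 2 quarter-pel units = half-pel, 1 = quarter-pel
        center = best    # quarter-pel step is centered on the best half-pel result
        for dx, dy in NEIGHBORS_8:
            cand = (center[0] + dx * step, center[1] + dy * step)
            c = cost(cand)
            evaluations += 1
            if c < best_cost:
                best, best_cost = cand, c
    return best, evaluations


if __name__ == "__main__":
    # Hypothetical cost surface with its minimum at quarter-pel (3, -1).
    print(interpolation_and_search(lambda mv: (mv[0] - 3) ** 2 + (mv[1] + 1) ** 2, (0, 0)))
```

For this cost surface the half-pel step selects (2, -2) and the quarter-pel step then reaches (3, -1), using 16 evaluations. If no candidate beats the full-pel cost, the full-pel MV is returned unchanged.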
Inspired by the popularity of pattern-based approaches for integer-pel search, various pattern-based strategies have also been developed for sub-pel refinement. However, under the assumption that the optimal sub-pel position is adjacent to the selected integer-pel position, the search space is already rather small. Nevertheless, attempts have been made to sub-sample it further by first predicting a restricted sub-pel search region and then searching only within that region, which is by definition limited. For example, in [12], the authors propose to reduce the number of search points from 8 to 6 for both the half-pel and the quarter-pel search by first checking only 4 “near-located” samples and subsequently 2 “far-located” samples positioned next to the best “near-located” sample. Note that the authors of [12] treat the samples adjacent in the horizontal and vertical directions as “near-located” samples and those adjacent in the diagonal directions as “far-located” samples. As a result, this approach clearly favors the “near-located” samples over the “far-located” samples by giving them more weight in the decision process. Another example is given in [13], where the authors propose to limit the search to a single quadrant, which is determined by checking 2 fixed samples. In [14], the search space is restricted to only a few points depending on the direction of the integer-pel accuracy MV. Further, [15] proposes a search pattern that depends on the distortion distribution of the surrounding full-pel positions.
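A sketch of one reading of the 6-point pattern described for [12] follows: at each accuracy level, the 4 “near-located” (horizontal/vertical) candidates are checked first, then the 2 “far-located” (diagonal) candidates adjacent to the best near-located one. The tie-breaking order, the quarter-pel MV units, and the `cost` callable are assumptions made for illustration, not details taken from [12].

```python
from typing import Callable, List, Tuple

MV = Tuple[int, int]  # motion vector in quarter-pel units (x, y)

NEAR = ((1, 0), (-1, 0), (0, 1), (0, -1))  # horizontal/vertical neighbors


def adjacent_diagonals(dx: int, dy: int) -> List[Tuple[int, int]]:
    # The two diagonal ("far") positions next to a near-located direction.
    if dy == 0:
        return [(dx, -1), (dx, 1)]
    return [(-1, dy), (1, dy)]


def six_point_refinement(cost: Callable[[MV], float],
                         full_pel_mv: MV) -> Tuple[MV, int]:
    """4 near + 2 far candidates per level (half-pel, then quarter-pel)."""
    best = (full_pel_mv[0] * 4, full_pel_mv[1] * 4)
    best_cost = cost(best)  # known from the integer-pel search, not counted
    evaluations = 0
    for step in (2, 1):
        center = best
        near = []
        for dx, dy in NEAR:
            cand = (center[0] + dx * step, center[1] + dy * step)
            near.append((cost(cand), (dx, dy), cand))
            evaluations += 1
        # First minimum in NEAR order wins ties (an assumption).
        near_cost, (ndx, ndy), near_mv = min(near, key=lambda t: t[0])
        if near_cost < best_cost:
            best, best_cost = near_mv, near_cost
        for ddx, ddy in adjacent_diagonals(ndx, ndy):
            cand = (center[0] + ddx * step, center[1] + ddy * step)
            c = cost(cand)
            evaluations += 1
            if c < best_cost:
                best, best_cost = cand, c
    return best, evaluations
```

Compared with the two-step interpolation-and-search method, this pattern checks 12 instead of 16 positions, at the price of never visiting diagonals that are not next to the best near-located sample.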
Another class of sub-pel refinement techniques approximates the error surface at sub-pel level around the selected integer-pel position. In [16], the authors propose three mathematical models for determining a sub-pel accuracy motion vector. This approach requires the errors at 9 integer-pel positions, some of which may already be known (depending on the integer-pel search algorithm applied), and omits the interpolation and search at sub-pel level altogether. Although the authors present results at half-pel accuracy, the approach can readily be extended to quarter-pel accuracy. In [17], these models are applied to the coding costs obtained for the integer-pel positions rather than to the distortion values. This approach also applies an early termination criterion, based on the shape of the resulting surface, whenever no significant improvement is expected from sub-pel refinement.
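As an illustration of error-surface approximation, the sketch below fits a separable parabola through the errors at the selected integer-pel position and its horizontal and vertical neighbors, and takes the parabola's minimum as the sub-pel offset, rounded to quarter-pel. This is one commonly used model and is not claimed to be any of the three models of [16]; it uses only 5 of the 9 integer-pel errors (the diagonal ones are ignored), and the function names and the 3×3 error layout are assumptions. The flat-surface fallback is a simple safeguard, not the early termination criterion of [17].

```python
import math
from typing import Sequence, Tuple


def parabolic_offset(e_minus: float, e_zero: float, e_plus: float) -> float:
    """Minimum of the parabola through (-1, e_minus), (0, e_zero), (1, e_plus)."""
    curvature = e_minus - 2.0 * e_zero + e_plus
    if curvature <= 0.0:
        # Flat or concave surface: no reliable minimum, keep the full-pel position.
        return 0.0
    offset = (e_minus - e_plus) / (2.0 * curvature)
    return max(-0.5, min(0.5, offset))


def subpel_from_error_surface(errors: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """errors[dy + 1][dx + 1] holds the error at integer offset (dx, dy).

    Returns the estimated sub-pel offset (dx, dy) in quarter-pel units,
    without any sub-pel interpolation or search.
    """
    ox = parabolic_offset(errors[1][0], errors[1][1], errors[1][2])
    oy = parabolic_offset(errors[0][1], errors[1][1], errors[2][1])
    # Round half up to the nearest quarter-pel position.
    return math.floor(ox * 4 + 0.5), math.floor(oy * 4 + 0.5)


if __name__ == "__main__":
    def true_error(x: float, y: float) -> float:
        # Hypothetical quadratic error surface with its minimum at (0.25, -0.5).
        return (x - 0.25) ** 2 + (y + 0.5) ** 2

    grid = [[true_error(dx, dy) for dx in (-1, 0, 1)] for dy in (-1, 0, 1)]
    print(subpel_from_error_surface(grid))  # (1, -2)
```

For a truly quadratic surface the fit recovers the minimum exactly; for real block-matching errors it is only an approximation, which is why the choice of model matters.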