With clear advantages over earlier video compression standards in compression efficiency and network adaptability, the H.264/AVC video coding standard quickly became a mainstream standard in video applications after its draft was released in May 2003. However, with the diversification of terminal devices and people's ever-growing expectations of multimedia experience, high definition, high frame rate, 3D, and mobile platforms have become the main trends in video applications. At the same time, transmission bandwidth and storage space have always been the most crucial resources in video applications, and obtaining the best video experience with limited storage space and transmission bandwidth has always been a goal pursued by users. The compression efficiency of the existing H.264/AVC coding standard cannot satisfy these increasing requirements. Therefore, in January 2010, the Video Coding Experts Group (ITU-T VCEG) and the Moving Picture Experts Group (ISO/IEC MPEG) jointly established the Joint Collaborative Team on Video Coding (JCT-VC), which formulated the next-generation coding standard, High Efficiency Video Coding (HEVC), and officially released a final edition of the standard in January 2013. The HEVC retains the hybrid coding framework of the H.264/AVC and also adopts a great number of new technologies, so that coding efficiency is twice that of the existing H.264/AVC; that is, the HEVC can achieve the same video quality as the H.264/AVC at only half the bit rate. Therefore, the HEVC has great application value in areas such as high-definition and ultra-high-definition video storage, streaming media, and mobile internet video.
One of the most important new technologies in the HEVC standard is the adoption of a more flexible quadtree coding structure, in which the entire coding process is described by using three concepts: coding unit (CU), prediction unit (PU), and transform unit (TU), so as to improve compression coding efficiency for high-definition and ultra-high-definition videos.
In the HEVC, a frame picture is divided into many non-overlapping coding tree units (CTUs). The CTU is similar to a macroblock in the H.264/AVC. All the CTUs are square pixel blocks having a size of 2N×2N (where N = 2^C, and C is an integer greater than 1), and the allowed maximum size of the CTU is 64×64. Each CTU can be recursively divided into square coding units according to the quadtree structure. The CU is the basic unit for HEVC coding; its allowed minimum size is 8×8, and its maximum size is the size of the CTU. FIG. 1 shows an example of quadtree division of a 64×64 CTU. The depth (Depth) of a CU whose size is equal to that of a CTU is marked as 0. A CU having a depth n can be further divided into 4 coding subunits having a depth equal to n+1, and the size of each coding subunit is ¼ that of the CU having the previous depth. There are two predictive coding types of CUs: an intra-frame (Intra) prediction mode and an inter-frame (Inter) prediction mode. A frame whose CUs are all in an intra-frame prediction mode is referred to as an intra-frame predictive frame (that is, an I frame), and a frame including both a CU in an intra-frame prediction mode and a CU in an inter-frame prediction mode is referred to as an inter-frame predictive frame (that is, a GPB frame or a B frame).
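The relationship between quadtree depth and CU size described above can be sketched as follows. This is only an illustrative sketch: the function names are hypothetical, and the defaults (64×64 CTU, 8×8 minimum CU) are the limits stated in the text.

```python
def cu_size_at_depth(ctu_size: int, depth: int) -> int:
    """Side length of a CU at the given quadtree depth (halved per level)."""
    return ctu_size >> depth


def allowed_depths(ctu_size: int = 64, min_cu_size: int = 8):
    """List (depth, side length) pairs from the CTU size down to the minimum CU size."""
    pairs = []
    depth = 0
    while cu_size_at_depth(ctu_size, depth) >= min_cu_size:
        pairs.append((depth, cu_size_at_depth(ctu_size, depth)))
        depth += 1
    return pairs


def split_cu(x: int, y: int, size: int):
    """Divide a CU at (x, y) into 4 coding subunits, each with 1/4 of the area."""
    half = size // 2
    return [(x, y, half), (x + half, y, half),
            (x, y + half, half), (x + half, y + half, half)]
```

For a 64×64 CTU this gives depths 0 to 3 with side lengths 64, 32, 16, and 8, so the allowed maximum depth in that configuration is 3.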
A PU is a basic unit for prediction. One CU may include one or multiple PUs. A maximum size of the PU is a size of the CU, and the PU may be a square or rectangular block. For a CU of inter-frame predictive coding, there are 8 PU division manners shown in FIG. 2a to FIG. 2h. FIG. 2a to FIG. 2d show 4 symmetric division manners, which are respectively 2N×2N, 2N×N, N×2N, and N×N. FIG. 2e to FIG. 2h show 4 asymmetric division manners, which are respectively 2N×nU, 2N×nD, nL×2N, and nR×2N.
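The eight division manners can be made concrete by listing the PU dimensions each one produces. The sketch below is illustrative only: the function name and mode strings are hypothetical, and the 1/4 : 3/4 split used for the asymmetric manners follows the HEVC standard rather than anything stated in the text above.

```python
def pu_partitions(mode: str, cu_size: int):
    """Return the (width, height) of each PU for a CU of side cu_size (= 2N)."""
    s = cu_size        # 2N
    h = cu_size // 2   # N
    q = cu_size // 4   # N/2, the short side of an asymmetric split (assumption from the HEVC standard)
    table = {
        "2Nx2N": [(s, s)],
        "2NxN":  [(s, h), (s, h)],
        "Nx2N":  [(h, s), (h, s)],
        "NxN":   [(h, h)] * 4,
        "2NxnU": [(s, q), (s, s - q)],   # short PU on top
        "2NxnD": [(s, s - q), (s, q)],   # short PU at the bottom
        "nLx2N": [(q, s), (s - q, s)],   # narrow PU on the left
        "nRx2N": [(s - q, s), (q, s)],   # narrow PU on the right
    }
    return table[mode]
```

For example, under this assumption a 32×32 CU in the 2N×nU manner is covered by a 32×8 PU above a 32×24 PU.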
For the 2N×2N PU division manner for inter-frame prediction, if both a residual coefficient and a motion vector difference are zero, a coding mode of the CU is referred to as a skip mode. Different from a skip mode of the H.264/AVC, in the skip mode of the HEVC, a motion vector is acquired by using a motion merge technology (Merge), that is, for all motion information (including a motion vector, a reference frame index, and a reference frame list) of a current PU, a merge motion vector candidate list can be constructed according to motion information of adjacent PUs, and during coding, only a merge flag (Merge Flag) and an index (Merge Index) corresponding to an optimum merge motion vector need to be transmitted, and no other motion information needs to be transmitted. If a motion vector of a current PU is acquired by using the motion merge technology but includes a nonzero residual coefficient, a coding mode of the PU is referred to as a merge (Merge) mode.
For other cases of inter-frame prediction, the HEVC uses an adaptive motion vector prediction (AMVP) technology, that is, a prediction motion vector candidate list is constructed according to motion vector information of adjacent PUs, an optimum prediction motion vector is selected, then an optimum motion vector is selected through motion estimation, and a residual coefficient and complete motion information including a motion vector difference need to be transmitted during coding.
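The distinction between the skip mode, the merge mode, and AMVP-based inter coding can be summarized in a small classifier. This is a sketch of the rules as described in the two preceding paragraphs, not an excerpt from any encoder; the type, function, and field names are hypothetical, and treating a merged PU with a zero residual in a partition other than 2N×2N as a merge mode is an assumption.

```python
from dataclasses import dataclass


@dataclass
class InterResult:
    partition: str          # e.g. "2Nx2N", "Nx2N"
    uses_merge: bool        # motion information taken from the merge candidate list
    residual_is_zero: bool  # all residual coefficients are zero


def classify_inter_mode(r: InterResult) -> str:
    """Classify an inter-coded result as 'skip', 'merge', or 'amvp'."""
    if r.uses_merge:
        # With Merge there is no motion vector difference, so a 2Nx2N PU
        # whose residual is also zero is coded in the skip mode.
        if r.partition == "2Nx2N" and r.residual_is_zero:
            return "skip"
        return "merge"
    return "amvp"


# What has to be transmitted in each case, paraphrasing the text above.
TRANSMITTED = {
    "skip":  ["merge flag", "merge index"],
    "merge": ["merge flag", "merge index", "residual coefficients"],
    "amvp":  ["residual coefficients", "complete motion information incl. motion vector difference"],
}
```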
For a CU of intra-frame predictive coding, there are only two PU division manners: 2N×2N and N×N. The N×N division manner is only used for a CU whose depth is an allowed maximum depth.
A TU is a basic unit for transform and quantization. One CU may include one or more TUs. The TU also uses the quadtree recursive division structure. A size of the TU ranges from 4×4 to 32×32, and may be greater than that of the PU but does not exceed a size of the CU. FIG. 3 is a schematic diagram of a TU division manner of a CU.
During actual coding, mode selection needs to be performed for each CTU, to select optimum CU, PU, and TU division manners and a predictive coding type. Generally, according to the principle of rate-distortion optimization, for each CU division manner and PU division manner, overheads of intra-frame predictive coding performed in different TU division manners in at most 34 prediction directions need to be calculated; motion estimation is performed for each motion vector prediction manner to select the most matching predictive CU; then overheads of inter-frame predictive coding performed in different TU division manners are calculated; CU, PU, and TU division manners having a smallest cost are finally selected; and a corresponding predictive coding type is used as an optimum coding mode of a current CTU.
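Selecting the candidate with the smallest cost can be illustrated with the usual Lagrangian formulation of rate-distortion optimization, J = D + λ·R. The text above does not state the cost formula, so this formulation, the function names, and the numbers in the usage note are all assumptions made for illustration.

```python
def rd_cost(distortion: float, bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed formulation)."""
    return distortion + lam * bits


def best_mode(candidates: dict, lam: float) -> str:
    """Pick the candidate with the smallest cost.

    candidates maps a mode name to a (distortion, bits) pair.
    """
    return min(candidates, key=lambda m: rd_cost(*candidates[m], lam))
```

With made-up candidates `{"merge_2Nx2N": (1200, 10), "inter_2Nx2N": (900, 60), "intra_2Nx2N": (800, 150)}`, a multiplier of 5 favors `inter_2Nx2N` (cost 1200), while a multiplier of 10 favors `merge_2Nx2N` (cost 1300), which shows how a larger λ shifts the decision toward modes that spend fewer bits.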
The HEVC standard uses the foregoing flexible CU, PU, and TU division manners, together with more intra-frame prediction directions and inter-frame motion vector prediction manners, which greatly improves prediction accuracy and thereby coding efficiency. However, motion estimation for mode selection and coding overhead calculation involve a great number of highly complex calculations, such as a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), a sum of squared errors (SSE), and bit rate estimation, so the flexible and diverse division manners and prediction manners of the HEVC greatly increase the calculation complexity of the mode selection process. In the current implementation of the HEVC reference software, calculation for mode selection takes up more than 90% of the entire coding time. Such highly complex mode selection directly leads to great coding complexity of the HEVC, which therefore cannot satisfy the growing number of video coding applications with strict real-time requirements, such as real-time video calling (especially video calling on a handheld device) and live media streaming. Moreover, offline compression of program sources for video on demand also requires a lot of server computing resources and coding time. Besides, coding efficiency of an inter-frame predictive frame is far greater than that of an intra-frame predictive frame. Therefore, during ordinary video coding, in order to ensure robustness of transmission and random access to a bitstream, one I frame is generally inserted every 1 to 2 seconds; that is, if the frame rate of a video is 15 fps, there is one I frame in every 15 to 30 frames and the other frames are inter-frame predictive frames. In applications such as high-definition video storage and streaming media, in order to ensure high compression efficiency, the interval between I frames is greater and may reach a maximum of 100 to 200 frames.
Therefore, inter-frame predictive frames generally account for a large percentage of a video bitstream, and inter-frame mode selection is the bottleneck in the overall time consumed by video coding.
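The distortion measures named above (SAD, SSE, and SATD) can be sketched for a single 4×4 block as follows. This is a plain-Python illustration with hypothetical function names; the SATD here uses an unnormalized 4×4 Hadamard transform, whereas real encoders typically apply a scaling factor and use optimized integer routines.

```python
# 4x4 Hadamard matrix (symmetric, so it equals its own transpose).
H4 = [[1,  1,  1,  1],
      [1, -1,  1, -1],
      [1,  1, -1, -1],
      [1, -1, -1,  1]]


def _diff(a, b):
    """Element-wise difference of two equally sized blocks."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def _matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def sad(a, b):
    """Sum of absolute differences."""
    return sum(abs(v) for row in _diff(a, b) for v in row)


def sse(a, b):
    """Sum of squared errors."""
    return sum(v * v for row in _diff(a, b) for v in row)


def satd4x4(a, b):
    """Sum of absolute transformed differences (unnormalized 4x4 Hadamard)."""
    t = _matmul(_matmul(H4, _diff(a, b)), H4)
    return sum(abs(v) for row in t for v in row)
```

Each measure touches every sample of the block, and motion estimation evaluates it for many candidate positions per PU, which is why these calculations dominate the cost of mode selection.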
FIG. 4 is a flowchart of complete mode selection (where no quick mode selection algorithm is started) for coding a CU in the existing technology. As shown in FIG. 4, in this method, mode selection mainly includes the following steps S41 to S45:
S41: For a current CU, determine whether a depth (Depth) of the CU exceeds an allowed maximum depth; and if the depth exceeds the allowed maximum depth, stop this process; otherwise, continue to perform step S42.
S42: Sequentially calculate costs of coding performed according to merge 2N×2N, inter-frame 2N×2N, inter-frame N×N, inter-frame N×2N, inter-frame 2N×N, intra-frame 2N×2N, and intra-frame N×N modes; and when asymmetric PU division manners are allowed, further sequentially calculate costs of coding performed according to four modes: inter-frame 2N×nU, inter-frame 2N×nD, inter-frame nL×2N, and inter-frame nR×2N, where calculation for the inter-frame N×N and intra-frame N×N modes is performed only when the depth of the current CU is equal to the allowed maximum depth, and if an inter-frame flag_4×4_enabled_flag is 0, when a size of the CU is 8×8, a cost of the inter-frame N×N mode is not calculated either.
S43: Select a coding mode having a smallest cost among the modes calculated in step S42 as an optimum coding mode of the current CU, and record the smallest cost as a coding overhead of the current CU.
S44: Divide the current CU into 4 coding subunits having a depth Depth+1, and recursively invoke this process for each coding subunit.
S45: Add the coding overheads of the 4 coding subunits having the depth Depth+1 (shown in FIG. 5), and compare the sum of the coding overheads with the coding overhead of the current CU; if the coding overhead of the current CU is larger, the optimum coding mode of the current CU is the coding mode that is optimum after the CU is divided into the coding subunits; otherwise, the optimum coding mode of the current CU is the coding mode that is optimum before the CU is divided into the coding subunits.
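Steps S41 to S45 can be sketched as a recursive function. This is a simplified illustration, not the reference software: `cost_fn` is a hypothetical callback standing in for the full prediction, transform, and rate estimation of one mode, the mode names are made up, and the 8×8 restriction on the inter-frame N×N mode in step S42 is omitted for brevity.

```python
from typing import Callable, List


def candidate_modes(depth: int, max_depth: int, allow_amp: bool) -> List[str]:
    """Modes evaluated in S42, in the order given in the text."""
    at_max = depth == max_depth
    modes = ["merge_2Nx2N", "inter_2Nx2N"]
    if at_max:
        modes.append("inter_NxN")
    modes += ["inter_Nx2N", "inter_2NxN", "intra_2Nx2N"]
    if at_max:
        modes.append("intra_NxN")
    if allow_amp:
        modes += ["inter_2NxnU", "inter_2NxnD", "inter_nLx2N", "inter_nRx2N"]
    return modes


def select_mode(x: int, y: int, size: int, depth: int, max_depth: int,
                cost_fn: Callable[[int, int, int, str], float],
                allow_amp: bool = False):
    """Return (cost, decision) for the CU at (x, y), or None past max_depth."""
    if depth > max_depth:                                   # S41
        return None
    costs = {m: cost_fn(x, y, size, m)                      # S42
             for m in candidate_modes(depth, max_depth, allow_amp)}
    best = min(costs, key=costs.get)                        # S43
    own_cost = costs[best]
    half = size // 2
    children = [select_mode(cx, cy, half, depth + 1, max_depth, cost_fn, allow_amp)
                for cy in (y, y + half) for cx in (x, x + half)]   # S44
    if children[0] is None:        # already at the maximum depth: no split
        return own_cost, (x, y, size, best)
    split_cost = sum(c[0] for c in children)                # S45
    if own_cost > split_cost:
        return split_cost, [c[1] for c in children]
    return own_cost, (x, y, size, best)
```

Under these assumptions, a 64×64 CTU with a maximum depth of 3 and no asymmetric modes triggers 553 calls to `cost_fn` (5 modes for each of the 21 CUs at depths 0 to 2 and 7 modes for each of the 64 CUs at depth 3), regardless of the content of the picture.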
In the foregoing mode selection method, for a mode selection process of each CTU, costs of coding performed according to each CU, PU, and TU division manner and each intra-frame and inter-frame prediction manner need to be calculated, and a coding mode having a smallest cost is selected as an optimum coding mode of a current CTU. Although the optimum coding mode obtained by using the mode selection method is accurate, calculation complexity is high. An ordinary offline video compression application cannot bear such a long compression time and such a large consumption of server computing resources, much less satisfy applications requiring real-time video coding such as video calling and live media streaming.
FIG. 6 is a flowchart of another mode selection method in the existing technology. As shown in FIG. 6, in this method, mode selection mainly includes the following steps S61 to S68:
S61: For a current CU, determine whether a depth (Depth) of the CU exceeds an allowed maximum depth; and if the depth exceeds the allowed maximum depth, stop this process; otherwise, continue to perform step S62.
S62: Sequentially calculate costs of coding performed according to inter-frame 2N×2N and merge 2N×2N modes.
S63: Determine whether a residual coefficient and a motion vector difference are both zero when coding is performed according to the mode having the smaller cost among the inter-frame 2N×2N and merge 2N×2N modes; and if yes, predetermine that a coding mode of the current CU is a skip mode, and stop this process; otherwise, continue to perform step S64.
S64: Sequentially calculate costs of coding performed according to inter-frame N×N, inter-frame N×2N, inter-frame 2N×N, intra-frame 2N×2N, and intra-frame N×N modes; and when asymmetric PU division manners are allowed, further sequentially calculate costs of coding performed according to four modes: inter-frame 2N×nU, inter-frame 2N×nD, inter-frame nL×2N, and inter-frame nR×2N, where calculation for the inter-frame N×N and intra-frame N×N modes is performed only when the depth of the current CU is equal to the allowed maximum depth, and if an inter-frame flag_4×4_enabled_flag is 0, when a size of the CU is 8×8, a cost of the inter-frame N×N mode is not calculated either.
S65: Select a coding mode having a smallest cost among the modes calculated in step S64 as an optimum coding mode of the current CU.
S66: If the optimum coding mode of the current CU is the skip mode, stop this process; otherwise, continue to perform step S67.
S67: Divide the current CU into 4 coding subunits having a depth equal to Depth+1, and recursively invoke this process for each coding subunit.
S68: Add the coding overheads of the 4 coding subunits having the depth Depth+1, and compare the sum of the coding overheads with the coding overhead of the current CU; if the coding overhead of the current CU is larger, the optimum coding mode of the current CU is the coding mode that is optimum after the CU is divided into the coding subunits; otherwise, the optimum coding mode of the current CU is the coding mode that is optimum before the CU is divided into the coding subunits.
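Steps S61 to S68 can be sketched in the same style. Again this is a simplified illustration under stated assumptions: `evaluate` is a hypothetical callback that returns the cost of one mode together with a flag indicating whether both the residual coefficients and the motion vector difference are zero, asymmetric modes and the 8×8 restriction are omitted, and the comparison in S65 is taken over all modes evaluated so far, which is an interpretation of the step list rather than its literal wording.

```python
from typing import Callable, NamedTuple


class Eval(NamedTuple):
    cost: float
    zero_residual_and_mvd: bool  # residual coefficients and MVD are all zero


EARLY = ["inter_2Nx2N", "merge_2Nx2N"]             # S62
LATE = ["inter_Nx2N", "inter_2NxN", "intra_2Nx2N"]  # S64
LATE_AT_MAX_DEPTH = ["inter_NxN", "intra_NxN"]      # S64, maximum depth only


def fast_select(x: int, y: int, size: int, depth: int, max_depth: int,
                evaluate: Callable[[int, int, int, str], Eval]):
    """Return (cost, decision) for the CU at (x, y), or None past max_depth."""
    if depth > max_depth:                                        # S61
        return None
    evals = {m: evaluate(x, y, size, m) for m in EARLY}          # S62
    best = min(evals, key=lambda m: evals[m].cost)
    if evals[best].zero_residual_and_mvd:                        # S63: early skip
        return evals[best].cost, (x, y, size, "skip")
    late = LATE + (LATE_AT_MAX_DEPTH if depth == max_depth else [])
    evals.update({m: evaluate(x, y, size, m) for m in late})     # S64
    best = min(evals, key=lambda m: evals[m].cost)               # S65 (interpretation)
    own_cost = evals[best].cost
    # S66: kept for fidelity with the step list; in this sketch S63 has
    # already handled every case in which the best mode is a skip.
    if best == "merge_2Nx2N" and evals[best].zero_residual_and_mvd:
        return own_cost, (x, y, size, "skip")
    half = size // 2
    children = [fast_select(cx, cy, half, depth + 1, max_depth, evaluate)
                for cy in (y, y + half) for cx in (x, x + half)]  # S67
    if children[0] is None:        # already at the maximum depth: no split
        return own_cost, (x, y, size, best)
    split_cost = sum(c[0] for c in children)                     # S68
    if own_cost > split_cost:
        return split_cost, [c[1] for c in children]
    return own_cost, (x, y, size, best)
```

In a static region where the merge candidate already yields a zero residual, this sketch stops after only two evaluations for the whole CTU. In a moving region nothing is saved: every visited CU still pays for the inter-frame 2N×2N evaluation, including its motion estimation, plus all of the remaining modes.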
When the selection method described in step S61 to step S68 is compared with the selection method described in step S41 to step S45, the former provides a quick mode selection algorithm in which a CU division manner is decided in advance, where if an optimum coding mode of a current CU is a skip mode, the current CU is not further divided into coding subunits, and the mode selection process of the CU is stopped. The former also provides a quick mode selection algorithm in which the skip mode is detected in advance, where a mode having a smallest cost when coding is performed according to the inter-frame 2N×2N and merge 2N×2N modes is selected, and if both a residual coefficient and a motion vector difference of the mode are zero, it can be predetermined that an optimum coding mode of the current CU is the skip mode, and the mode selection process of the CU is stopped.
Such a quick mode selection method in which a skip mode is detected in advance and division of a CU in the skip mode into coding subunits is stopped in advance can reduce calculation complexity to some degree in a video scenario in which CUs in the skip mode take up a large percentage and a picture is relatively static. However, in an ordinary video scenario in which a picture moves to some degree, because calculation of a cost of coding performed according to the inter-frame 2N×2N mode involves highly complex processes of motion estimation and coding overhead calculation, the mode selection method still has high calculation complexity and falls far short of requirements on coding complexity of actual applications.
For the problem of high calculation complexity of mode selection for video coding in related technologies, no effective solution has yet been proposed.