High Efficiency Video Coding (HEVC) is a new coding standard that has been developed in recent years. In the HEVC system, the fixed-size macroblock of H.264/AVC is replaced by a flexible block, named coding unit (CU). Pixels in the CU share the same coding parameters to improve coding efficiency. A CU may begin with a largest CU (LCU), which is also referred to as a coding tree unit (CTU) in HEVC. In addition to the concept of coding unit, the concept of prediction unit (PU) is also introduced in HEVC. Once the splitting of the CU hierarchical tree is done, each leaf CU is further split into one or more prediction units (PUs) according to prediction type and PU partition.
In the current development of screen content coding for the High Efficiency Video Coding (HEVC) standard, some tools have been adopted due to their improvements in coding efficiency for screen contents. For Intra blocks, Intra prediction according to the conventional approach is performed using prediction based on reconstructed pixels from neighboring blocks. Intra prediction may select an Intra Mode from a set of Intra Modes, which include a vertical mode, a horizontal mode and various angular prediction modes. For HEVC screen content coding, a new Intra coding mode, named Intra-block copy (IntraBC), has been used. The IntraBC technique was originally proposed by Budagavi in AHG8: Video coding using Intra motion compensation, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting: Incheon, KR, 18-26 Apr. 2013, Document: JCTVC-M0350 (hereinafter JCTVC-M0350). According to the original IntraBC method, the displacement vector between a current block and a reference block is restricted to be in the horizontal direction (i.e., one-dimensional (1D) IntraBC). The prediction block is obtained from the already reconstructed region. The displacement vector is also referred to as the Block Vector (BV). IntraBC relies on reconstructed reference data in the same picture to generate predictors for a current block. It is considered as Intra-prediction coding in the sense that the prediction is derived from the same picture as the picture being coded. The coding process for IntraBC is nevertheless similar to that for Inter-frame coding, except that the reference data for IntraBC coding is based on the reconstructed samples in the current picture instead of the previously coded frames.
The IntraBC process adopted by HEVC allows the use of a two-dimensional BV. The reference data includes previously coded blocks in the same frame as the block being coded. FIG. 1 illustrates an example of the previously reconstructed region available to a current encoding block according to HEVC, where the region corresponds to reconstructed data before the deblocking process. In order to encode a current block, a best candidate block in the previously reconstructed region is first identified. The best candidate block is identified by the block vector (BV 210), which points from the current block to the best candidate block as shown in FIG. 2. There are various ways to determine the best candidate block and the decision is made at the encoder side. For example, the encoder may select the best candidate block by minimizing the mean squared error between the candidate block and the current block. The encoder may also select the best candidate block by achieving the optimal rate-distortion performance associated with using the candidate block as the predictor for the current block.
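The encoder-side selection of a best candidate block by minimizing the mean squared error may be sketched as follows. This is a minimal illustration only, not the search used by any particular encoder. The function names (`is_available`, `block_mse`, `find_best_bv`), the square search window given by `search_range`, and the simplified availability rule (a candidate must lie entirely above the current block, or entirely to its left without extending below it) are assumptions introduced for the example.

```python
def is_available(cx, cy, x0, y0, w, h):
    """Assumed availability rule: the candidate at (cx, cy) lies entirely above
    the current block, or entirely to its left without extending below it."""
    above = cy + h <= y0
    left = cx + w <= x0 and cy <= y0
    return above or left


def block_mse(orig, recon, x0, y0, cx, cy, w, h):
    """Mean squared error between the current block in `orig` at (x0, y0)
    and the candidate block in `recon` at (cx, cy)."""
    total = 0
    for j in range(h):
        for i in range(w):
            diff = orig[y0 + j][x0 + i] - recon[cy + j][cx + i]
            total += diff * diff
    return total / (w * h)


def find_best_bv(orig, recon, x0, y0, w, h, search_range):
    """Exhaustive search over a square window; returns ((dx, dy), mse) for the
    candidate with the smallest MSE, or (None, inf) if no candidate is valid."""
    height, width = len(recon), len(recon[0])
    best_bv, best_mse = None, float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            cx, cy = x0 + dx, y0 + dy
            # Skip candidates that fall outside the picture.
            if cx < 0 or cy < 0 or cx + w > width or cy + h > height:
                continue
            # Skip candidates that are not yet reconstructed (assumed rule).
            if not is_available(cx, cy, x0, y0, w, h):
                continue
            mse = block_mse(orig, recon, x0, y0, cx, cy, w, h)
            if mse < best_mse:
                best_bv, best_mse = (dx, dy), mse
    return best_bv, best_mse
```

A rate-distortion based selection could be sketched in the same way by replacing `block_mse` with a cost that also accounts for the bits needed to code the BV and the residual.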
In HEVC, wavefront parallel processing (WPP) is supported, where each row of Coding Tree Units (CTUs) can be processed in parallel as sub-streams by multiple encoding or decoding threads. In order to limit the degradation of coding efficiency, a wavefront pattern of processing order ensures that dependencies on spatial neighbors are not changed. In order to be compliant with the WPP process, the valid previous reconstruction region is reduced as shown in FIG. 3, wherein each square corresponds to a CTU and the valid previous reconstruction region has a ladder shape. In the current HEVC reference software, the valid previous reconstruction region as shown in FIG. 3 is always used regardless of whether WPP coding or non-WPP coding is used. The valid previous reconstruction region in FIG. 3 is also referred to as the wavefront parallel processing (WPP) format.
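A ladder-shaped valid region of this kind may be sketched at CTU granularity as follows. The exact step of the ladder is defined by FIG. 3, which is not reproduced here; the sketch assumes that the region extends one additional CTU to the right for each CTU row above the current CTU and that the current CTU itself is excluded. The function names `ctu_in_wpp_region` and `ladder_map` are hypothetical.

```python
def ctu_in_wpp_region(ref_x, ref_y, cur_x, cur_y):
    """Return True if the CTU at (ref_x, ref_y) lies in the assumed
    ladder-shaped region for the current CTU at (cur_x, cur_y).
    All coordinates are in CTU units."""
    if ref_y > cur_y:
        return False                      # CTU rows below are never available
    if ref_y == cur_y:
        return ref_x < cur_x              # same row: only CTUs to the left
    # Rows above: one extra CTU to the right per row (assumed ladder step).
    return ref_x - cur_x <= cur_y - ref_y


def ladder_map(cols, rows, cur_x, cur_y):
    """Render the region as text: '#' valid, 'C' current CTU, '.' invalid."""
    lines = []
    for y in range(rows):
        line = ""
        for x in range(cols):
            if (x, y) == (cur_x, cur_y):
                line += "C"
            elif ctu_in_wpp_region(x, y, cur_x, cur_y):
                line += "#"
            else:
                line += "."
        lines.append(line)
    return lines


for row in ladder_map(6, 3, 2, 2):
    print(row)
```

For a picture of 6 by 3 CTUs with the current CTU at column 2 of the bottom row, the printed rows are `#####.`, `####..` and `##C...`, which shows the ladder outline under the stated assumption.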
In a hardware-based implementation of a video encoder incorporating the IntraBC mode, a pipeline architecture may be used to support the multiple encoding functions involved in IntraBC encoding. The multiple encoding functions may be mapped to multi-stage pipeline processors (or multi-stage pipeline processing units). Each stage of IntraBC processing may be mapped to one pipeline-stage processor (or one pipeline-stage processing unit). An exemplary pipeline processing for IntraBC encoding is illustrated in FIG. 4, where the key encoding process is divided into three functional blocks corresponding to motion estimation/Intra block copy (IntraBC) block vector (BV) estimation 410, mode decision 420, and reconstruction/entropy encoding 430. Each of the three functional blocks can be mapped to a suitable processor in the pipeline architecture. As mentioned before, the processing of the IntraBC mode is similar to that of the Inter prediction mode. Therefore, the pipeline architecture in FIG. 4 can handle blocks coded in the Inter prediction mode or the IntraBC mode. In the first stage, the encoder determines the motion vector for the Inter prediction mode and the block vector for the IntraBC mode. In the second stage, the encoder determines a best mode according to a certain performance criterion. For example, the performance criterion may correspond to the best rate-distortion performance obtained by using a rate-distortion optimization process. In the third stage, after the mode is selected, the encoder generates the compressed bitstream using an entropy coding process. Since the reconstructed samples may be used as reference data for later encoding, the coded data also has to be reconstructed at the encoder side.
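The concurrent operation of the three stages may be illustrated with a small scheduling sketch. It assumes an idealized pipeline in which every stage takes exactly one time slot per block and no stage ever stalls; the stage labels follow FIG. 4, while the function name `pipeline_schedule` and the slot model are assumptions made for illustration.

```python
STAGES = (
    "ME/IntraBC BV estimation",         # functional block 410
    "mode decision",                    # functional block 420
    "reconstruction/entropy encoding",  # functional block 430
)


def pipeline_schedule(num_blocks, num_stages=len(STAGES)):
    """Return one entry per time slot. Each entry lists, per stage, the index
    of the block being processed in that slot, or None if the stage is idle."""
    slots = []
    for t in range(num_blocks + num_stages - 1):
        slot = []
        for s in range(num_stages):
            b = t - s  # block b enters stage s at slot b + s
            slot.append(b if 0 <= b < num_blocks else None)
        slots.append(slot)
    return slots


for t, slot in enumerate(pipeline_schedule(4)):
    print(t, slot)
```

In this model, four blocks complete in six slots instead of twelve, and from slot 2 onward all three stages are busy with three different blocks at the same time.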
When an encoder incorporates the pipeline architecture, there may be data dependency between a current block and neighboring reconstructed blocks for IntraBC coding. The pipeline structure is efficient when all pipeline stages can work concurrently. The data dependency may prevent the stages from working concurrently and thereby affect the performance of the pipeline-based encoder implementation. Accordingly, it is desirable to develop methods and/or apparatus to overcome this issue.
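The effect of the data dependency may be sketched with a simplified timing model. The model assumes that each stage takes one time slot, that block n would ideally enter BV estimation at slot n, and that the reconstructed samples of a block become available only after the block has left the last stage. The function names `unavailable_blocks` and `stall_slots` are hypothetical and the model is an illustration only.

```python
def unavailable_blocks(n, num_stages=3):
    """Blocks still inside the pipeline when block n enters BV estimation,
    i.e. blocks whose reconstructed samples are not yet available."""
    return list(range(max(0, n - num_stages + 1), n))


def stall_slots(n, ref_block, num_stages=3):
    """Number of slots that BV estimation of block n has to wait if its
    reference data comes from block `ref_block` (with ref_block < n)."""
    # Block ref_block leaves the last stage at the end of slot
    # ref_block + num_stages - 1, so its samples are usable from the next slot.
    earliest_start = ref_block + num_stages
    return max(0, earliest_start - n)


print(unavailable_blocks(5))  # blocks 3 and 4 are still in flight
print(stall_slots(5, 4))      # referencing the immediately preceding block
print(stall_slots(5, 2))      # referencing an already reconstructed block
```

Under these assumptions, a block that references the immediately preceding block waits two slots before BV estimation can start, during which the first stage sits idle, whereas a reference to a block at least three positions earlier causes no stall.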