High Efficiency Video Coding (HEVC) is a new coding standard that has been developed in recent years. In the HEVC system, the fixed-size macroblock of H.264/AVC is replaced by a flexible block, named coding unit (CU). Pixels in the CU share the same coding parameters to improve coding efficiency. A CU may begin with a largest CU (LCU), which is also referred to as a coding tree unit (CTU) in HEVC. In addition to the concept of coding unit, the concept of prediction unit (PU) is also introduced in HEVC. Once the splitting of the CU hierarchical tree is done, each leaf CU is further split into one or more prediction units (PUs) according to prediction type and PU partition.
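The hierarchical splitting of a CTU into leaf CUs can be illustrated with a minimal Python sketch. It is not the HEVC partitioning decision process: the function name `split_ctu`, the `should_split` callback, and the 64/8 size limits are assumptions chosen only for the example.

```python
def split_ctu(x, y, size, should_split, min_size=8):
    """Recursively split a square block into leaf CUs (quadtree).

    should_split(x, y, size) is a hypothetical encoder decision callback.
    Returns a list of (x, y, size) leaf CUs, visiting the four quadrants
    in top-left, top-right, bottom-left, bottom-right order.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    split_ctu(x + dx, y + dy, half, should_split, min_size))
        return leaves
    return [(x, y, size)]


# Hypothetical decision: split the 64x64 CTU once, then split only its
# top-left 32x32 CU one more time.
def example_decision(x, y, size):
    return size == 64 or (size == 32 and x == 0 and y == 0)


leaf_cus = split_ctu(0, 0, 64, example_decision)
```

With this decision rule the CTU ends up as four 16x16 CUs and three 32x32 CUs, which together cover the full 64x64 area. Each leaf CU would then be partitioned into PUs, which the sketch does not model.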
In the current development of screen content coding (SCC) for the High Efficiency Video Coding (HEVC) standard, a new Intra coding mode, named Intra Block Copy (IntraBC), has been disclosed. The IntraBC technique was originally proposed by Budagavi in AHG8: Video coding using Intra motion compensation, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting: Incheon, KR, 18-26 Apr. 2013, Document: JCTVC-M0350 (hereinafter JCTVC-M0350). An example according to JCTVC-M0350 is shown in FIG. 1, where a current coding unit (CU, 110) is coded using Intra MC (motion compensation). The prediction block (120) is located based on the position of the current CU and a displacement vector (112). In this example, the search area is limited to the current CTU (coding tree unit), the left CTU and the left-left CTU. The prediction block is obtained from the already reconstructed region. Then, the displacement vector, also named block vector (BV), and the residual for the current CU are coded. HEVC adopts the CTU and CU block structure as basic units for coding video data. Each picture is divided into CTUs and each CTU is recursively divided into CUs. During the prediction phase, each CU may be divided into multiple blocks, named prediction units (PUs), for performing the prediction process. After the prediction residue is formed for each CU, the residue associated with each CU is divided into multiple blocks, named transform units (TUs), to apply a transform (such as the discrete cosine transform (DCT)).
Since the initial IntraBC (IBC) technique was introduced in JCTVC-M0350, variations of IBC and various improvements have been disclosed. In particular, while only the horizontal block vector is allowed in JCTVC-M0350, the current IBC uses two-dimensional block vectors. FIG. 2 illustrates an example of a previous reconstructed region that can be used as reference data for encoding a current block according to IBC. In FIG. 2, the blocks in a current frame (210) are processed in a pre-defined order (e.g. horizontal scan). When the current block (220) is coded, the previous reconstructed blocks in the previous reconstructed region (230) can be used as an IntraBC predictor for the current block.
The IntraBC predictor (350) for the current block (320) is located according to the block vector (340) as shown in FIG. 3. For a current block coded in IntraBC mode, the IntraBC predictor is treated as a reference block in the same way as a reference block is treated for an Inter-coded block. However, for IntraBC the reference block is located in the same picture as the current block, whereas for Inter coding the reference block is in a previous reconstructed reference picture. The reference block is selected from the previous reconstructed region (330) of the current frame (310). The block vector points from the current block (320) to the reference block (350). In other words, the location of the reference block (350) is determined based on the location of the current block (320) offset by the block vector (340).
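The relation between the current block, the block vector and the reference block can be illustrated with a short Python sketch. The function name `intrabc_predictor` and the list-of-rows picture layout are assumptions for illustration. The sketch only checks that the reference block lies inside the picture; it does not check that the block lies inside the previous reconstructed region.

```python
def intrabc_predictor(recon, x, y, w, h, bv):
    """Return the w x h predictor located at the current block offset by bv.

    recon: 2-D list of reconstructed samples, indexed recon[row][col].
    (x, y): top-left sample position of the current block.
    bv: block vector (bv_x, bv_y) pointing from the current block to the
        reference block.
    """
    ref_x, ref_y = x + bv[0], y + bv[1]
    if (ref_x < 0 or ref_y < 0
            or ref_y + h > len(recon) or ref_x + w > len(recon[0])):
        raise ValueError("block vector points outside the picture")
    return [row[ref_x:ref_x + w] for row in recon[ref_y:ref_y + h]]


# Toy 4x8 picture in which each sample equals 10 * row + col.
picture = [[10 * r + c for c in range(8)] for r in range(4)]

# Current 2x2 block at (x=4, y=2); the block vector (-4, -2) points to the
# top-left corner of the picture.
pred = intrabc_predictor(picture, 4, 2, 2, 2, (-4, -2))
```

Here `pred` holds the samples `[[0, 1], [10, 11]]`, i.e., the 2x2 block whose top-left corner is at (4 - 4, 2 - 2) = (0, 0).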
At the encoder side, the block vector is often determined by selecting the reference block in the previous reconstructed region that achieves the best performance. The performance can be measured in terms of BD-rate, which is widely used in video coding systems as a performance measure. After the block vector is determined, information related to the block vector is signaled in the bitstream so that it can be recovered at the decoder side for decoding the current block. According to the current HEVC standard, the previous reconstructed samples in the previous reconstructed region correspond to reconstructed pixels before the deblocking process.
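An encoder-side block vector search can be sketched as an exhaustive search over candidate reference blocks. The Python sketch below is only an illustration under stated assumptions: it uses the sum of absolute differences (SAD) as a simple stand-in cost rather than a rate-based measure, and it treats a candidate as available when it lies entirely above the current block or entirely to its left. The names `sad` and `search_block_vector` are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def search_block_vector(recon, cur_block, x, y):
    """Exhaustively search for the block vector with the lowest SAD.

    recon: 2-D list of reconstructed samples, indexed recon[row][col].
    cur_block: samples of the current block, whose top-left is at (x, y).
    Returns (best_bv, best_cost); best_bv is None if no candidate exists.
    """
    h, w = len(cur_block), len(cur_block[0])
    pic_w = len(recon[0])
    best_bv, best_cost = None, None
    for ref_y in range(0, y + 1):
        for ref_x in range(0, pic_w - w + 1):
            # Simplified availability rule (an assumption of this sketch).
            fully_above = ref_y + h <= y
            fully_left = ref_x + w <= x
            if not (fully_above or fully_left):
                continue
            ref = [row[ref_x:ref_x + w] for row in recon[ref_y:ref_y + h]]
            cost = sad(cur_block, ref)
            if best_cost is None or cost < best_cost:
                best_bv, best_cost = (ref_x - x, ref_y - y), cost
    return best_bv, best_cost


# Toy 4x8 picture: a 2x2 pattern in the top-left corner, zeros elsewhere.
recon = [[0] * 8 for _ in range(4)]
recon[0][0:2] = [5, 6]
recon[1][0:2] = [7, 8]

# The current block at (x=4, y=2) repeats that pattern exactly.
cur = [[5, 6], [7, 8]]
bv, cost = search_block_vector(recon, cur, 4, 2)
```

In this toy case the search returns the block vector (-4, -2) with a cost of 0, since the pattern in the top-left corner matches the current block exactly. A practical encoder would normally restrict the search range and include the cost of signaling the block vector in the selection criterion.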
A variation of IBC adopted by the HEVC standard restricts the previous reconstructed region to a ladder-shaped region (430) for coding a current block (420) in the current frame (410) as shown in FIG. 4, where each row of reconstructed blocks has the same number of or fewer reconstructed blocks than the previous row of reconstructed blocks. One of the reasons for restricting the previous reconstructed region as shown in FIG. 4 is to support wavefront parallel processing (WPP), which allows multiple rows of blocks to be processed in parallel. In screen content coding (SCC), the processing of the current block may rely on data from the block above the current block, and the processing of the current block has to wait until the whole or a part of the above block is processed. Therefore, the processing of a current block in a current row has to be delayed with respect to a corresponding block in the above row. Usually, a delay corresponding to one or more blocks is used. While WPP coding is intended for parallel encoding or decoding of multiple rows of blocks, the WPP structure is also used for non-parallel processing. Accordingly, the WPP structure and the ladder-shaped previous reconstructed region have also been used for IBC coding. As shown in FIG. 4, the consecutive previous reconstructed blocks within the ladder-shaped region in each row do not necessarily form a complete row of blocks. For convenience, consecutive blocks in a row are referred to as a row of blocks, which may be a partial row or a full row.
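The ladder-shaped region can be illustrated with a small availability test in block units. The Python sketch below is an assumption-based model and not the normative constraint: the hypothetical parameter `step` is the number of additional block columns that become available for each row further above the current row, and the names `is_available` and `ladder_region` are chosen for the example.

```python
def is_available(ref_row, ref_col, cur_row, cur_col, step=1):
    """Return True if block (ref_row, ref_col) lies in the ladder region.

    Coordinates are in block units. In the current row only blocks to the
    left of the current block are available. Each row further up may
    extend `step` more columns to the right (hypothetical parameter).
    """
    if ref_row < 0 or ref_col < 0 or ref_row > cur_row:
        return False
    if ref_row == cur_row:
        return ref_col < cur_col
    return ref_col <= cur_col + (cur_row - ref_row) * step


def ladder_region(cur_row, cur_col, width, step=1):
    """List the available block columns for every row up to the current one."""
    return [[c for c in range(width)
             if is_available(r, c, cur_row, cur_col, step)]
            for r in range(cur_row + 1)]


# Current block at row 2, column 3 of a picture that is 8 blocks wide.
region = ladder_region(2, 3, width=8)
row_lengths = [len(cols) for cols in region]
```

For this position the rows of the region contain 6, 5 and 3 blocks from top to bottom, so each row has the same number of or fewer blocks than the row above it, and none of the rows is a complete row of 8 blocks.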
For IBC processing, the previous reconstructed region has to be stored for coding the current block. The amount of stored reconstructed data grows as coding of the picture progresses. FIG. 5 illustrates three instances of IntraBC coding of the current block. For time instances t1, t2 and t3 (t1<t2<t3), the previous reconstructed regions (514, 524 and 534) for the current blocks (512, 522 and 532) are indicated respectively. As shown in FIG. 5, the previous reconstructed region, which complies with the WPP structure (i.e., is ladder-shaped), continues to grow as the coding process progresses. For the last current block in the picture, the previous reconstructed region corresponds to the whole picture less the current block.
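The growth of the stored region can be made concrete by counting the blocks in a ladder-shaped region at a few positions of the current block. The Python sketch below reuses a simplified ladder model in which each row above the current row may extend `step` more block columns to the right. The function name `stored_blocks`, the three sample positions, and the 64x64 block size with 8-bit 4:2:0 samples are assumptions for illustration only.

```python
# Assumed block format: 64x64 luma plus two 32x32 chroma planes, 8 bits each.
BYTES_PER_BLOCK = 64 * 64 + 2 * 32 * 32


def stored_blocks(cur_row, cur_col, width, step=1):
    """Count blocks in the ladder-shaped region for the current block.

    Coordinates are in block units; `width` is the picture width in blocks.
    `step` (hypothetical) is the number of extra columns available per row
    further above the current row.
    """
    total = cur_col  # blocks to the left of the current block in its row
    for rows_up in range(1, cur_row + 1):
        total += min(width, cur_col + rows_up * step + 1)
    return total


# Three sample positions standing in for time instances t1 < t2 < t3 in a
# picture of 8 x 4 blocks; the last one is the final block of the picture.
positions = [(1, 2), (2, 3), (3, 7)]
counts = [stored_blocks(r, c, width=8) for r, c in positions]
bytes_needed = [n * BYTES_PER_BLOCK for n in counts]
```

For these three positions the region contains 6, 14 and 31 blocks. The last value equals the 32 blocks of the whole picture less the current block, which is consistent with the description above.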
In a hardware-based coding processor, the data corresponding to the previous reconstructed region may be stored in a buffer or embedded memory so that the processor can access the stored data. However, such a buffer or embedded memory for the previous reconstructed region would increase the cost. Alternatively, the data for the previous reconstructed region could be stored in an external memory. However, there might be a significant penalty on processing speed due to external memory access. Accordingly, it is desirable to develop a method and apparatus to overcome the memory issue associated with storing the previous reconstructed region.