High Efficiency Video Coding (HEVC) is a new coding standard that has been developed in recent years. In HEVC, the fixed-size macroblock of H.264/AVC is replaced by a flexible block, named coding unit (CU). Pixels in a CU share the same coding parameters to improve coding efficiency. The CU partitioning may begin with a largest CU (LCU), which is also referred to as a coding tree unit (CTU) in HEVC. In addition to the concept of coding unit, the concept of prediction unit (PU) is also introduced in HEVC. Once the splitting of the CU hierarchical tree is done, each leaf CU is further split into one or more prediction units (PUs) according to the prediction type and PU partition.
Along with the development of the High Efficiency Video Coding (HEVC) standard, the development of HEVC extensions has also started. The HEVC extensions include range extensions (RExt), which target non-4:2:0 color formats, such as 4:2:2 and 4:4:4, and higher bit-depth video, such as 12, 14 and 16 bits per sample. One of the likely applications utilizing RExt is screen sharing over a wired or wireless connection. Due to specific characteristics of screen contents, coding tools have been developed that demonstrate significant gains in coding efficiency. Among them, the palette coding (a.k.a. major color based coding) techniques represent a block of pixels using indices to a palette (major colors), and encode the palette and the indices by exploiting spatial redundancy. While the total number of possible color combinations is huge, the number of colors in an area of a picture is usually very limited for typical screen contents. Therefore, palette coding becomes very effective for screen content materials.
Dictionary coding has also been found to be very effective for screen contents. A dual coder that selectively uses a dictionary coder or a traditional hybrid coder has been disclosed by Lin et al. (AHG7: Full-chroma (YUV444) dictionary+hybrid dual-coder extension of HEVC, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 11th Meeting: Shanghai, CN, 10-19 Oct. 2012, Document: JCTVC-K0133). The above reference is referred to as JCTVC-K0133 in this disclosure. For a dictionary coder, the two-dimensional (2-D) image data is converted into one-dimensional (1-D) packed pixel data. The processing is applied to each largest coding unit (LCU), which is also termed a coding tree unit (CTU). The scanning pattern used by JCTVC-K0133 is shown in FIG. 1, where a vertical scan is applied to the pixels within each LCU. The scanning starts from the upper-left corner of the LCU and goes downward to the bottom of the vertical line. After the first line is converted, the scanning process moves to the top of the second line, and the process continues through the last line of the LCU. The scanning is applied across LCUs horizontally, from the leftmost LCU to the last LCU in an LCU row.
An example of the dictionary coding process is illustrated in FIG. 2, where the LCU consists of 16×16 pixels in a packed pixel format according to the scanning pattern shown in FIG. 1. In the packed pixel format, the color components at each pixel are treated as individual samples. For example, in the case of the YUV444 color format, the packed pixel format corresponds to color samples (Y0, U0, V0, Y1, U1, V1, . . . , where Yn, Un, Vn are the three color components of a pixel). Similarly, for the RGB444 color format, the packed pixel format corresponds to color samples (R0, G0, B0, R1, G1, B1, . . . , where Rn, Gn, Bn are the three color components of a pixel).
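The vertical-scan conversion from a 2-D LCU into 1-D packed pixel data can be sketched as follows. This is a minimal illustrative sketch, not code from JCTVC-K0133; the names lcu_to_packed_samples and LCU_SIZE are assumptions introduced here.

```python
LCU_SIZE = 16  # LCU width/height in pixels for this example

def lcu_to_packed_samples(lcu):
    """Vertically scan an LCU of (Y, U, V) pixels into a 1-D sample list.

    lcu[y][x] is a (Y, U, V) tuple; the scan runs down each column,
    left to right, emitting the three color samples of every pixel.
    """
    packed = []
    for x in range(LCU_SIZE):          # columns, left to right
        for y in range(LCU_SIZE):      # top to bottom within a column
            packed.extend(lcu[y][x])   # append Y, U, V samples in order
    return packed

# A 16x16 LCU in YUV444 yields 16*16*3 = 768 samples.
lcu = [[(y_, x_, 128) for x_ in range(LCU_SIZE)] for y_ in range(LCU_SIZE)]
samples = lcu_to_packed_samples(lcu)
assert len(samples) == 768
```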
FIG. 2 illustrates an example of encoding LCU m using dictionary coding, where the underlying video data uses a 444 color format (e.g., YUV444 or RGB444). The searching window (i.e., previously coded pixels) is treated as a long 1-D string generated according to the exemplary scanning pattern in FIG. 1. For the current LCU, vertical scanning is employed to convert the color pixels in the LCU into a 1-D sample string. The packed pixels corresponding to previously coded blocks (i.e., LCUs in this example) are referred to as reference packed pixels in this disclosure. Starting from the first sample (i.e., the upper-left corner sample) of LCU m, the encoder searches for the optimal (e.g., longest) string in the searching window that matches the sample string in LCU m. The string found in the searching window is called the matching string and the sample string in LCU m is called the matched string. FIG. 2 shows the first three matching (and matched) strings for LCU m:
1) The 1st matching (and matched) string, shown in fill pattern 211, has 42 samples (i.e., 14 pixels); the matching string is located in LCU 0 (i.e., the last 15 samples of the last column) and LCU 1 (i.e., the first 27 samples of the first column).
2) The 2nd matching (and matched) string, shown in fill pattern 212, has 33 samples (i.e., 11 pixels); the matching string is located inside LCU 0 (i.e., the last 21 samples of the third column followed by the first 12 samples of the fourth column).
3) The 3rd matching (and matched) string, shown in fill pattern 213, has 45 samples (i.e., 15 pixels); the matching string is located in LCU h−1 (i.e., the last 39 samples of the last column) and LCU h (i.e., the first 6 samples of the first column).
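The longest-string search described above can be sketched with a brute-force scan over the 1-D sample strings. This is an illustrative assumption about the search, not the encoder of JCTVC-K0133 (a practical encoder would use faster data structures such as hash chains); the function name find_longest_match is hypothetical.

```python
def find_longest_match(window, current, start):
    """Find the longest string in `window` matching current[start:].

    Returns (distance, length): distance is measured from the current
    sample position back to the first sample of the matching string;
    length 0 means no match was found (the sample would then be coded
    as an unmatchable sample residual).
    """
    best_dist, best_len = 0, 0
    for pos in range(len(window)):
        length = 0
        # Extend the match as far as both strings agree.
        while (pos + length < len(window)
               and start + length < len(current)
               and window[pos + length] == current[start + length]):
            length += 1
        if length > best_len:
            best_len = length
            best_dist = len(window) - pos  # distance back from current sample
    return best_dist, best_len

# Previously coded samples (searching window) and current sample string:
window = [1, 2, 3, 4, 1, 2, 3]
current = [1, 2, 3, 9]
dist, length = find_longest_match(window, current, 0)
# Matches the string [1, 2, 3] found 7 samples back.
```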
The dual-coder bitstream according to JCTVC-K0133 is a mix of dictionary coder bitstream segments and traditional hybrid coder bitstream segments. A corresponding decoder will parse the coder type in the CTU layer to determine whether a current CTU is coded by the dictionary coder or the hybrid coder. A syntax element ctu_coded_by_dictionary_coder_flag is used to indicate whether the CTU is coded by the dictionary coder or not. The exemplary CTU layer syntax according to JCTVC-K0133 is shown in Table 1.
TABLE 1
                                                                  Descriptor
coding_tree_unit( x0, y0, log2CbSize ) {
  xCtb = ( CtbAddrInRS % PicWidthInCtbsY ) << Log2CtbSizeY
  yCtb = ( CtbAddrInRS / PicWidthInCtbsY ) << Log2CtbSizeY
  CtbAddrInSliceSeg = CtbAddrInRS − slice_segment_address
  if( slice_sao_luma_flag | | slice_sao_chroma_flag )
    sao( xCtb >> Log2CtbSizeY, yCtb >> Log2CtbSizeY )
  if( dual_coder_mode = = 0 )
    coding_quadtree( xCtb, yCtb, Log2CtbSizeY, 0 )
  else {
    ctu_coded_by_dictionary_coder_flag                            ae(v)
    if( ctu_coded_by_dictionary_coder_flag = = 0 )  // HYBRID_CODER
      coding_quadtree( xCtb, yCtb, Log2CtbSizeY, 0 )
    else
      dictionary_coder( x0, y0, Log2CtbSizeY )
  }
}
In Table 1, ctu_coded_by_dictionary_coder_flag equal to 0 specifies that the CTU is coded by the hybrid coder. ctu_coded_by_dictionary_coder_flag equal to 1 specifies that the CTU is coded by the dictionary coder.
The dictionary_coder( ) syntax according to JCTVC-K0133 is shown in Table 2. NumSamplesInCTU is the total number of Y samples, U samples, and V samples in a coding tree unit. For example, the total number of samples is 16×16×3=768 for a coding tree unit of size 16×16 pixels in a 444 color format, and 64×64×3=12,288 for a coding tree unit of size 64×64 pixels in a 444 color format.
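The sample count can be checked directly against the shift expression used in Table 2, NumSamplesInCTU = 3 << (Log2CtbSizeY << 1), which equals 3 times the squared CTU width for a 444 format. The function name below is illustrative only.

```python
def num_samples_in_ctu(log2_ctb_size_y):
    # 3 << (Log2CtbSizeY << 1) == 3 * (2**Log2CtbSizeY)**2,
    # i.e. three color samples per pixel of an N x N CTU in 444 format.
    return 3 << (log2_ctb_size_y << 1)

assert num_samples_in_ctu(4) == 768     # 16x16 CTU: 16*16*3
assert num_samples_in_ctu(6) == 12288   # 64x64 CTU: 64*64*3
```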
TABLE 2
                                                                  Descriptor
dictionary_coder( x0, y0, Log2CtbSizeY ) {
  decoded_sample_count = 0
  NumSamplesInCTU = ( 3 << ( Log2CtbSizeY << 1 ) )
  while( decoded_sample_count < NumSamplesInCTU ) {
    matching_string_flag                                          ae(v)
    if( matching_string_flag = = 1 ) {
      matching_string_distance_use_recent_8_flag                  ae(v)
      if( matching_string_distance_use_recent_8_flag = = 1 ) {
        matching_string_distance_recent_8_idx                     ae(v)
        matching_string_distance_minus1 =
            Recent8Distances[ matching_string_distance_recent_8_idx ]
        for( i = matching_string_distance_recent_8_idx; i > 0; i−− )
          Recent8Distances[ i ] = Recent8Distances[ i − 1 ]
        Recent8Distances[ 0 ] = matching_string_distance_minus1
      } else {
        matching_string_distance_minus1                           ae(v)
        for( i = 7; i > 0; i−− )
          Recent8Distances[ i ] = Recent8Distances[ i − 1 ]
        Recent8Distances[ 0 ] = matching_string_distance_minus1
      }
      matching_string_length_minus2                               ae(v)
      decoded_sample_count += ( matching_string_length_minus2 + 2 )
    } else {
      unmatchable_sample_residual_abs                             ae(v)
      unmatchable_sample_residual_sign                            ae(v)
      decoded_sample_count += 1
    }
  }
}
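The parsing flow of Table 2 can be sketched in Python as follows. This is a hedged sketch, not a normative decoder: the reader object and its methods (read_flag, read_idx, read_distance, read_length, read_abs) are hypothetical stand-ins for the ae(v) entropy decoding of the corresponding syntax elements.

```python
def parse_dictionary_coder(reader, log2_ctb_size_y):
    recent8 = [0] * 8                          # Recent8Distances
    num_samples = 3 << (log2_ctb_size_y << 1)  # NumSamplesInCTU
    decoded = 0
    elements = []                              # parsed dictionary coding elements
    while decoded < num_samples:
        if reader.read_flag():                 # matching_string_flag
            if reader.read_flag():             # ..._distance_use_recent_8_flag
                idx = reader.read_idx()        # ..._distance_recent_8_idx
                dist = recent8[idx]            # reuse a recently decoded distance
            else:
                idx = 7
                dist = reader.read_distance()  # matching_string_distance_minus1
            # Move the (re)used distance to the front of Recent8Distances.
            for i in range(idx, 0, -1):
                recent8[i] = recent8[i - 1]
            recent8[0] = dist
            length = reader.read_length() + 2  # matching_string_length_minus2 + 2
            elements.append(('match', dist, length))
            decoded += length
        else:                                  # unmatchable sample residual
            abs_res = reader.read_abs()        # unmatchable_sample_residual_abs
            sign = reader.read_flag()          # unmatchable_sample_residual_sign
            elements.append(('residual', -abs_res if sign else abs_res))
            decoded += 1
    return elements
```

A decoder built this way terminates exactly when decoded_sample_count reaches NumSamplesInCTU, since each iteration consumes either one matching string or one unmatchable sample.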
In Table 2, matching_string_flag specifies whether the succeeding dictionary coding element is a matching string or an unmatchable sample residual, where the sample may correspond to a luma or chroma sample. If matching_string_flag is equal to 1, the succeeding dictionary coding element is a matching string; otherwise, the succeeding dictionary coding element is an unmatchable sample residual.
matching_string_distance_use_recent_8_flag equal to 1 specifies that the current matching_string_distance_minus1 is exactly the same as one of the eight most recently decoded matching_string_distance_minus1 stored in an eight-entry array named Recent8Distances[8] and the succeeding syntax element is an index of the array.
matching_string_distance_use_recent_8_flag equal to 0 specifies that the succeeding syntax element is matching_string_distance_minus1.
matching_string_distance_recent_8_idx is an index to Recent8Distances[8].
matching_string_distance_minus1+1 specifies the distance between the first sample of the current sample string to be decoded/constructed and the first sample of the matching sample string previously decoded/constructed in the dictionary. The dictionary is actually the DPB (decoded picture buffer) and the current decoded picture, reordered in a "dictionary-scan" format defined as a column-wise packed-pixel, CU-by-CU format. The value of matching_string_distance_minus1 shall be in the range of 0 to 4,194,303, inclusive.
matching_string_length_minus2+2 specifies the length of the matching sample-string. The value of matching_string_length_minus2 shall be in the range of 0 to 271, inclusive.
unmatchable_sample_residual_abs is the absolute difference between the unmatchable sample (e.g., the Y, U or V component of a YUV pixel) to be decoded and its predictor, which is the same component (Y, U or V) of the previous pixel in the dictionary. The value of unmatchable_sample_residual_abs shall be in the range of 0 to (256<<bit_depth_luma_minus8)−1 for the luma component, or 0 to (256<<bit_depth_chroma_minus8)−1 for a chroma component, inclusive.
unmatchable_sample_residual_sign specifies the sign of an unmatchable sample residual. If unmatchable_sample_residual_sign is equal to 0, the corresponding unmatchable sample residual has a positive value. Otherwise (unmatchable_sample_residual_sign is equal to 1), the corresponding unmatchable sample residual has a negative value.
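Given the semantics above, reconstructing an unmatchable sample from its parsed residual can be sketched as below. This is an illustrative assumption (the function name is hypothetical, and any clipping to the sample bit depth is omitted).

```python
def reconstruct_unmatchable_sample(predictor, residual_abs, residual_sign):
    """Rebuild an unmatchable sample from its residual.

    predictor:     the same color component of the previous pixel in the
                   dictionary (per unmatchable_sample_residual_abs semantics)
    residual_sign: 0 -> positive residual, 1 -> negative residual
    """
    residual = -residual_abs if residual_sign else residual_abs
    return predictor + residual

assert reconstruct_unmatchable_sample(100, 5, 0) == 105  # positive residual
assert reconstruct_unmatchable_sample(100, 5, 1) == 95   # negative residual
```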
Exemplary corresponding decoding pseudo codes for the dictionary based coding are described as follows. Three bits are used for matching_string_length_minus2 in each of the ranges [0, 7] and [8, 15], and eight bits are used for the remaining lengths.
Decode Smaller_than_8_flag
if( Smaller_than_8_flag = = 1 )
  decode 3 bits matching_string_length_minus2
else {
  Decode Smaller_than_16_flag
  if( Smaller_than_16_flag = = 1 ) {
    decode 3 bits matching_string_length_minus2
    matching_string_length_minus2 = matching_string_length_minus2 + 8
  } else {
    decode 8 bits matching_string_length_minus2
    matching_string_length_minus2 = matching_string_length_minus2 + 16
  }
}
length = matching_string_length_minus2 + MATCH_LEN_MIN
uDecByteCnt += length
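The length decoding steps above can be sketched in Python as follows. The BitReader class is a minimal illustrative stand-in for the actual bitstream reader, and MATCH_LEN_MIN is assumed to be 2, consistent with matching_string_length_minus2 + 2.

```python
MATCH_LEN_MIN = 2  # assumed minimum string length (length_minus2 + 2)

def decode_matching_string_length(reader):
    if reader.read_bit():                         # Smaller_than_8_flag
        length_minus2 = reader.read_bits(3)       # range [0, 7]
    elif reader.read_bit():                       # Smaller_than_16_flag
        length_minus2 = reader.read_bits(3) + 8   # range [8, 15]
    else:
        length_minus2 = reader.read_bits(8) + 16  # range [16, 271]
    return length_minus2 + MATCH_LEN_MIN

class BitReader:
    """Minimal MSB-first bit reader over a list of bits (illustrative)."""
    def __init__(self, bits):
        self.bits = bits
        self.pos = 0
    def read_bit(self):
        b = self.bits[self.pos]
        self.pos += 1
        return b
    def read_bits(self, n):
        v = 0
        for _ in range(n):
            v = (v << 1) | self.read_bit()
        return v

# Smaller_than_8_flag = 0, Smaller_than_16_flag = 1, 3 bits = 0b101 (5):
# length = (5 + 8) + 2 = 15
assert decode_matching_string_length(BitReader([0, 1, 1, 0, 1])) == 15
```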
The sample based string formation and the LCU based scanning may not be efficient in terms of coding performance. It is desirable to develop new dictionary coding techniques to improve the coding performance of existing dictionary coding.