With the development and distribution of hardware capable of reproducing and storing high-resolution or high-quality video content, the need for a video codec capable of effectively encoding or decoding such content has increased. A standard video codec typically includes an encoder and a decoder. The encoder at a transmitter compresses the video to generate a compressed bit stream. The compressed bit stream also includes control information, which comprises the information needed to decode the compressed bit stream. The decoder at a receiver receives the compressed bit stream, decodes it in accordance with the control information, and reconstructs the video from the decoded bit stream.
Generally, each frame in the video is divided into blocks of the same size, and each frame is then encoded/decoded block by block. The encoder implements video coding standards, which define a set of tools for encoding the frames. In traditional video coding standards such as MPEG-1, MPEG-2, and H.264/AVC, the frame is divided into equal-sized blocks of 16×16 pixels, and such blocks are generally referred to as macroblocks. In a recent video coding standard such as HEVC, the frame is divided into larger blocks of up to 64×64 pixels, which are referred to as Coding Units instead of macroblocks. In the traditional video coding standards, the encoder encodes the macroblocks by scanning them from left to right and then from top to bottom. Such a scan order is known as a raster scan pattern. Further, a motion estimation process is employed to eliminate inter-frame redundancy between successive frames, which is achieved by comparing frame data (pixels) of two frames (a current frame and a reference frame). However, this process is highly data intensive as it works on pixel data from two frames. Since the pixel data for a frame has a large memory footprint, the pixel data is generally stored in external memory. The pixel data is loaded into the cache as required for processing, resulting in high memory bandwidth usage.
Further, for every macroblock in a current frame, a search region surrounding the collocated macroblock in a reference frame is searched for the best match. For comparison, this search window is loaded for each macroblock. The search window consists of pixel data, and over a complete frame this has a large memory footprint. Since the search regions of neighboring macroblocks overlap, only the non-overlapping pixel data corresponding to the search window for the current macroblock is loaded into the cache for processing, thereby reducing memory bandwidth usage. Accordingly, FIG. 1A illustrates a raster scan pattern used in a frame 100 currently being scanned. As described earlier, the frame 100 is divided into a plurality of macroblocks 101 having a size of 64×64 that are scanned using a raster scan pattern 102. For coding each macroblock 101 in the current frame 100, a search region 103 (represented by a dashed rectangle) having a size of 96×96 surrounding the collocated macroblock in the current frame 100 is searched for the best match. The search windows of spatially adjacent macroblocks overlap, creating an overlap region 104 (represented by a solid grey rectangle) having a size of 32×32. Referring to FIG. 1A, macroblocks N, N+1, N+2, N+3, and N+4 are spatially adjacent to each other and therefore their respective search windows overlap. Such an overlap of search regions provides memory bandwidth savings between successive macroblocks.
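The bandwidth saving from overlapping search windows can be sketched in a few lines of Python. The block size of 16, the search margin of 16 pixels, and the cache that holds only the previous window are illustrative assumptions and are not the values shown in FIG. 1A; clipping at the frame boundary is ignored, and all function names are hypothetical.

```python
def window(bx, by, block, margin):
    """Search window (x0, y0, x1, y1) around the block at grid cell (bx, by)."""
    x0 = bx * block - margin
    y0 = by * block - margin
    side = block + 2 * margin
    return (x0, y0, x0 + side, y0 + side)


def overlap_area(a, b):
    """Area in pixels shared by two windows; 0 if they do not intersect."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def pixels_loaded_raster(cols, rows, block=16, margin=16):
    """Pixels fetched over a frame when only the previous window is cached."""
    total = 0
    prev = None
    for by in range(rows):          # top to bottom
        for bx in range(cols):      # left to right
            cur = window(bx, by, block, margin)
            area = (cur[2] - cur[0]) * (cur[3] - cur[1])
            reused = overlap_area(prev, cur) if prev is not None else 0
            total += area - reused  # only the non-overlapping part is loaded
            prev = cur
    return total


if __name__ == "__main__":
    cols, rows = 4, 2
    side = 16 + 2 * 16
    print("without re-use:", cols * rows * side * side)
    print("with re-use:   ", pixels_loaded_raster(cols, rows))
```

Under these assumptions, each window shares two thirds of its pixels with the window of its left neighbor, so only the remaining third has to be fetched for that macroblock.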
However, with increasing demand for larger frame sizes such as high definition (HD) (1920×1080 pixels) and 4K (3840×2160 pixels), and for higher frame rates such as 60 and 120 frames per second, the number of macroblocks processed per second is increasing, thereby increasing the memory bandwidth requirements. The increased memory bandwidth requirement may be met by increasing the re-use of already cached data. One technique to increase the re-use is to increase the size of the cache. However, increasing the cache size increases the cost of the entire system.
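The growth in macroblock throughput can be made concrete with a short calculation. The sketch below assumes 16×16 macroblocks and counts a partial block at the frame edge as a full block; the function name is hypothetical.

```python
import math


def blocks_per_second(width, height, fps, block=16):
    """Number of blocks an encoder must process per second."""
    cols = math.ceil(width / block)   # partial blocks count as full blocks
    rows = math.ceil(height / block)
    return cols * rows * fps


if __name__ == "__main__":
    frame_sizes = {"HD": (1920, 1080), "4K": (3840, 2160)}
    for name, (w, h) in frame_sizes.items():
        for fps in (60, 120):
            print(name, fps, "fps:", blocks_per_second(w, h, fps))
```

Under these assumptions, HD at 60 frames per second requires 489,600 macroblocks per second, while 4K at 120 frames per second requires 3,888,000, roughly an eightfold increase.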
Another technique to increase the re-use is to use scanning patterns that enable larger overlap of the search regions cached from reference frames. However, the raster scan pattern in traditional video coding standards does not allow overlap of search regions for more than two consecutive macroblocks, and it results in no search window overlap between successive macroblocks while switching rows at the right frame boundary. Referring to FIG. 1B, macroblock n is on the extreme right edge of the frame 100 and macroblock n+1 is on the extreme left edge of the frame 100. As may be gathered, there is no overlap between their respective search regions 103 while switching rows during the raster scanning.
The above-mentioned deficiencies of the raster scan are overcome in the latest encoding standards. In the latest coding standards, such as HEVC, the frame is divided into equal-sized square units of up to 64×64 pixels, and such units are generally referred to as Coding Tree Units (CTUs), which are further divided into smaller units called Coding Units (CUs). The Coding Tree Units may also be referred to as Largest Coding Units (LCUs). The large size of the CTUs increases coding efficiency while they are scanned using the raster scan pattern. To enable large CTUs while preserving the coding performance of small detailed areas in the same image, hierarchical coding is used. Thus, a large CTU may be further split into smaller CUs based on a quad-tree structure. For example, a CTU having a size of 64×64 may be coded as a single 64×64 CU or split into CUs of sizes 32×32, 16×16, and 8×8.
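The quad-tree split of a CTU into CUs can be sketched as a short recursive function. This is a minimal illustration rather than the procedure of any particular encoder: the `should_split` callback is a hypothetical stand-in for the encoder's own split decision, and the order in which the four quadrants are visited (top-left, top-right, bottom-left, bottom-right) is an assumption.

```python
def split_ctu(x, y, size, should_split, min_size=8):
    """Yield (x, y, size) for every leaf CU of a quad-tree rooted at (x, y)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        for dy in (0, half):        # upper quadrants first
            for dx in (0, half):    # left quadrant before right quadrant
                yield from split_ctu(x + dx, y + dy, half, should_split, min_size)
    else:
        yield (x, y, size)          # not split further: this is a CU


if __name__ == "__main__":
    # Hypothetical decision: split the 64x64 CTU, then split only its
    # top-left 32x32 quadrant once more.
    def decide(x, y, size):
        return size == 64 or (size == 32 and x == 0 and y == 0)

    for cu in split_ctu(0, 0, 64, decide):
        print(cu)
```

With the example decision above, the CTU is covered by four 16×16 CUs and three 32×32 CUs, which together account for all 64×64 pixels.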
To increase the coding efficiency while having large CUs and to increase the re-use of already cached data, the CUs are scanned in a zigzag pattern or Z-scan pattern horizontally, row by row. Various solutions are available that implement the above-mentioned Z-scan pattern. In one solution, Z-scan blocks including at least four CUs are determined. The Z-scan block size is equal to or smaller than the maximum size of all the CUs and is larger than the current CU being scanned. As such, the locality of the coded data is increased because CUs that are scanned consecutively are located adjacent to each other, which improves prediction. Hence, the coding efficiency is increased. FIG. 2 illustrates a Z-scan pattern and Z-scan blocks in a frame 200 currently being scanned, in accordance with one embodiment of the solution. As described earlier, the frame 200 is divided into a plurality of LCUs 201 having a size of 64×64 that are scanned using a raster scan pattern 102. Each LCU 201 is further split into CUs 202 having a size of 32×32. Z-scan blocks 203 (represented by a bold square) having a maximum size of 64×64, equal to the maximum size of the LCU 201, are determined. Each Z-scan block 203 includes four CUs 202. The Z-scan blocks 203 are scanned horizontally, row by row, by scanning the CUs 202 within each Z-scan block 203 in a Z-scan pattern 204.
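The two-level order described for FIG. 2 (Z-scan blocks visited row by row, four CUs inside each block visited in a Z shape) can be sketched as follows. The sketch assumes that the frame dimensions are multiples of the Z-scan block size and that each Z-scan block holds exactly 2×2 CUs; the function name is hypothetical.

```python
def z_scan_order(frame_w, frame_h, z_block=64):
    """Top-left pixel coordinates of the CUs in scan order.

    Z-scan blocks are visited in raster order; the four CUs inside each
    block are visited top-left, top-right, bottom-left, bottom-right.
    """
    cu = z_block // 2                              # 2x2 CUs per Z-scan block
    order = []
    for by in range(0, frame_h, z_block):          # rows of Z-scan blocks
        for bx in range(0, frame_w, z_block):      # blocks within a row
            for dy in (0, cu):
                for dx in (0, cu):
                    order.append((bx + dx, by + dy))
    return order


if __name__ == "__main__":
    # A 128x64 frame holds two Z-scan blocks, that is, eight 32x32 CUs.
    print(z_scan_order(128, 64))
```

For the 128×64 example, the first four coordinates trace a Z inside the left block and the next four trace a Z inside the right block.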
In another solution, an image is split into LCUs having the maximum size allowed by a video codec, and each LCU is further split in a hierarchical structure into CUs having a size less than or equal to the size of the LCU. The size of the resultant CUs is determined by the video codec as the size having the smallest encoding error. Each LCU is then scanned using a first scan pattern such as a raster scan pattern, and each CU within the LCU is then scanned using a second scan pattern such as a zigzag scan pattern. This improves coding efficiency and re-use of the cache. Accordingly, FIG. 3A illustrates a zigzag scan pattern of CUs according to the size of the CUs in a frame 300 currently being scanned, in accordance with a first embodiment of the solution. As described earlier, the frame 300 is divided into a plurality of LCUs, for which the video codec allows a maximum size of 64×64. Here, the frame is split into LCUs having a size of 32×32. Since the LCU has a size of 32×32, that is, ½ of the maximum size allowed by the video codec, a group of horizontally and vertically adjacent LCUs, such as LCU0, LCU1, LCU2, and LCU3, is assumed such that the combined size of these LCUs is equal to the maximum size allowed. A plurality of such groups of LCUs is then scanned according to a raster scan pattern. Within each group, the LCUs are scanned in accordance with a zigzag pattern by scanning four LCUs at a time. Thus, the group comprising LCU0, LCU1, LCU2, and LCU3 and the group comprising LCU4, LCU5, LCU6, and LCU7 are scanned in accordance with the raster scan pattern. Within each group, by contrast, LCU0, LCU1, LCU2, and LCU3 are scanned according to a zigzag scan pattern, and LCU4, LCU5, LCU6, and LCU7 are scanned according to a zigzag scan pattern.
Similarly, FIG. 3B illustrates a zigzag scan pattern of LCUs according to the size of the LCUs in a frame 301 currently being scanned, in accordance with a second embodiment of the solution.
Further, FIG. 3C illustrates a zigzag scan pattern of LCUs according to the size of the LCUs in a frame 302 currently being scanned, in accordance with a third embodiment of the solution. As described earlier, the frame 302 is divided into a plurality of LCUs having a size of 16×16. Since the LCU has a size of 16×16, that is, ¼ of the maximum size allowed by the video codec (64×64), a group of horizontally and vertically adjacent LCUs, such as LCU0 to LCU15, is assumed such that the combined size of these LCUs is equal to the maximum LCU size allowed. A plurality of such groups of LCUs is then scanned according to a raster scan pattern. Within each group, the LCUs are scanned in accordance with a zigzag pattern by scanning four LCUs at a time. Thus, the group comprising LCU0 to LCU15 and the group comprising LCU16 to LCU31 are scanned in accordance with the raster scan pattern. Within each group, by contrast, LCU0 to LCU15 and LCU16 to LCU31 are scanned according to a zigzag scan pattern, four LCUs at a time. In other words, LCU0 to LCU3 are first scanned in a zigzag pattern, then LCU4 to LCU7, then LCU8 to LCU11, and finally LCU12 to LCU15.
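One way to generate such a nested zigzag order inside a group is to de-interleave the bits of the scan index (a Z-order or Morton index). The sketch below assumes that the four sub-groups of four LCUs are themselves visited in a Z shape, which the description above does not state explicitly, and that the LCU numbers follow the scan order; the function names are hypothetical.

```python
def z_index_to_xy(index, levels):
    """Column and row inside a (2**levels x 2**levels) group for a scan index."""
    x = y = 0
    for bit in range(levels):
        x |= ((index >> (2 * bit)) & 1) << bit        # even bits give the column
        y |= ((index >> (2 * bit + 1)) & 1) << bit    # odd bits give the row
    return x, y


def group_scan(levels):
    """Positions of all LCUs of one group, listed in scan order."""
    return [z_index_to_xy(i, levels) for i in range(4 ** levels)]


if __name__ == "__main__":
    # levels=1: 2x2 group of 32x32 LCUs; levels=2: 4x4 group of 16x16 LCUs.
    print(group_scan(1))
    print(group_scan(2))
```

With `levels=2`, indices 0 to 3 cover the top-left quarter of the group, 4 to 7 the top-right quarter, 8 to 11 the bottom-left quarter, and 12 to 15 the bottom-right quarter.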
Thus, the Z-scan pattern or zigzag pattern enables larger overlap of the cached search region in comparison to the raster scan pattern, as the Z-scan pattern enables overlap with multiple CUs surrounding a current CU. However, the Z-scan pattern can lead to lower overlap when the cache is capable of storing the search window of only a single CU. FIGS. 4A and 4B illustrate a zigzag scan pattern of CUs in a frame 400. As described earlier, the frame 400 is divided into a plurality of CUs 401 that are scanned using a Z-scan pattern 402. For coding each CU 401, a search region 403 (represented by a dashed rectangle) surrounding the collocated CU 401 in the current frame 400 is searched for the best match. As may be observed from FIGS. 4A and 4B, the overlap between the search regions 403 of CUs N and N+1 is similar to that of the raster scan, whereas the overlap between the search regions 403 of CUs N+1 and N+2 is smaller. A similar loss in the overlap region occurs while moving from CU N+3 to the next CU.
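The loss of overlap on the diagonal steps of the Z-scan can be checked numerically. The sketch below assumes a cache that holds only the previous search window, a CU size of 16, and a search margin of 16 pixels; these values are illustrative and are not taken from FIGS. 4A and 4B, and the function names are hypothetical.

```python
def window_overlap(dx, dy, block=16, margin=16):
    """Shared area of two search windows whose CUs lie (dx, dy) grid cells apart."""
    side = block + 2 * margin
    w = max(0, side - abs(dx) * block)
    h = max(0, side - abs(dy) * block)
    return w * h


def overlaps_along(path, block=16, margin=16):
    """Overlap between each pair of consecutive CUs on a scan path."""
    return [
        window_overlap(x1 - x0, y1 - y0, block, margin)
        for (x0, y0), (x1, y1) in zip(path, path[1:])
    ]


if __name__ == "__main__":
    # CUs N, N+1, N+2, N+3 of one Z-block, then the first CU of the next block.
    z_path = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)]
    raster_path = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
    print("Z-scan:", overlaps_along(z_path))
    print("raster:", overlaps_along(raster_path))
```

Under these assumptions, the horizontal steps of the Z-scan re-use as many pixels as a raster step, while the two diagonal steps (N+1 to N+2, and N+3 to the next CU) re-use fewer.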
Further, similar to the raster scan pattern, the Z-scan pattern results in no search window overlap between successive Z-blocks while switching rows at the right frame boundary, as may be observed from FIGS. 2, 3A, and 3C. Furthermore, one Z-block contains only four CUs, a fixed number, which limits the overlap.
In yet another solution, massive digital holograms represented as fringe patterns are efficiently encoded by using an existing video/image compression tool when a 3D image is made by using a CGH (Computer Generated Hologram). The digital hologram encoding process comprises pre-processing, division (segmentation), frequency conversion (transform), post-processing, and compression stages. As illustrated in FIG. 5A, the fringe is divided into blocks holding the full information about the object video by using a 2D DCT transformation. Each of these blocks is treated as a separate video frame, and prediction blocks are obtained from neighboring blocks. This achieves the compression of the holograms. The video frames are then scanned using patterns from the MPEG and JPEG standards, such as a raster scanning pattern, scanning from the top-most to the bottom-most blocks, and vice versa. Accordingly, FIG. 5B illustrates a scanning pattern from the top-most to the bottom-most blocks. The scanned video frames are then put in order to form a video stream. As may be observed from the figure, such scanning results in less search window overlap between successive blocks, in a manner similar to raster scanning.
Thus, there is a need for a better solution that overcomes the above-mentioned deficiencies.