Field
This disclosure is directed towards video compression technologies. In particular, the present disclosure is directed towards next-generation video coding technologies, including video coding/decoding technologies beyond High Efficiency Video Coding (HEVC), such as Versatile Video Coding (VVC). More specifically, an aspect of the disclosure is directed towards a VVC primary transform method, device, and computer-readable medium that use a large, 8-bit transform core having a size equal to or greater than 64×64.
Description of Related Art
Video coding and decoding using inter-picture prediction with motion compensation has been known for decades. Uncompressed digital video can consist of a series of pictures, each picture having a spatial dimension of, for example, 1920×1080 luminance samples and associated chrominance samples. The series of pictures can have a fixed or variable picture rate (informally also known as frame rate) of, for example, 60 pictures per second or 60 hertz (Hz). Uncompressed video has significant bitrate requirements. For example, 1080p60 4:2:0 video at 8 bits per sample (1920×1080 luminance sample resolution at 60 Hz frame rate) requires close to 1.5 Gbit/s bandwidth. An hour of such video requires more than 600 GByte of storage space.
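The bandwidth and storage figures above follow from simple arithmetic: in 4:2:0 sampling, each of the two chroma planes has one quarter of the luma samples, so a picture carries 1.5 samples per luma position. The sketch below reproduces the numbers; the function name `raw_bitrate_bps` and the chroma-factor table are illustrative assumptions, and GByte is taken as 10^9 bytes.

```python
def raw_bitrate_bps(width, height, fps, bits_per_sample, chroma="4:2:0"):
    """Bits per second of uncompressed video (hypothetical helper)."""
    # Chroma samples per luma sample, summed over both chroma planes.
    chroma_factor = {"4:0:0": 0.0, "4:2:0": 0.5, "4:2:2": 1.0, "4:4:4": 2.0}[chroma]
    samples_per_picture = width * height * (1 + chroma_factor)
    return samples_per_picture * bits_per_sample * fps

bps = raw_bitrate_bps(1920, 1080, 60, 8)
print(f"{bps / 1e9:.3f} Gbit/s")                 # 1.493 Gbit/s
gbytes_per_hour = bps * 3600 / 8 / 1e9           # decimal GByte (10^9 bytes)
print(f"{gbytes_per_hour:.1f} GByte per hour")   # 671.8 GByte per hour
```

A compression ratio of two orders of magnitude would bring the same stream down to roughly 15 Mbit/s.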
One purpose of video coding and decoding can be the reduction of redundancy in the input video signal through compression. Compression can help reduce the aforementioned bandwidth or storage space requirements, in some cases by two orders of magnitude or more. Both lossless and lossy compression, as well as a combination thereof, can be employed. Lossless compression refers to techniques where an exact copy of the original signal can be reconstructed from the compressed signal. When using lossy compression, the reconstructed signal may not be identical to the original signal, but the distortion between the original and reconstructed signals is small enough to make the reconstructed signal useful for the intended application. In the case of video, lossy compression is widely employed. The amount of distortion tolerated depends on the application; for example, users of certain consumer streaming applications may tolerate higher distortion than users of television contribution applications. The achievable compression ratio can reflect this: higher allowable/tolerable distortion can yield higher compression ratios.
A person of ordinary skill generally understands video compression/decompression technology. In general, to compress video or image data, a series of functional steps is performed to produce a compressed video or image file. Although an image, such as a 360° image (e.g., captured by a 360° camera), may be suitable for compression, for ease of reading, compression of a video file will be explained. To generate a compressed video file under conventional standards (e.g., H.264, H.265), an uncompressed video sample stream received from a video source may be partitioned or parsed into blocks of samples, which may then be predicted from samples of one or more reference pictures.
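As a rough illustration of the partitioning step, the sketch below splits a picture into a fixed grid of square blocks in raster-scan order. The function name `partition` and the fixed block size are assumptions for illustration only; standards such as H.265 further split such blocks recursively, which this sketch does not model.

```python
def partition(picture, block_size):
    """Split a 2-D picture (list of rows) into block_size x block_size blocks.

    Blocks on the right and bottom edges are smaller when the picture
    dimensions are not multiples of block_size.
    Returns a list of (x, y, block) tuples in raster-scan order.
    """
    height, width = len(picture), len(picture[0])
    blocks = []
    for y in range(0, height, block_size):
        for x in range(0, width, block_size):
            block = [row[x:x + block_size] for row in picture[y:y + block_size]]
            blocks.append((x, y, block))
    return blocks

# A 5x4 toy picture split into 2x2 blocks gives 3 columns x 2 rows of blocks.
toy = [[10 * r + c for c in range(5)] for r in range(4)]
for x, y, blk in partition(toy, 2):
    print(x, y, blk)
```

With a 1920×1080 picture and 64×64 blocks, the same routine yields 30 × 17 blocks, the bottom row being only 56 samples tall.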
Bi-prediction can relate to techniques where a prediction unit (PU), such as a predicted block of samples, is predicted from two motion-compensated blocks of samples taken from one or two reference pictures. Bi-prediction was first introduced into video coding standards in MPEG-1 and has also been included in other video coding technologies and standards, such as MPEG-2 Part 2 (or H.262), H.264, and H.265.
When decompressing a compressed video file, during the reconstruction of a sample of a bi-predicted PU, motion-compensated and interpolated input samples from each reference block can be multiplied by a weighting factor that can be different for each reference block, and the weighted sample values of the two reference blocks can be added to generate the sample under reconstruction. Such a sample can be further processed by mechanisms such as loop filtering.
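The weighted combination described above can be sketched in floating point as follows. The names `bipred_sample` and `bipred_block` are hypothetical; real decoders use integer arithmetic with rounding and clipping, and any loop filtering would be applied afterwards.

```python
def bipred_sample(ref0, ref1, w0, w1):
    """One bi-predicted sample: weighted sum of two motion-compensated,
    interpolated reference samples (weights may differ per reference)."""
    return w0 * ref0 + w1 * ref1

def bipred_block(block0, block1, w0=0.5, w1=0.5):
    """Apply bipred_sample element-wise to two co-located reference blocks."""
    return [[bipred_sample(a, b, w0, w1) for a, b in zip(row0, row1)]
            for row0, row1 in zip(block0, block1)]

print(bipred_block([[100, 120]], [[110, 80]]))              # [[105.0, 100.0]]
print(bipred_block([[100, 120]], [[110, 80]], 0.75, 0.25))  # [[102.5, 110.0]]
```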
In MPEG-1 and MPEG-2, the weighting factors can be determined based on the relative temporal distance between the picture to which the PU under reconstruction belongs and the two reference pictures. This is possible because, in MPEG-1 and MPEG-2, one of the two reference I or P pictures was in the “past” and the other in the “future” (in terms of presentation order) of the B-picture under reconstruction, and because in MPEG-1 and MPEG-2 there was a well-defined timing relationship established for any picture under reconstruction in relation to its reference pictures.
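One way to derive weights from relative temporal distance is linear interpolation between a past and a future reference, so that the temporally closer reference receives the larger weight. The sketch below assumes that rule; the text above only states that weights can depend on temporal distance, so the exact formula and the function name `temporal_weights` are illustrative assumptions.

```python
def temporal_weights(t_cur, t_past, t_future):
    """Weights for a past and a future reference, proportional to temporal
    proximity (assumed linear-interpolation rule). Weights sum to 1."""
    if not t_past < t_cur < t_future:
        raise ValueError("expects one past and one future reference")
    span = t_future - t_past
    w_past = (t_future - t_cur) / span    # closer past reference -> larger weight
    w_future = (t_cur - t_past) / span
    return w_past, w_future

print(temporal_weights(1, 0, 2))  # (0.5, 0.5): B-picture midway between references
print(temporal_weights(1, 0, 4))  # (0.75, 0.25): past reference is closer
```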
Starting with H.264, the reference picture selection concepts for bi-predicted pictures were relaxed such that the reference pictures only needed to be earlier in decoding order, but not in presentation order. Further, the notion of time was also relaxed in that neither H.264 nor H.265 requires a constrained/fixed picture interval in the time domain. Therefore, a decoder can no longer calculate weighting factors based on the timing information available in the bitstream. Instead, H.264 and H.265 include a “default” of 0.5 as the weighting factor for the reference samples of a bi-predicted picture. This default can be overridden by syntax available in the slice header known as pred_weight_table( ). The default of 0.5 or the information in the pred_weight_table may apply to all bi-predicted PUs in a given slice.
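In integer form, the default weight of 0.5 amounts to a rounded average, and explicit weights behave like fractions with a power-of-two denominator. The sketch below is modeled on the H.264 weighted bi-prediction sample formula for 8-bit video; the function names and the parameters `w0`, `w1`, `log_wd`, `o0`, `o1` are illustrative stand-ins for values a decoder would derive from pred_weight_table( ), not a reproduction of the standard's full derivation.

```python
def clip1(x, bit_depth=8):
    """Clip to the valid sample range [0, 2^bit_depth - 1]."""
    return max(0, min(x, (1 << bit_depth) - 1))

def weighted_bipred(p0, p1, w0=1, w1=1, log_wd=0, o0=0, o1=0, bit_depth=8):
    """Integer weighted bi-prediction of one sample (8-bit sketch).

    Effective weights are w0 / 2^(log_wd+1) and w1 / 2^(log_wd+1). With the
    defaults this reduces to (p0 + p1 + 1) >> 1, i.e. a weight of 0.5 each.
    """
    value = ((p0 * w0 + p1 * w1 + (1 << log_wd)) >> (log_wd + 1)) + ((o0 + o1 + 1) >> 1)
    return clip1(value, bit_depth)

print(weighted_bipred(100, 111))                          # 106: default rounded average
print(weighted_bipred(100, 111, w0=48, w1=16, log_wd=5))  # 103: weights 0.75 / 0.25
```

Because the shift and rounding offset are fixed per slice, the same integer operations serve every bi-predicted PU in the slice, which matches the slice-level scope described above.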
Non-Patent Literature 1 describes the H.265/HEVC standard. The inventors have studied the need for standardization of future video coding technology with a compression capability that significantly exceeds that of the HEVC standard (including its extensions).
Non-Patent Literature 2 discloses Versatile Video Coding (VVC), a recently launched standardization effort for next-generation video coding beyond HEVC, together with its test model, the VVC Test Model (VTM). VVC may generally provide a large (e.g., 64-point or larger) transform core using a 10-bit integer matrix.
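To illustrate what the bit width of an integer transform core trades off, the sketch below rounds a scaled N-point DCT-II basis so that every entry fits in a signed integer of a given width, then measures how far the result deviates from an orthogonal matrix. The scaling rule and the names `int_dct2_matrix` and `orthogonality_error` are assumptions; the matrices produced are not the actual HEVC or VVC transform cores, which were hand-tuned. Because the rounding error per entry stays at most 0.5 while the scale grows with the bit width, a wider entry range generally approximates the ideal basis more closely.

```python
import math

def int_dct2_matrix(n, bits):
    """Integer approximation of the N-point DCT-II basis (rows are basis
    functions), scaled so every entry fits in a signed `bits`-bit integer."""
    # Orthonormal DCT-II entries are bounded in magnitude by sqrt(2/N).
    scale = ((1 << (bits - 1)) - 1) / math.sqrt(2.0 / n)
    matrix = []
    for k in range(n):
        c_k = math.sqrt((1.0 if k == 0 else 2.0) / n)
        matrix.append([round(scale * c_k * math.cos(math.pi * (2 * i + 1) * k / (2 * n)))
                       for i in range(n)])
    return matrix, scale

def orthogonality_error(matrix, scale):
    """Max |(T T^T) / scale^2 - I| over all entries; 0 for an exact orthonormal basis."""
    n = len(matrix)
    worst = 0.0
    for a in range(n):
        for b in range(n):
            dot = sum(x * y for x, y in zip(matrix[a], matrix[b]))
            target = 1.0 if a == b else 0.0
            worst = max(worst, abs(dot / scale ** 2 - target))
    return worst

for bits in (8, 10):
    t, s = int_dct2_matrix(64, bits)
    print(bits, max(abs(v) for row in t for v in row), orthogonality_error(t, s))
```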