Many applications for video coding currently exist, including applications for transmission and storage of video data. Many video coding standards have also been developed and others are currently in development. Recent developments in video coding standardisation have led to the formation of a group called the “Joint Collaborative Team on Video Coding” (JCT-VC). The Joint Collaborative Team on Video Coding (JCT-VC) includes members of Study Group 16, Question 6 (SG16/Q6) of the Telecommunication Standardisation Sector (ITU-T) of the International Telecommunication Union (ITU), known as the Video Coding Experts Group (VCEG), and members of the International Organisations for Standardisation/International Electrotechnical Commission Joint Technical Committee 1/Subcommittee 29/Working Group 11 (ISO/IEC JTC1/SC29/WG11), also known as the Moving Picture Experts Group (MPEG).
The Joint Collaborative Team on Video Coding (JCT-VC) has produced a new video coding standard that significantly outperforms the “H.264/MPEG-4 AVC” (ISO/IEC 14496-10) video coding standard. The new video coding standard has been named “high efficiency video coding (HEVC)”.
Other developments, e.g. in the Video Electronics Standards Association (VESA), have been directed towards video coding algorithms capable of latencies under one frame. Traditional video compression standards, such as H.264/AVC and HEVC, have latencies of multiple frames (or at least one frame), as measured from an input of the encoding process to the output of the decoding process. Codecs complying with such standards may be termed ‘distribution codecs’, as codecs are intended to provide compression for distribution of video data from a source, such as a studio, to the end consumer (e.g. via terrestrial broadcast or internet streaming). HEVC does have signalling support for latencies under one frame, in the form of a Decoding Unit Supplementary Enhancement Information (SEI) message. The Decoding Unit SEI message is an extension of the timing signalling present in the Picture Timing SEI message, allowing specification of the timing of units less than one frame. However, the signalling is insufficient to achieve very low latencies with minimal buffering, and consequently results in tight coupling of the encoding and decoding processes. Applications requiring low latency are generally present within a broadcast studio.
In a broadcast studio, video may be captured by a camera before undergoing several transformations, including real-time editing, graphic and overlay insertion and muxing. Once the video has been adequately processed, a distribution encoder is used to encode the video data for final distribution to end consumers. Within the studio, the video data is generally transported in an uncompressed format. This necessitates the use of very high speed links. Variants of the Serial Digital Interface (SDI) protocol can transport different video formats. For example, 3G-SDI (operating with a 3 Gbps electrical link) can transport 1080p HDTV (1920×1080 resolution) at thirty (30)fps and eight (8) bits per sample. Interfaces having a fixed bit rate are suited to transporting data having a constant bit rate (CBR). Uncompressed video data is generally CBR, and compressed video data may also be CBR. As bit rates increase, achievable cabling lengths reduce, which becomes problematic for cable routing through a studio. For example, UHDTV (3840×2160) requires a four times (4×) increase in bandwidth compared to 1080p HDTV, implying a 12 Gbps interface. Increasing the data rate of a single electrical channel reduces the achievable length of the cabling. At 3 Gbps, cable runs generally cannot exceed one hundred and fifty (150) metres, the minimum usable length for studio applications.
One method of achieving higher rate links is by replicating cabling (e.g. by using four 3G-SDI links), with frame tiling or some other multiplexing scheme. However, the cabling replicating method increases cable routing complexity, requires more physical space, and may reduce reliability compared to use of a single cable. Thus, a codec that can perform compression at relatively low compression ratios (e.g. 4:1) while retaining a ‘visually lossless’ (i.e. having no perceivable artefacts compared to the original video data) level of performance is required by industry. Compression ratios may also be expressed as the number of ‘bits per pixel’ (bpp) afforded to the compressed stream, noting that conversion back to a compression ratio requires knowledge of the bit depth of the uncompressed signal, and the chroma format. For example, eight (8) bit 4:4:4 video data occupies 24 bpp uncompressed, so 4 bpp implies a 6:1 compression ratio.
The Display Stream Compression Sub-group within VESA has produced a standard named Display Stream Compression (DSC) and is standardising a newer variant named Advanced Display Stream Compression (ADSC). However, this activity is directed more towards distribution of high-resolution video data between components within electronic devices, such as smart phones and tablets, as a means of reducing the printed circuit board (PCB) routing difficulties for supporting very high resolutions (e.g. as used in ‘retina’ displays), by reducing either the clock rate or the number of required PCB traces. As such, ADSC is targeting applications where a single encode-decode cycle (‘single-generation’ operation) is anticipated. Within a broadcast studio, video data is typically passed between several processing stages prior to final encoding for distribution. For passing UHD video data through bandwidth-limited interfaces, such as 3G-SDI, multiple encode-decode cycles (‘multi-generational’ operation) is anticipated. Then, the quality level of the video data must remain visually lossless after as many as seven encode-decode cycles.
Video data includes one or more colour channels. Generally there is one primary colour channel and two secondary colour channels. The primary colour channel is generally referred to as the ‘luma’ channel and the secondary colour channel(s) are generally referred to as the ‘chroma’ channels. Video data is represented using a colour space, such as ‘YCbCr’ or ‘RGB’. Some applications require visually lossless compression of the output of a computer graphics card, or transmission from a SOC in a tablet to the LCD panel in the tablet. Such content often has different statistical properties from content captured from a camera, due to the use of rendering widgets, text, icons etc. Such applicationts can be referred to as ‘screen content applications’. For screen content applications, ‘RGB’ is commonly used, as this is the format generally used to drive LCD panels. Note that the greatest signal strength is present in the ‘G’ (green) channel, so generally the G channel is coded using the primary colour channel, and the remaining channels (i.e. ‘B’ and ‘R’) are coded using the secondary colour channels. This arrangement may be referred to as ‘GBR’. When the ‘YCbCr’ colour space is in use, the ‘Y’ channel is coded using the primary colour channel and the ‘Cb’ and ‘Cr’ channels are coded using the secondary colour channels.
Video data is also represented using a particular chroma format. The primary colour channel and the secondary colour channels are spatially sampled at the same spatial density when the 4:4:4 chroma format is in use. For screen content, the commonly used chroma format is 4:4:4, as generally LCD panels provide pixels at a 4:4:4 chroma format. The bit-depth defines the bit width of samples in the respective colour channel, which implies a range of available sample values. Generally, all colour channels have the same bit-depth, although colour channels may alternatively have different bit-depths. Other chroma formats are also possible. For example, if the chroma channels are sampled at half the rate vertically (compared to the luma channel), a 4:2:2 chroma format is said to be in use. Also, if the chroma channels are sampled at half the rate horizontally and vertically (compared to the luma channel), a 4:2:0 chroma format is said to be in use. These chroma formats exploit a characteristic of the human visual system that sensitivity to intensity is higher than sensitivity to colour. As such, it is possible to reduce sampling of the colour channels without causing undue visual impact. However, reducing sampling of the colour channels without causing undue visual impactis less applicable to studio environments, where multiple generations of encoding and decoding are common. Also, for screen content the use of chroma formats other than 4:4:4 can be problematic as distortion is introduced to sub-pixel rendered (or ‘anti-aliased’) text and sharp object edges.
Frame data may also contain a mixture of screen content and camera captured content. For example, a computer screen may include various content such as windows, icons, control buttons, and text, and also contain a video being played, or an image being viewed. Such content, in terms of the entirety of a computer screen, can be referred to as ‘mixed content’. Moreover, the level of detail (or ‘texture’) varies within a frame. Generally, regions of detailed textures (e.g. foliage, text), or regions containing noise (e.g. from a camera sensor) are difficult to compress. The detailed textures can only be coded at a low compression ratio without losing detail. Conversely, regions with little detail (e.g. flat regions, sky, background from a computer application) can be coded with a high compression ratio, with little loss of detail.
In the context of sub-frame latency video compression, the buffering included in the video encoder and the video decoder is generally substantially smaller than one frame (e.g. only dozens of lines of samples). Then, the video encoder and video decoder must not only operate in real-time, but also with sufficiently tightly controlled timing that the available buffers do not underrun or overrun. In the context of real-time operation, it is not possible to stall input or delay output (e.g. due to buffer overrun or underrun). If input was stalled or output delayed, the result would be some highly noticeable distortion of the video data being passed through the video encoder and decoder. Thus, a need exists for algorithms to control the behaviour of the video encoder and decoder to avoid such situations.