Real-time transmission of moving pictures is employed in several applications, such as video conferencing, net meetings, TV broadcasting, and video telephony.
However, representing moving pictures requires a large amount of information, as digital video is typically described by representing each pixel in a picture with 24 bits (3 bytes). Such uncompressed video data results in large bitrates and cannot be transferred over conventional communication networks and transmission lines in real time due to limited bandwidth.
Thus, enabling real-time video transmission requires a high level of data compression. Data compression may, however, compromise picture quality. Therefore, great efforts have been made to develop compression techniques allowing real-time transmission of high-quality video over bandwidth-limited data connections.
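
To give a sense of scale for the uncompressed bitrates mentioned above, the following minimal sketch multiplies out the 24 bits per pixel. The resolution and frame rate are hypothetical values chosen only for illustration and do not come from the text.

```python
def raw_bitrate_bps(width, height, fps, bits_per_pixel=24):
    """Uncompressed video bitrate in bits per second."""
    return width * height * bits_per_pixel * fps

# Hypothetical 1280x720 source at 30 frames per second (assumed, not from the text).
bitrate = raw_bitrate_bps(1280, 720, 30)
print(bitrate / 1e6, "Mbit/s")  # 663.552 Mbit/s
```

Under these assumed parameters the raw stream is several hundred Mbit/s, which illustrates why a high level of compression is needed.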
Many video compression standards have been developed over the last 20 years. Many of those methods are standardized through ISO (the International Organization for Standardization) or ITU (the International Telecommunication Union). In addition, a number of proprietary methods have been developed. The main standards are: ITU: the H.261 standard, the H.262 standard, the H.263 standard, and the H.264/AVC standard (each of which is incorporated herein by reference in its entirety); and ISO: the MPEG1 standard, the MPEG2 standard, and the MPEG4/AVC standard (each of which is incorporated herein by reference in its entirety).
Video compression formats rely on statistical properties of the input data. Before standardized video compression/decompression can be applied, the raw video data has to be converted to a format suitable for compression. This format is described in the standards, but the process of converting to and from it is to some degree left to the developer. The conversion is lossy, as it includes spatial decimation and interpolation. The electronic representation of an image can commonly be interpreted as luma information (roughly corresponding to “black-and-white” content) and a number of chroma (color-difference) channels that together form an image. In this case, the luma and chroma information is obtained by transforming a discrete 2-dimensional matrix of pixels, each containing a red-green-blue sub-pixel triplet, which is typically the representation used by image sensors and displays. The color space of the luma and chroma information is often denoted YCbCr (luma, blue chroma difference, red chroma difference), and the spatial resolution of the chroma channels is reduced (decimated) vertically and/or horizontally by a factor of between 1:1 and 4:1. One important format is “YCbCr 4:2:0”, which is used in different forms in most of the MPEGx and H.26x video compression formats mentioned above. The principle of spatial decimation (reducing the number of pixels) is to remove information that cannot be transmitted reliably, and the principle of interpolation (increasing the number of pixels) is to represent the available information in a perceptually pleasing way. Since the decimation/interpolation of chroma channels is a lossy process, different methods create different artifacts that may or may not be objectionable for a given set of full-resolution input images and viewers.
An example of a spatial linear decimation according to a conventional technique is described below. For a filter kernel of length K=2 indexed by k:

$$\{h_k\} = \{0.5,\ 0.5\}$$

Linear filtering of an input signal x(n) consisting of pixel values at offsets n, using kernel h:

$$y(n) = \mathrm{conv}(x(n), h(n)) = 0.5\,x(n) + 0.5\,x(n-1)$$

Dropping samples and shifting to the desired phase:

$$z(n - 0.5) = \begin{cases} y(n), & n = 2, 4, 6 \\ 0, & \text{else} \end{cases}$$

In practical systems, non-integer storage cells are uncommon, so z would be shifted to a practical phase, and zero-components discarded:

$$g(m) = \begin{cases} y(2m), & 1 \le m \le 3 \\ 0, & \text{else} \end{cases}$$
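
The filter-and-drop steps above can be sketched in a few lines. This is a minimal illustration, assuming a six-sample input stored in a Python list whose first element holds x(1); the function name `decimate_by_two` and the step-shaped test signal are hypothetical and chosen only for this example.

```python
def decimate_by_two(x):
    """Filter x with the 2-tap kernel h = [0.5, 0.5], then keep every second output.

    x is a list of pixel values x(1)..x(N), with x(1) stored at index 0.
    Returns g(m) = y(2m) = 0.5*x(2m) + 0.5*x(2m-1) for m = 1..N//2.
    """
    h = [0.5, 0.5]
    g = []
    for m in range(1, len(x) // 2 + 1):
        n = 2 * m  # 1-based position of the retained filter output
        y_n = h[0] * x[n - 1] + h[1] * x[n - 2]  # y(n) = 0.5*x(n) + 0.5*x(n-1)
        g.append(y_n)
    return g

# Hypothetical step signal (assumed for illustration).
signal = [128, 128, 128, 240, 240, 240]
print(decimate_by_two(signal))  # [128.0, 184.0, 240.0]
```

The middle output sample straddles the step and therefore takes an intermediate value, which is the "leak" across pixel boundaries that the filter introduces.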
The method outlined above is the basis for general image decimation and chroma-channel decimation. Interpolation can be described in a very similar manner. The objective is to “leak” values across pixel boundaries before or after changing the number of pixels so that the perceived image stays relatively constant even though the number of pixels used in describing it changes.
Video conference data containing a mixture of continuous tone content and palettized content is now quite common. Examples include screen capture images, web pages, educational and training videos (especially those containing screen capture images or web pages), and business presentations, among others. Web pages often include photographs interspersed among text and other palettized content. In addition to mixed content, it is also very common to transmit two parallel streams in a video conference, one including video content and one including data content, such as presentations or screen shots. However, all data is coded and decoded with a coding scheme as described above, which is best suited to continuous tone pictures, i.e. video pictures captured by a video camera. This implies that the perceived quality of input images containing sharp edges and abrupt transitions, as in plain text or line art, is reduced, since the sharp edges are to some degree spread out spatially by the coding process. The reason for this is explained below with reference to FIGS. 1-7.
FIG. 1 shows a simple 1×6 pixel image containing the RGB values [0 0 0] and [255 0 0]. The same image information is also shown in FIG. 2 as a 1-dimensional bar graph, showing the abrupt change in red intensity in the right half of the image.
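$$
\mathrm{Image}_{rgb} = \left\{ \begin{array}{lrrrrrr}
\mathrm{red:} & 0 & 0 & 0 & 255 & 255 & 255 \\
\mathrm{green:} & 0 & 0 & 0 & 0 & 0 & 0 \\
\mathrm{blue:} & 0 & 0 & 0 & 0 & 0 & 0
\end{array} \right\}
$$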
In FIG. 3, the same image information is transformed to the ITU-R BT.601 YCbCr color space where black bars represent Y (luma) information, while cyan and magenta bars represent Cb and Cr values, respectively. The spatial resolution is not changed.
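$$
\mathrm{Image}_{YCbCr} = \left\{ \begin{array}{lrrrrrr}
\mathrm{Y:} & 16 & 16 & 16 & 82 & 82 & 82 \\
\mathrm{Cb:} & 128 & 128 & 128 & 90 & 90 & 90 \\
\mathrm{Cr:} & 128 & 128 & 128 & 240 & 240 & 240
\end{array} \right\}
$$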
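
The color-space transform above can be sketched as follows. This is a minimal illustration that assumes a commonly used fixed-point approximation of the BT.601 conversion for 8-bit RGB; the text does not state which exact coefficients or rounding were used, so the function `rgb_to_ycbcr` and its constants are assumptions.

```python
def rgb_to_ycbcr(r, g, b):
    """8-bit RGB to 8-bit YCbCr using an assumed fixed-point BT.601 approximation."""
    y = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16
    cb = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128
    cr = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128
    return y, cb, cr

# The 1x6 image of FIG. 1: three black pixels followed by three red pixels.
image_rgb = [(0, 0, 0)] * 3 + [(255, 0, 0)] * 3

# Convert each pixel, then regroup the results into three channel lists.
y, cb, cr = (list(ch) for ch in zip(*(rgb_to_ycbcr(*p) for p in image_rgb)))
print(y)   # [16, 16, 16, 82, 82, 82]
print(cb)  # [128, 128, 128, 90, 90, 90]
print(cr)  # [128, 128, 128, 240, 240, 240]
```

With these assumed coefficients the output agrees with the YCbCr values listed above; other rounding conventions may differ by one code value.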
In FIG. 4, however, the chroma channels have been decimated by a factor of 2, with representation levels at 1.5, 3.5 and 5.5. The method for decimation was a simple mean of the 2 closest source pixels. This is similar to a FIR filter using a 2-tap kernel of [0.5 0.5] followed by picking every second pixel.
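$$
\mathrm{Image}_{YCbCr,\,decimated} = \left\{ \begin{array}{lrrrrrr}
\mathrm{Y:} & 16 & 16 & 16 & 82 & 82 & 82 \\
\mathrm{Cb:} & 128 & 110 & 90 & & & \\
\mathrm{Cr:} & 128 & 184 & 240 & & &
\end{array} \right\}
$$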
While the edge is still visible for the luma channel, it has been smoothed out for the chroma channels, since they have to represent both the black and the red levels that fall into the x=[2.5, 4.5] area.
In FIG. 5, new, higher pixel-count chroma vectors were found by simply repeating the values from FIG. 4. This is equivalent to zero-filling every second sample, then filtering with a FIR-kernel of [1 1].
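$$
\mathrm{Image}_{YCbCr,\,interpolated} = \left\{ \begin{array}{lrrrrrr}
\mathrm{Y:} & 16 & 16 & 16 & 82 & 82 & 82 \\
\mathrm{Cb:} & 128 & 128 & 110 & 110 & 90 & 90 \\
\mathrm{Cr:} & 128 & 128 & 184 & 184 & 240 & 240
\end{array} \right\}
$$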
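
The stated equivalence between sample repetition and zero-filling followed by a [1 1] FIR kernel can be checked with a short sketch. The function names are hypothetical, and the input is the decimated Cr channel from the example.

```python
def upsample_repeat(c):
    """Double the sample count by repeating every value once."""
    return [v for v in c for _ in range(2)]

def upsample_zero_fill_filter(c):
    """Insert a zero after every sample, then filter with the kernel [1, 1]."""
    zero_filled = []
    for v in c:
        zero_filled += [v, 0]
    kernel = [1, 1]
    out = []
    for n in range(len(zero_filled)):
        prev = zero_filled[n - 1] if n > 0 else 0  # sample before the start is taken as 0
        out.append(kernel[0] * zero_filled[n] + kernel[1] * prev)
    return out

cr_decimated = [128, 184, 240]
print(upsample_repeat(cr_decimated))            # [128, 128, 184, 184, 240, 240]
print(upsample_zero_fill_filter(cr_decimated))  # [128, 128, 184, 184, 240, 240]
```

Both routes give the same six-sample chroma vector, so the intermediate value created by decimation is simply carried over to two neighboring pixels.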
Finally, FIG. 6 and FIG. 7 show that the reconstructed image has visible errors along the transition. Not only is the red color blurred, but we also get discoloring (a change in the relative rgb-mix). The exact value and size of such artifacts are a basic property of the decimation/interpolation technique employed.
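$$
\mathrm{Image}_{rgb,\,reconstructed} = \left\{ \begin{array}{lrrrrrr}
\mathrm{red:} & 0 & 0 & 89 & 166 & 255 & 255 \\
\mathrm{green:} & 0 & 0 & 0 & 38 & 0 & 0 \\
\mathrm{blue:} & 0 & 0 & 0 & 38 & 0 & 0
\end{array} \right\}
$$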
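
The reconstruction step can be sketched by converting the interpolated YCbCr planes back to RGB. This is a minimal illustration that assumes a commonly used fixed-point approximation of the inverse BT.601 conversion; the text does not state the exact coefficients, so the green and blue values produced here may differ by a few code values from those listed above.

```python
def clip8(v):
    """Clamp a value to the 8-bit range 0..255."""
    return max(0, min(255, v))

def ycbcr_to_rgb(y, cb, cr):
    """8-bit YCbCr to 8-bit RGB using an assumed fixed-point BT.601 approximation."""
    c, d, e = y - 16, cb - 128, cr - 128
    r = clip8((298 * c + 409 * e + 128) >> 8)
    g = clip8((298 * c - 100 * d - 208 * e + 128) >> 8)
    b = clip8((298 * c + 516 * d + 128) >> 8)
    return r, g, b

# Interpolated planes as listed in the example of FIG. 5.
luma = [16, 16, 16, 82, 82, 82]
cb = [128, 128, 110, 110, 90, 90]
cr = [128, 128, 184, 184, 240, 240]

reconstructed = [ycbcr_to_rgb(*p) for p in zip(luma, cb, cr)]
red = [p[0] for p in reconstructed]
print(red)  # [0, 0, 89, 166, 255, 255]
```

The third pixel was black and the fourth was pure red in the source image; after the round trip both carry a mixed color, which is the blurring and discoloring along the transition.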
The initial and final rgb matrices use 6 (pixels) × 3 (colors) = 18 bytes to store or transmit this one-dimensional image. The 2× subsampled YCbCr alternative uses 6 (luma pixels) and 2×3 (chroma pixels) for a total of 12 bytes. If a more realistic example were used, the savings could be larger by decimating chroma in two dimensions. If considered as a bandwidth problem, this image clearly could be transmitted as a full-resolution luma channel and a table mapping the luma values to full-color rgb triplets:
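$$
\mathrm{Image}_{mapped} = \left\{ \begin{array}{l}
\mathrm{Y}: \begin{bmatrix} 16 & 16 & 16 & 82 & 82 & 82 \end{bmatrix} \\
\mathrm{Map}_{Y \to rgb}: \left\{ \begin{array}{l}
Y_{16} = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix} \\
Y_{82} = \begin{bmatrix} 255 & 0 & 0 \end{bmatrix}
\end{array} \right.
\end{array} \right.
$$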
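
The luma-plus-table representation above can be sketched as follows. This is a minimal illustration, assuming the luma values of the example and a Python dictionary as the hypothetical table; the byte count follows the accounting used in the text, which counts one byte per luma sample and three bytes per table entry.

```python
# Source image of FIG. 1 and its full-resolution luma channel.
image_rgb = [(0, 0, 0)] * 3 + [(255, 0, 0)] * 3
luma = [16, 16, 16, 82, 82, 82]

# Build the table mapping each luma value to one full-color rgb triplet.
luma_to_rgb = {}
for y, pixel in zip(luma, image_rgb):
    # The mapping only works if a luma value never stands for two different colors.
    assert luma_to_rgb.setdefault(y, pixel) == pixel

# Reconstruct the image from the luma channel and the table alone.
reconstructed = [luma_to_rgb[y] for y in luma]

# One byte per luma sample plus three bytes per table entry.
payload_bytes = len(luma) + 3 * len(luma_to_rgb)
print(reconstructed == image_rgb, payload_bytes)  # True 12
```

The reconstruction is exact for this image because luma and color are perfectly correlated and only two colors are used, which is the situation typical of text and line art.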
That could produce 6+2×3 bytes for transmission, just as the decimated YCbCr representation in this example does, but with no quality loss. The basic problem is that the system for bandwidth reduction is optimized for slowly varying, full-color sources, such as photographic content. If the (full) spatial resolution of the luma channel could be combined with the reduced spatial resolution of the chroma channels to produce “local color maps”, images with sharp edges correlated in luma and chroma, but with low usage of the entire color space, would predictably look better. One way of solving this problem is described in U.S. Pat. No. 7,072,512 ('512), which is hereby incorporated by reference herein in its entirety. In this publication, an image segmentation algorithm identifying “continuous tone” image content, e.g. photographic content, and “palettized” image content, e.g. text, is disclosed. Images to be transmitted are analyzed by the algorithm and, based on the result of this analysis, coded by a codec specialized for the content type identified. By coding “palettized” image content with a coding scheme adjusted for this type of content, the problem of blurring sharp edges can be avoided.
However, since different coding schemes are used on the transmitting side, the receiving side is required to have the corresponding decoding schemes installed, and vice versa. Consequently, '512 is not able to solve the problem in a media stream coded in a conventional, standardized way, but requires specialized codecs on both sides.