A combination of a highly non-linear transfer function, 4:2:0 or 4:2:2 subsampling and non-constant luminance ordering gives rise to severe artifacts in saturated colors. An example is described in [1], where changes between two colors of similar luminance can result in a reconstructed image with very different luminances. In this document, the prior art way of processing the video is denoted the “anchor” way, since it was used to create the anchors in the MPEG call for evidence, described in [2].
The problem with the anchor processing chain is that, apart from inaccurate luminance, it may also produce inaccurate chrominance. This is because the chroma samples Cb′ and Cr′ are subsampled in the Y′Cb′Cr′ space, and the non-linearity of the Y′Cb′Cr′ color space favors dark colors. This is not a desired outcome, and it implies that the chroma samples Cb′ and Cr′ will be inaccurate to start with.
The first part of subsampling is filtering, and since filtering is a kind of averaging, it is sufficient to see what happens when we average two colors. It is easier to see what happens when we average in the R′G′B′ color space or domain than in Y′Cb′Cr′, so first we will prove that averaging in these two domains amounts to the same thing. To do this, first note that Y′Cb′Cr′ is just a linear combination of R′G′B′, as illustrated for example with the BT.709 R′G′B′ to Y′Cb′Cr′ conversion matrix:
Y′ = 0.212600*R′ + 0.715200*G′ + 0.072200*B′
Cb′ = −0.114572*R′ − 0.385428*G′ + 0.500000*B′
Cr′ = 0.500000*R′ − 0.454153*G′ − 0.045847*B′  (equation 1)
Thus, if the vector q holds the color in R′G′B′, q=(R′, G′, B′), and the vector p holds the same color in Y′Cb′Cr′, p=(Y′, Cb′, Cr′), we have p=M q, where M is the matrix above. Likewise, q=M⁻¹ p. Assume we have two vectors in Y′Cb′Cr′, p1 and p2, that we want to average. We will now show that first going to R′G′B′, then performing the averaging, and then going back is the same as just averaging p1 and p2 directly. We go to R′G′B′ by using q1=M⁻¹ p1 and q2=M⁻¹ p2.
The average in the R′G′B′ space is qa=½(q1+q2), which can be written as qa=½(q1+q2)=½(M⁻¹ p1+M⁻¹ p2)=M⁻¹ ½(p1+p2).
Going back to Y′Cb′Cr′ is done by multiplying with M, pa=M qa=M M⁻¹ ½(p1+p2)=½(p1+p2), which is the same result as averaging in Y′Cb′Cr′ directly. We now only have to show that subsampling in R′G′B′ favors dark colors.
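The linearity argument above can be checked numerically. The following is a minimal sketch, not part of the described processing chain: it uses the matrix coefficients of equation 1, while the two R′G′B′ triples and all function names are hypothetical and chosen only for illustration. To avoid inverting M, it checks the equivalent statement that converting the R′G′B′ average gives the same result as averaging the two converted colors.

```python
# BT.709 R'G'B' -> Y'Cb'Cr' matrix M, coefficients taken from equation 1.
M = [
    [0.212600, 0.715200, 0.072200],
    [-0.114572, -0.385428, 0.500000],
    [0.500000, -0.454153, -0.045847],
]


def to_ycbcr(q):
    """Return p = M q for an R'G'B' triple q."""
    return [sum(m * c for m, c in zip(row, q)) for row in M]


def average(a, b):
    """Component-wise mean of two triples."""
    return [0.5 * (x + y) for x, y in zip(a, b)]


# Two arbitrary (hypothetical) non-linear R'G'B' colors.
q1 = [0.75, 0.58, 0.10]
q2 = [0.30, 0.58, 0.05]

# Average in R'G'B' first, then convert to Y'Cb'Cr' ...
pa_via_rgb = to_ycbcr(average(q1, q2))
# ... versus convert each color first, then average in Y'Cb'Cr'.
pa_direct = average(to_ycbcr(q1), to_ycbcr(q2))

# The two results should differ only by floating-point rounding.
max_diff = max(abs(x - y) for x, y in zip(pa_via_rgb, pa_direct))
print(max_diff)
```

Because the conversion is a plain matrix multiplication, the same agreement holds for any weighted average, and therefore for any linear subsampling filter.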
Consider the two RGB colors (1000, 200, 0) and (10, 200, 0). The first color is very red, and the second color is very green. However, the first color is much brighter than the second. If seen at a distance so that they blur into one, the net effect would be a reddish pixel since ½[(1000, 200, 0)+(10, 200, 0)]=(505, 200, 0), which is more red than it is green. However, in R′G′B′, the two colors get the values (0.7518, 0.5791, 0) and (0.2997, 0.5791, 0). Their average will be ½[(0.7518, 0.5791, 0)+(0.2997, 0.5791, 0)]=(0.5258, 0.5791, 0), which when converted back to RGB is (119, 200, 0). Thus, the resulting color when averaged in the R′G′B′ domain is almost twice as green as red. Hence, the dark color (10, 200, 0), which is green, has had an unduly big influence on the average.
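The arithmetic of this example can be reproduced with a short sketch. It assumes that the transfer function is the SMPTE ST 2084 (PQ) curve with a 10000 cd/m² peak; the text does not name the transfer function, but this choice gives values matching those quoted above. The function names are hypothetical.

```python
# SMPTE ST 2084 (PQ) constants. Using PQ here is an assumption: the text
# does not name the transfer function, but PQ matches the quoted values.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK = 10000.0  # assumed peak luminance in cd/m^2


def pq_encode(linear):
    """Linear light (0..PEAK) -> non-linear value in 0..1."""
    y = (linear / PEAK) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2


def pq_decode(nonlinear):
    """Non-linear value in 0..1 -> linear light (0..PEAK)."""
    p = nonlinear ** (1 / M2)
    return PEAK * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


bright_red = (1000.0, 200.0, 0.0)
dark_green = (10.0, 200.0, 0.0)

# Averaging in linear light, as the eye would do at a distance.
linear_avg = [0.5 * (a + b) for a, b in zip(bright_red, dark_green)]

# Averaging the non-linear R'G'B' values, then converting back to linear RGB.
nonlinear_avg = [
    0.5 * (pq_encode(a) + pq_encode(b)) for a, b in zip(bright_red, dark_green)
]
nonlinear_avg_as_linear = [pq_decode(v) for v in nonlinear_avg]

print("linear average:    ", linear_avg)
print("non-linear average:", [round(v, 1) for v in nonlinear_avg_as_linear])
```

Under this assumption the red component of the non-linear average falls far below the linear-light average of 505, which is the bias toward the dark color described above.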
To see how this can look in practice, consider a small image that is just 2×2 pixels, containing the following linear RGB colors:
(3.41, 0.49, 0.12)  (0.05, 0.08, 0.02)
(0.05, 0.08, 0.02)  (0.05, 0.08, 0.02)
Since this is a High Dynamic Range (HDR) image, it is hard to show it in a low-dynamic-range medium such as this document. However, it is possible to do several Low Dynamic Range (LDR) or Standard Dynamic Range (SDR) exposures by applying the function
LDR_red = clamp(0, 255*(HDR_red*2^c)^gam, 255),
where c goes from −3 to 1, gam=0.45, and clamp(a, t, b) makes sure the value t lies in the interval [a, b].
These can be called LDR “exposures” of the HDR image.
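The exposure function can be sketched as follows. This is only an illustration: it assumes that c takes integer steps from −3 to 1 and that the same function is applied to the green and blue components as to the red one, neither of which is stated explicitly. The names `ldr_exposure` and `expose_image` are hypothetical.

```python
def clamp(a, t, b):
    """Force t into the interval [a, b]."""
    return max(a, min(t, b))


def ldr_exposure(hdr_value, c, gam=0.45):
    """LDR value for one non-negative HDR component at exposure step c."""
    return clamp(0.0, 255.0 * (hdr_value * 2.0 ** c) ** gam, 255.0)


# The 2x2 linear RGB image from the text, stored row by row.
hdr_image = [
    [(3.41, 0.49, 0.12), (0.05, 0.08, 0.02)],
    [(0.05, 0.08, 0.02), (0.05, 0.08, 0.02)],
]


def expose_image(image, c):
    """Apply the exposure function to every component of every pixel."""
    return [
        [tuple(round(ldr_exposure(v, c)) for v in pixel) for pixel in row]
        for row in image
    ]


# Assumption: c goes through the integer steps -3, -2, -1, 0, 1.
for c in range(-3, 2):
    print(c, expose_image(hdr_image, c))
```

In the darkest exposure (c=−3) the red component of the top left pixel stays below the clamping limit, whereas in the brightest exposure it is clamped to 255.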
The HDR pixels are quite dark, since the highest coefficient is 3.41 out of 4000, so the darkest exposure is the most relevant here. The top left pixel is reddish and the surrounding pixels look black. Only in the brighter exposures is it possible to see that the dark pixels are actually a bit greenish.
However, when following the anchor processing chain to convert from RGB to Y′Cb′Cr′ 4:2:0 and back again, the resulting HDR image will be:
(1.14, 0.79, 0.38)  (0.12, 0.06, 0.01)
(0.12, 0.06, 0.01)  (0.12, 0.06, 0.01)
The problem here is that the redness of the top left pixel has disappeared and has been replaced with a gray/white pixel. The reason is that averaging in the non-linear Y′Cb′Cr′ domain favors dark colors, which will make the resulting pixel unduly green. Furthermore, since there are three green pixels and just one red pixel, the result will be yet greener. The result is not very similar to the original.
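The effect can be illustrated with a simplified round trip. This sketch is not the anchor processing chain itself: it assumes the PQ (SMPTE ST 2084) transfer function, uses the BT.709 luma weights of equation 1, replaces the subsampling filter with a plain average over the 2×2 block, reuses that single chroma pair for all four pixels, and omits quantization. It is therefore not expected to reproduce the figures above exactly, only the loss of redness. All function names are hypothetical.

```python
# PQ constants (assumed transfer function; not named in the text).
M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32
PEAK = 10000.0

# BT.709 luma weights, matching the first row of equation 1.
KR, KG, KB = 0.2126, 0.7152, 0.0722


def pq_encode(linear):
    y = (linear / PEAK) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2


def pq_decode(nonlinear):
    p = nonlinear ** (1 / M2)
    return PEAK * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def rgb_to_ycbcr(r, g, b):
    """Non-linear R'G'B' -> Y'Cb'Cr' (same mapping as equation 1)."""
    y = KR * r + KG * g + KB * b
    return y, (b - y) / (2 * (1 - KB)), (r - y) / (2 * (1 - KR))


def ycbcr_to_rgb(y, cb, cr):
    """Exact inverse of rgb_to_ycbcr."""
    r = y + 2 * (1 - KR) * cr
    b = y + 2 * (1 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b


def simplified_420_roundtrip(block):
    """block: four linear RGB triples forming one 2x2 block."""
    ycc = [rgb_to_ycbcr(*(pq_encode(v) for v in px)) for px in block]
    # Simplification: box-filter average replaces the real subsampling filter.
    cb = sum(p[1] for p in ycc) / len(ycc)
    cr = sum(p[2] for p in ycc) / len(ycc)
    out = []
    for y, _, _ in ycc:
        # Each pixel keeps its own Y' but receives the shared chroma pair.
        rgb_nl = ycbcr_to_rgb(y, cb, cr)
        out.append(tuple(pq_decode(min(max(v, 0.0), 1.0)) for v in rgb_nl))
    return out


# Top left pixel first, followed by the three dark pixels.
block = [(3.41, 0.49, 0.12)] + [(0.05, 0.08, 0.02)] * 3
result = simplified_420_roundtrip(block)
for original, restored in zip(block, result):
    print(original, "->", tuple(round(v, 2) for v in restored))
```

Even in this simplified form, the red component of the top left pixel drops markedly while its green and blue components rise, so the pixel moves toward gray.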
There is, thus, a need for improvements with regard to inaccuracies in chrominance when using the prior art processing chain that is based on a combination of a highly non-linear transfer function, chroma subsampling and non-constant luminance ordering.