In the early days of color rendering, e.g. for television program display, the relationship between the content creation side (e.g. the camera operator) and the color rendering side (e.g. display on a television or computer display) was simple, and fixed by rigid technical principles. A so-called standard CRT display was defined, which had particular phosphors, a certain gamma 2.2 tone reproduction curve (TRC), 256 approximately visually equidistant driving steps, etc. A number of fundamental color reproduction questions were addressed in this manner, inter alia whether a color rendering system should be optimized to the (best) human viewer and, more importantly, whether the color rendering capabilities (and in particular the color description/communication standard) should be prescribed/determined (mostly) by the color capturing (camera) side or the color rendering (display) side.
A number of approximations were introduced at the time, as the ground rules for television colorimetry for the decades to come. Taking the physical display constraints of the era of the first color television into account, the first displays and displayed signals were optimized so that they would yield an ideal picture to the viewer, given the size, brightness, etc. of the CRTs available at that time (NTSC, the late 1940s to early 1950s: resolution fine enough for typical viewing distance, enough driving steps in terms of just noticeable differences (JNDs) to perceptually reach a good, indiscriminable black starting from the white luminances of the time, etc.).
Then, given the standard display of that time, which was a small, dark CRT, the rules for the content production side were laid down for converting captured scenes into reasonable-looking pictures on the display, for most scenes (similar considerations took place in the world of analog photography, in which a scene had to be rendered in an often low quality photo print, which never had a contrast above 100:1, had imperfect colors, etc.). E.g., even though theoretically one would need a spectral camera to measure a real life color scene (given its variable illumination), as an approximation, if one knows on which device the color is to be displayed, camera sensitivity curves can be determined.
Images captured with such camera sensitivity curves are then supposed to reconstruct a similar-looking picture on the display, at least emulating at the same time the illumination of the scene at the capturing side, but in practice there will be errors. In addition, these camera sensitivity curves will have negative lobes. Although one could try to reproduce these theoretically optimal curves exactly with optical filter combinations, in practice (also given that the viewer does not know exactly which colors occur in the scene) matrixing will suffice to make the colors look reasonable.
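The matrixing step mentioned above can be illustrated with a minimal sketch. It assumes linear RGB values normalized to [0, 1]; the matrix coefficients and the names `CAMERA_TO_DISPLAY`, `apply_matrix`, and `clip01` are hypothetical and chosen only for illustration, not taken from any particular camera or standard. The negative off-diagonal coefficients play the role that the negative lobes of the ideal sensitivity curves would otherwise play.

```python
# Hypothetical 3x3 correction matrix; the coefficients are illustrative only.
# Each row sums to 1.0 so that neutral (gray/white) input stays neutral.
CAMERA_TO_DISPLAY = (
    (1.20, -0.15, -0.05),
    (-0.10, 1.25, -0.15),
    (-0.05, -0.20, 1.25),
)


def apply_matrix(rgb, matrix=CAMERA_TO_DISPLAY):
    """Map a linear camera RGB triple to linear display RGB by matrixing."""
    return tuple(sum(m * c for m, c in zip(row, rgb)) for row in matrix)


def clip01(rgb):
    """Clip each component to the displayable range [0, 1]."""
    return tuple(min(1.0, max(0.0, c)) for c in rgb)


if __name__ == "__main__":
    gray = apply_matrix((0.5, 0.5, 0.5))          # stays neutral
    red = clip01(apply_matrix((1.0, 0.0, 0.0)))   # out-of-range values are clipped
    print(gray, red)
```

Because a display cannot emit negative light, results outside [0, 1] are simply clipped in this sketch, which is one source of the reproduction errors noted above.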
Several content creation side professionals, like the camera operator and a color grader/corrector, have to do their magic with parametric transformations to make the finally encoded images look optimal when displayed. For example, what is usually done by a color corrector (in the video world where different video feeds are combined) is that the color corrector looks at the white points of the different inputs (one global, rather severe type of colorimetric image error), and matches the white points of the different inputs by slightly increasing, for example, the blue contributions of pixels, whilst also looking at critical colors like faces. In movie material, further artistic considerations may be involved, e.g., a slightly bluish look may be cast for night scenes, which, if not already largely created by a color filter matching the film characteristics, will typically be done in post production by a color grader. Another example, which may typically also involve tweaking the tone reproduction curves, is to make the movie look more desaturated, i.e., to give it a desolate look.
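As a rough illustration of the white point matching described above, the following sketch computes per-channel gains that map the measured white of one feed onto a common target white. It assumes linear RGB values; the function names and the sample white values are hypothetical, and a real color corrector would also judge critical colors such as faces by eye rather than rely on such gains alone.

```python
def white_balance_gains(measured_white, target_white=(1.0, 1.0, 1.0)):
    """Per-channel gains that map a feed's measured white onto the target white."""
    return tuple(t / m for t, m in zip(target_white, measured_white))


def apply_gains(rgb, gains):
    """Scale each channel of a linear RGB pixel by its gain."""
    return tuple(c * g for c, g in zip(rgb, gains))


if __name__ == "__main__":
    # Hypothetical feed whose white is slightly deficient in blue.
    feed_white = (1.0, 1.0, 0.9)
    gains = white_balance_gains(feed_white)
    print(gains)                           # blue gain is slightly above 1
    print(apply_gains(feed_white, gains))  # approximately the target white
```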
It is of even higher importance to take care of the tone reproduction curve gamma behavior. One might suspect that just applying a 0.45 anti-gamma correction to encode the captured linear sensor data will suffice, but apart from that, the larger dynamic range of a typical scene always has to be mapped somehow to the [0, 255] interval. Tone reproduction curve tweaking will also result in, for example, a coarser, high contrast look, darker or more prominent shadows, etc. The camera operator typically has tunable anti-gamma curves available, in which the camera operator may set knee and shoulder points, etc., so that the captured scene has a good look (typically somebody looks at the captured images on a reference monitor, which used to be a CRT and may now be an LCD). In wet photography, the same can be realized with “hardware” processing, such as printing and developing conditions to map faces onto zone VI of the Adams zone system. However, nowadays there is often a digital intermediate which is worked on. Even cinematographers who love shooting on classical film stock nowadays have available to them a digital video auxiliary stream (which can be very useful in the trend of increased technical filming, in which a lot of the action may, for example, be in front of a green screen). So in summary, apart from taking the actual room conditions at the viewer's side to be a given to be ignored, the whole color capturing system is designed around a “calibrated ideal display”, which is taken into account as a fixed given fact when the content creator creates his images.
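The combination of an anti-gamma correction with a knee can be sketched as follows. This is a simplified illustration under stated assumptions, not any camera's actual transfer function: the input is linear scene light with 1.0 as nominal white, the knee is applied in the linear domain before the power law, and the parameter names and default values (`knee_point`, `knee_slope`) are hypothetical. Practical video standards additionally use a linear segment near black, which is omitted here.

```python
def encode_8bit(linear, knee_point=0.8, knee_slope=0.2, gamma=0.45):
    """Encode a linear scene value (1.0 = nominal white, may exceed 1.0) to [0, 255].

    Values above knee_point are compressed with slope knee_slope so that some
    highlight detail above nominal white still fits into the 8-bit range.
    """
    x = max(0.0, linear)
    if x > knee_point:
        x = knee_point + (x - knee_point) * knee_slope
    x = min(1.0, x)                  # anything still above range is clipped
    return round(255 * x ** gamma)   # 0.45 anti-gamma, then 8-bit quantization


if __name__ == "__main__":
    for value in (0.0, 0.18, 0.5, 1.0, 1.8, 10.0):
        print(value, encode_8bit(value))
```

With these illustrative defaults, scene values up to 1.8 times nominal white are compressed into the code range instead of being clipped at 1.0; moving the knee point or changing its slope trades highlight detail against mid-tone contrast.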
The problem is that this was already very approximate in those days. The reasoning was along the lines of “if we do a bad job reproducing a scene on photographic paper anyway, we may relax all requirements regarding accuracy, and apply a more subjective definition of the technical mapping from scene to rendering, taking into account such principles as reasonable recognizability of the imaged scenes, consumer appreciated vivid color rendering, etc.” However, this technology of image encoding (e.g., as prescribed in PAL, or MPEG2) should be understood as co-existing with a number of critical questions, like: “what if one changes the illumination of the captured scene, be it the illuminance or the white point, or the spatial distribution, or the special characteristics”, “what about the errors introduced due to differences in illumination of the scene and the viewing environment, especially when seen in the light of a human viewer adapted to the scene vs. viewing environment”, etc.
These problems and resulting errors became aggravated when displays started changing from the standard CRT in a standard living room, to a range of very different displays and viewing environments (e.g., the peak white luminance of displays increased). Note that, as used herein, the phrase “peak white luminance of a display” and the expressions “display white luminance” and “display peak brightness (PB_D)” are interchangeable, with similar meaning.