We base our work on a simple model of image formation in which the response of an imaging device to an object depends on three factors: the light by which the object is lit, the surface reflectance properties of the object, and the properties of the device's sensors. We assume that a scene is illuminated by a single light characterised by its spectral power distribution which we denote E(λ) and which specifies how much energy the source emits at each wavelength (λ) of the electromagnetic spectrum. The reflectance properties of a surface are characterised by a function S(λ) which defines what proportion of light incident upon it the surface reflects on a per-wavelength basis. Finally a sensor is characterised by Rk(λ), its spectral sensitivity function which specifies its sensitivity to light energy at each wavelength of the spectrum. The subscript k denotes that this is the kth sensor. Its response is defined as:
$$p_k = \int_{\omega} E(\lambda)\, S(\lambda)\, R_k(\lambda)\, d\lambda, \qquad k = 1, \ldots, m \tag{1}$$

where the integral is taken over the range of wavelengths ω: the range for which the sensor has non-zero sensitivity. In what follows we assume that our devices (as most devices do) have three sensors (m = 3), so that the response of a device to a point in a scene is represented by a triplet of values (p1, p2, p3). It is common to denote these triplets as R, G, and B, or just RGBs, and so we use the different notations interchangeably throughout. An image is thus a collection of RGBs representing the device's response to light from a range of positions in a scene.
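As a minimal sketch of how Equation (1) might be evaluated in practice, the integral can be approximated by the trapezoidal rule over sampled spectra. The sampling grid, the flat illuminant, the constant reflectance, and the Gaussian sensitivity curves below are all illustrative assumptions, not data from any real device.

```python
import numpy as np

def sensor_responses(wavelengths, illuminant, reflectance, sensitivities):
    """Approximate p_k = integral of E * S * R_k over wavelength (Equation (1)).

    wavelengths:   shape (n,), increasing sample points (nm)
    illuminant:    shape (n,), samples of E(lambda)
    reflectance:   shape (n,), samples of S(lambda)
    sensitivities: shape (m, n), one row of R_k(lambda) samples per sensor
    """
    integrand = illuminant * reflectance * sensitivities  # shape (m, n)
    widths = np.diff(wavelengths)                         # shape (n - 1,)
    # Trapezoidal rule applied independently to each sensor's integrand.
    return np.sum(0.5 * (integrand[:, 1:] + integrand[:, :-1]) * widths, axis=1)

# Hypothetical sample data: 400-700 nm in 10 nm steps.
wavelengths = np.linspace(400.0, 700.0, 31)
illuminant = np.ones_like(wavelengths)         # assumed flat E(lambda)
reflectance = np.full_like(wavelengths, 0.5)   # assumed surface reflecting 50% everywhere
centres = np.array([610.0, 550.0, 450.0])      # assumed R, G, B peak wavelengths
sensitivities = np.exp(-0.5 * ((wavelengths - centres[:, None]) / 30.0) ** 2)

rgb = sensor_responses(wavelengths, illuminant, reflectance, sensitivities)
print(rgb)  # one response per sensor: (p1, p2, p3)
```

Because the integrand is linear in E(λ), scaling the illuminant scales every response by the same factor, which is the illumination dependence discussed next.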
Equation (1) makes clear that a device response depends both on properties of the sensor (it depends on Rk(λ)) and on the prevailing illumination (E(λ)). That is, responses are both device and illumination dependent. It follows that if no account is taken of these dependencies, an RGB cannot correctly be considered an intrinsic property of an object, and employing it as such is quite likely to give poor results.
An examination of the literature reveals many attempts to deal with the illumination dependence problem. One approach is to apply a correction to the responses recorded by a device to account for the colour of the prevailing scene illumination. Provided an accurate estimate of the scene illumination can be obtained, such a correction accounts well for the illumination dependence, rendering responses colour constant, that is, stable across a change in illumination. The difficulty with this approach is that estimating the scene illuminant is non-trivial. In 1998 Funt et al. [8] demonstrated that existing colour constancy algorithms are not sufficiently accurate to make such an approach viable. More recent work [11] has shown that, for simple imaging conditions and given good device calibration, the colour constancy approach can work.
A different approach is to derive from the image data some new representation of the image which is invariant to illumination. Such approaches are classified as colour (or illuminant) invariant approaches, and a wide variety of invariant features have been proposed in the literature. Accounting for a change in illumination colour is, however, difficult because, as is clear from Equation (1), the interaction between light, surface and sensor is complex. Researchers have attempted to reduce the complexity of the problem by adopting simple models of illumination change. One of the simplest models is the so-called diagonal model, in which it is proposed that sensor responses under a pair of illuminants are related by a diagonal matrix transform:
$$\begin{pmatrix} R^c \\ G^c \\ B^c \end{pmatrix} = \begin{pmatrix} \alpha & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & \gamma \end{pmatrix} \begin{pmatrix} R^o \\ G^o \\ B^o \end{pmatrix} \tag{2}$$

where the superscripts o and c characterise the pair of illuminants. The model is widely used, and has been shown to be well justified under many conditions [7]. Adopting such a model, one simple illuminant invariant representation of an image can be derived by applying the following transform:
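$$R' = \frac{R}{R_{ave}}, \qquad G' = \frac{G}{G_{ave}}, \qquad B' = \frac{B}{B_{ave}} \tag{3}$$

where the triplet (Rave, Gave, Bave) denotes the mean of all RGBs in an image. It is easy to show that this so-called Greyworld representation of an image is illumination invariant provided that Equation (2) holds: under a change of illuminant each channel and its mean are scaled by the same factor (α, β or γ), which cancels in the ratio.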
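The invariance of the Greyworld representation under the diagonal model can be checked numerically. The following is a minimal sketch on synthetic data: the random image and the diagonal coefficients are hypothetical values chosen only for illustration.

```python
import numpy as np

def greyworld(image):
    """Divide each channel by its mean over the image (Equation (3)).

    image: float array of shape (h, w, 3) whose channel means are non-zero.
    """
    channel_means = image.reshape(-1, 3).mean(axis=0)  # (R_ave, G_ave, B_ave)
    return image / channel_means

rng = np.random.default_rng(0)
image_o = rng.uniform(0.05, 1.0, size=(4, 5, 3))  # synthetic RGBs under illuminant o
diag = np.array([1.8, 1.0, 0.6])                  # hypothetical alpha, beta, gamma
image_c = image_o * diag                          # Equation (2) applied to every pixel

invariant_o = greyworld(image_o)
invariant_c = greyworld(image_c)

# The raw RGBs differ, but the Greyworld representations coincide.
print(np.allclose(image_o, image_c))          # False
print(np.allclose(invariant_o, invariant_c))  # True
```

The check succeeds only because the simulated illuminant change is exactly diagonal; it says nothing about how well Equation (2) describes real images.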
Many other illuminant invariant representations have been derived, in some cases [10] by adopting different models of image formation. All derived invariants, however, share two common failings. First, it has been demonstrated that when applied to the practical problem of image retrieval none of these invariants affords good enough performance across a change in illumination. Second, none of these approaches considers the issue of device invariance.
Device dependence arises because different devices have different spectral sensitivity functions (different Rk in Equation (1)), but also because the colours recorded by a device are often not linearly related to scene radiance as Equation (1) suggests, but rather are some non-linear transform of this:
$$p_k = f\left( \int_{\omega} E(\lambda)\, S(\lambda)\, R_k(\lambda)\, d\lambda \right), \qquad k = 1, \ldots, m \tag{4}$$
The transform f( ) is deliberately applied to RGB values recorded by a device for a number of reasons. First, many captured images will eventually be displayed on a monitor. Importantly, colours displayed on a screen are not a linear function of the RGBs sent to the monitor. Rather, there exists a power function relationship between the incoming voltage and the displayed intensity. This relationship is known as the gamma of the monitor, where gamma describes the exponent of the power function [15]. To compensate for this gamma function, images are usually stored in a way that reverses the effect of this transformation, that is, by applying a power function with exponent 1/γ, where γ describes the gamma of the monitor, to the image RGBs. Importantly, monitor gammas are not unique but can vary from system to system, and so images from two different devices will not necessarily have the same gamma correction applied. In addition to gamma correction, other more general non-linear “tone curve” corrections are often applied to images so as to change image contrast with the intention of creating a visually more pleasing image. Such transformations are device dependent and, quite often, image dependent, and so lead inevitably to device-dependent colour. In the next section we address the limitations of existing invariant approaches by introducing a new invariant representation.
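The effect of the gamma compensation described above can be illustrated with a minimal sketch in which f( ) in Equation (4) is taken to be a pure power function. The gamma values 2.2 and 1.8 and the sample linear responses are assumptions for illustration only; real devices may apply more general tone curves.

```python
import numpy as np

def gamma_encode(linear, gamma):
    """Apply f(x) = x ** (1 / gamma) to linear responses clipped to [0, 1]."""
    return np.clip(linear, 0.0, 1.0) ** (1.0 / gamma)

def gamma_decode(encoded, gamma):
    """Invert gamma_encode for the same gamma (the monitor's power function)."""
    return np.clip(encoded, 0.0, 1.0) ** gamma

# The same linear sensor responses, stored for two systems with different gammas.
linear = np.array([0.05, 0.2, 0.5, 0.9])
device_a = gamma_encode(linear, 2.2)  # assumed gamma of system A
device_b = gamma_encode(linear, 1.8)  # assumed gamma of system B

print(device_a)
print(device_b)
# Decoding with the matching gamma recovers the linear values;
# decoding with the wrong gamma does not.
print(np.allclose(gamma_decode(device_a, 2.2), linear))
print(np.allclose(gamma_decode(device_a, 1.8), linear))
```

The stored values differ between the two systems even though the underlying linear responses are identical, which is one source of the device dependence described in the text.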