Image processing is used today to detect various types of object within a large number of applications, such as in monitoring to detect whether there is an object, such as a person, inside a monitored area. A sensor records images of the monitored area.
There are strong requirements for reliable results from the image processing of these images, as incorrect evaluations can be costly. For example, the image processing can be used in a monitoring system to prevent break-ins. If an intruder is falsely detected, resulting in a false alarm, it can be very costly if, for example, the police or other security personnel are informed and come to the site as a result of the false alarm.
In order to detect whether there is a foreign object within a monitored area, a sensor records the incident intensity as grayscale values in a digital image of the monitored area. The recorded image is then compared with a reference image. The reference image can, for example, be the immediately preceding image or an image taken at a time when there was no foreign object within the area.
If there is a difference between the compared images, this can be due to a change in the scene or to a change in the lighting of the scene. The monitored area can be said to be a scene that consists of a number of surfaces with reflectance properties. A change in the scene results in a change of the set of surfaces of the recorded image, for example by an object coming into the monitored area or moving in the monitored area, between when the reference image was recorded and when the current image was recorded. A change in the lighting means that the incident light in the scene is changed while the set of surfaces is unchanged, for example by the sun going behind a cloud or by a lamp being switched on.
Normally only a change in the scene is of interest, while a change in the lighting of the scene should be disregarded. This is a problem, as it is very difficult to distinguish between a change in the scene and a change in the lighting.
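The difficulty can be illustrated with a minimal sketch of a plain pixel-by-pixel comparison against a reference image. The image values, the brightening factor of 1.5 and the threshold of 15 grayscale levels are hypothetical values chosen for illustration, and the function name changed_pixels is not taken from any cited method.

```python
def changed_pixels(current, reference, threshold):
    """Count pixels whose absolute grayscale difference exceeds the threshold."""
    return sum(
        1
        for cur_row, ref_row in zip(current, reference)
        for cur, ref in zip(cur_row, ref_row)
        if abs(cur - ref) > threshold
    )


# Hypothetical 2x3 reference image (grayscale values).
reference = [[40, 80, 120], [60, 100, 140]]

# Lighting change only: every pixel scaled by an assumed factor of 1.5.
brighter = [[int(v * 1.5) for v in row] for row in reference]

# Scene change only: a single pixel is replaced, e.g. by a foreign object.
intruder = [row[:] for row in reference]
intruder[0][1] = 200

THRESHOLD = 15  # assumed threshold, in grayscale levels
lighting_hits = changed_pixels(brighter, reference, THRESHOLD)
scene_hits = changed_pixels(intruder, reference, THRESHOLD)

print("pixels flagged after lighting change:", lighting_hits)
print("pixels flagged after scene change:", scene_hits)
```

In this sketch the lighting change flags more pixels than the scene change does, so a plain difference against the reference image cannot separate the two cases.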
U.S. Pat. No. 5,956,424 and U.S. Pat. No. 5,937,092 describe a method in video monitoring in which an attempt is made to separate the changes in the lighting from the changes in the scene. This is carried out by attempting to model the intensity of the light that radiates from the surfaces in the scene, in order to filter out changes in the lighting from changes of the actual scene.
In the method according to U.S. Pat. No. 5,956,424 and U.S. Pat. No. 5,937,092, it is assumed that the intensity Iout that radiates from a surface is directly proportional to the incident intensity Iin, that is, Iout=r*Iin, where r is the reflectance of the surface. If a change in the lighting occurs, it is assumed to be linearly proportional, that is, Iout,after=k*Iout,before=k*r*Iin,before, where the subscripts "before" and "after" refer to the time before and after the change in the lighting, and k is the lighting change factor, or irradiance.
The method according to U.S. Pat. No. 5,956,424 is based on calculating quotients between grayscale values of adjacent pixels, also called picture elements. Pixels are the elements of the matrix that represents the digital image. The quotient is a measure that depends only on the reflectance r of a surface and is independent of the irradiance k. A new image is created, the element values of which reflect only the reflectance in the associated pixel, and this image is then compared with a reference image in which the reflectance of each pixel is calculated under the assumption that the changes in the lighting are proportionally linear. If the change in the lighting is the same in the whole image, the curve of radiated intensity against incident intensity will look the same for all pixels. The slope of the curve represents the reflectance. The quotient between two adjacent pixels may be calculated pixel by pixel, assuming that the lighting of the scene is the same for adjacent areas, that is, Iin(x+1,y)=Iin(x,y), by the equation:
\[
\frac{I_{out,after}(x+1,y)}{I_{out,after}(x,y)}
= \frac{k\,I_{out,before}(x+1,y)}{k\,I_{out,before}(x,y)}
= \frac{r(x+1,y)\,k\,I_{in}(x+1,y)}{r(x,y)\,k\,I_{in}(x,y)}
= \frac{r(x+1,y)}{r(x,y)}
\]
The quotient is thus independent of k. A change in the lighting can therefore be discriminated from a change in the scene: when only the lighting changes, the ratio between adjacent pixels in the current image equals the ratio between the same adjacent pixels in the reference image, irrespective of the change in the irradiance.
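The independence from k can be checked numerically with a short sketch under the proportional model Iout=r*Iin. The reflectance values, the incident intensity of 100 and the factor k=1.7 are hypothetical, and the function name adjacent_quotients is introduced here only for illustration.

```python
import math


def adjacent_quotients(row):
    """Quotient I(x+1) / I(x) for each pair of horizontally adjacent pixels."""
    return [row[x + 1] / row[x] for x in range(len(row) - 1)]


# Hypothetical reflectances r of four adjacent surfaces in one image row.
reflectance = [0.2, 0.5, 0.8, 0.4]
i_in = 100.0  # assumed incident intensity, equal for all adjacent pixels
k = 1.7       # assumed lighting change factor

before = [r * i_in for r in reflectance]  # Iout,before = r * Iin
after = [k * value for value in before]   # Iout,after = k * Iout,before

q_before = adjacent_quotients(before)
q_after = adjacent_quotients(after)

# Under the proportional model the quotients are unchanged by k.
quotients_match = all(math.isclose(a, b) for a, b in zip(q_before, q_after))
print(q_before, q_after, quotients_match)
```

Each quotient equals the ratio of the two reflectances, so a proportional lighting change leaves the quotient image unchanged in this sketch.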
The proportionally linear changes in the intensities that this model represents occur when the light is reflected against a Lambertian surface. This is a matt surface which, when illuminated, radiates equally in all directions and does not give rise to any specular reflection. With this modeling and this method, the probability is increased that a detected change is due to a change in the scene. However, many changes in the lighting are still detected as changes in the scene, which can cause costly false alarms. If the light intensity is actually measured, the curve obtained is not proportionally linear.
The curve is not proportionally linear primarily because the sensor does not map the incident intensities to grayscale values proportionally linearly, but as an affine function. It is also partially because certain surfaces in an area monitored by the sensor do not fulfill the requirement of being Lambertian surfaces. An affine representation means that Iafter=a*Ibefore+b.
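As a worked illustration of why the affine term matters, the quotient of adjacent pixels can be written out under the affine model. Only the symbols a and b come from the text above; the numerical values are hypothetical.

\[
\frac{I_{after}(x+1,y)}{I_{after}(x,y)}
= \frac{a\,I_{before}(x+1,y) + b}{a\,I_{before}(x,y) + b}
\]

This equals the quotient before the change, I_{before}(x+1,y)/I_{before}(x,y), only if b=0 or the two adjacent intensities are equal. For example, with adjacent intensities 20 and 5, a=1 and b=10, the quotient changes from 20/5=4 to 30/15=2 although the scene is unchanged.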
Another problem with the method according to U.S. Pat. No. 5,956,424 is that calculating the quotient between the intensities of adjacent pixels makes the system more sensitive to noise. The sensitivity to noise arises, for example, in very dark areas in the image, or at edges where one side is dark and the other is light. Assume that the quotient is calculated between two pixels where the intensities in the reference image are 5 and 20 respectively, that is, the quotient is 20/5=4. If the current image of the monitored area is recorded by a sensor with noise of at most 2 intensity levels in each pixel, this quotient can vary between 22/3≈7.3 and 18/7≈2.6, compared with the reference image's quotient of 4 (+83% to −36%).
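The arithmetic of this example can be reproduced with a small sketch. The function name quotient_range is hypothetical, and the brighter pixel pair (50 and 200) is an assumed comparison case that is not part of the example above.

```python
def quotient_range(bright, dark, max_noise):
    """Lowest and highest quotient bright/dark when each pixel may be
    off by up to max_noise intensity levels (requires dark > max_noise)."""
    highest = (bright + max_noise) / (dark - max_noise)
    lowest = (bright - max_noise) / (dark + max_noise)
    return lowest, highest


def relative_deviation(value, reference):
    """Deviation of value from reference, as a fraction of reference."""
    return (value - reference) / reference


# Dark pixel pair from the example: intensities 5 and 20, noise up to 2 levels.
dark_ref = 20 / 5
dark_low, dark_high = quotient_range(20, 5, 2)

# Assumed brighter pair with the same reference quotient of 4, for comparison.
bright_ref = 200 / 50
bright_low, bright_high = quotient_range(200, 50, 2)

for label, ref, low, high in (
    ("dark", dark_ref, dark_low, dark_high),
    ("bright", bright_ref, bright_low, bright_high),
):
    print(label, round(low, 2), round(high, 2),
          f"{relative_deviation(low, ref):+.0%}",
          f"{relative_deviation(high, ref):+.0%}")
```

The same absolute noise level produces a far wider spread of quotients for the dark pair than for the bright pair, which is the sensitivity described above.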
Another known technique for attempting to solve the problem of changes in the lighting being detected as changes in the scene is NVD, "Normalized Vector Distance", described in Matsuyama, Ohya, Habe: "Background subtraction for non-stationary scene", Proceedings of the fourth Asian conference on computer vision 2000, pp 662–667. In this article there is an attempt, precisely as above, to solve the problem of changes in the lighting in the image by modeling them as proportionally linear changes in the intensities, Iout,after=k*r*Iin, where k is the lighting change factor. In NVD the image is divided into blocks. The size of the blocks can be chosen according to the application. For example, the blocks can be 2 pixels in size. The first block can have the value (30,30) and the second block (50,50). Regarded as vectors, these have the same direction, and it is therefore decided that the change is due to a change in the lighting. If the direction differs by more than a particular threshold, it is decided that the change is a change in the scene. By considering angles between vectors whose elements are the intensities in partial areas, a measure is obtained that is invariant to proportionally linear changes in the intensities. Invariant here means that the angle between the vectors of the reference image and the current image is the same, irrespective of proportionally linear transformations of grayscale values in the current image.
NVD has the same disadvantages as mentioned above, which can lead to a change in the lighting being interpreted as a change in the scene. The problems with noise in dark areas also remain, as a vector is defined with components consisting of the intensities in a square that comprises a number of pixels, and this vector is then normalized. If the vector is, for example, a 4-dimensional vector with small components, such as (2,5,4,4), and the reference images contain noise with a maximum of 2 intensity levels, the direction of this vector can vary considerably, which may result in false alarms.
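The direction test described above can be sketched as follows. This is a simplified angle-based illustration of the description in this text, not the exact formulation of the cited article. It assumes that (30,30) is a block of the reference image and (50,50) the corresponding block of the current image. The threshold of 5 degrees, the block (50,20) and the function names are hypothetical.

```python
import math


def angle_between(u, v):
    """Angle in degrees between two non-zero intensity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


def classify_block(reference_block, current_block, threshold_deg):
    """Return 'scene' if the direction differs by more than the threshold,
    otherwise 'lighting'."""
    angle = angle_between(reference_block, current_block)
    return "scene" if angle > threshold_deg else "lighting"


THRESHOLD_DEG = 5.0  # assumed threshold

# Blocks from the example: same direction, different length.
same_direction = classify_block((30, 30), (50, 50), THRESHOLD_DEG)

# Hypothetical block whose direction has changed.
changed_direction = classify_block((30, 30), (50, 20), THRESHOLD_DEG)

print(same_direction, changed_direction)
```

Scaling a block by any positive factor leaves its direction unchanged, which is why the angle is invariant to proportionally linear changes in the intensities.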
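The size of this variation can be estimated with a brute-force sketch. It assumes integer noise between -2 and +2 levels added independently to each component of one vector, with the other vector taken as noise-free, which is a simplification. The bright vector (102,105,104,104) and the function names are hypothetical and serve only as a comparison.

```python
import itertools
import math


def angle_deg(u, v):
    """Angle in degrees between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


def worst_case_angle(vector, max_noise):
    """Largest angle between vector and any copy of it disturbed by
    integer noise of at most max_noise levels per component."""
    worst = 0.0
    levels = range(-max_noise, max_noise + 1)
    for noise in itertools.product(levels, repeat=len(vector)):
        noisy = [c + n for c, n in zip(vector, noise)]
        if any(noisy):  # skip the all-zero vector, which has no direction
            worst = max(worst, angle_deg(vector, noisy))
    return worst


worst_dark = worst_case_angle((2, 5, 4, 4), 2)            # vector from the text
worst_bright = worst_case_angle((102, 105, 104, 104), 2)  # assumed bright block

print(f"dark block: up to {worst_dark:.1f} degrees")
print(f"bright block: up to {worst_bright:.1f} degrees")
```

Under these assumptions the direction of the dark vector can swing by tens of degrees from noise alone, while the bright vector barely moves, so any fixed angle threshold is easily exceeded in dark areas.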