Prior art documents exist in which camera output is treated as a direct measure of reflectance; this can hold where the chromaticity and intensity of the illumination are invariant and a single camera is used. An example is U.S. Pat. No. 6,763,136. In fact, as is known to those skilled in the art, camera output is a combination of illumination (which itself depends on the illuminant), camera sensitivity and reflectivity.
An aim is to determine the reflectance or reflectivity of surfaces represented in a scene that is subject to such variations and is not constrained as above.
FIG. 6 illustrates a block diagram of an automated video surveillance system 1 which includes serially connected stages. These stages include a video acquisition stage 10, a background-foreground object segmentation stage 11, a tag detection stage 12, a skin detection stage 13, an object localization stage 14, an object tracking stage 15, and a movement classification stage 16.
The video acquisition stage and frame grabber 10 acquire an image of a scene. The background-foreground object segmentation stage 11 segments foreground objects from the surrounding information by comparison with a background model 21; the surroundings can then be cropped out, leaving a new image containing a foreground object or a predefined tagged object. The tag detection stage 12 and skin detection stage 13 help identify an object, e.g. a tag or a head, within the image by comparison with a tag model 22 and a skin model 23.
Once these features have been identified, the object localization stage 14 determines the 2D locations of foreground objects, heads and tags within the camera field of view. Detected object locations may then be tracked using the object tracking stage 15. Where the requirement of a video surveillance system is to detect movement, the movement classification stage 16 analyzes the trajectory of each object and compares it with preset trajectories or behaviors.
Background-Foreground segmentation stage 11, tag detection stage 12 and skin detection stage 13 are different types of object segmentation algorithms.
The goal of object segmentation algorithms is to ultimately group pixels into regions which represent objects. The output of these algorithms may be a binary mask, or a set of connected pixels which correspond to a certain group.
The object segmentation problem is divided into three main problems: selection of image representation, statistical modelling, and threshold selection.
1. Selection of Image Representation
The main question is what is the optimal image representation for object classification?
Developers have used different color spaces, e.g. RGB, normalized RGB, HSV, YCbCr, CIELAB and RGB color ratio. However, normalized RGB and HSV are the most commonly used, and it has been shown that these color spaces are more tolerant of minor variations in the illuminant.
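As a minimal sketch of why normalized RGB discounts illumination intensity, the following (pure NumPy, no imaging library assumed) converts an RGB array to its per-pixel chromaticity; the function name is an illustrative assumption, not taken from the prior art:

```python
import numpy as np

def to_normalized_rgb(image):
    """image: H x W x 3 array. Returns r, g, b with r + g + b = 1 per pixel."""
    img = image.astype(np.float64)
    total = img.sum(axis=2, keepdims=True)
    total[total == 0] = 1.0          # avoid division by zero on black pixels
    return img / total

# A pixel, and the same pixel under twice the illumination intensity:
pixel = np.array([[[100.0, 50.0, 50.0]]])
brighter = 2.0 * pixel
print(to_normalized_rgb(pixel))      # identical chromaticity for both
print(to_normalized_rgb(brighter))
```

Scaling the illumination intensity scales R, G and B equally, so the normalized values are unchanged, which is the tolerance to minor illuminant variation referred to above.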
2. Statistical Modelling
Object segmentation systems typically build a statistical model in order to define a decision rule which discriminates between the pixels of the target object and those of other objects. Statistical models used in existing object segmentation approaches are divided into parametric and non-parametric approaches. Parametric approaches use a specific functional form with adjustable parameters chosen to fit the model to the data set; examples are Gaussian models such as mixtures of Gaussians, and the elliptic boundary model. Examples of non-parametric approaches are the normalized lookup table, the Bayes classifier, and the self-organizing map.
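As an illustrative sketch of the parametric case, a single Gaussian can be fitted to chromaticity samples of the target object and new pixels scored by Mahalanobis distance; the function names and synthetic training data below are assumptions for illustration only:

```python
import numpy as np

def fit_gaussian(samples):
    """samples: N x 2 array of (r, g) chromaticities of the target object."""
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    return mean, cov

def mahalanobis_sq(pixel, mean, cov):
    """Squared Mahalanobis distance; small values suggest the target object."""
    d = pixel - mean
    return float(d @ np.linalg.inv(cov) @ d)

rng = np.random.default_rng(0)
train = rng.normal([0.45, 0.30], 0.02, size=(500, 2))  # synthetic skin-like samples
mean, cov = fit_gaussian(train)
print(mahalanobis_sq(np.array([0.45, 0.30]), mean, cov))  # near the mean: small
print(mahalanobis_sq(np.array([0.20, 0.55]), mean, cov))  # far from the mean: large
```

A decision rule then compares this distance against a threshold, as discussed in the threshold selection problem below.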
3. Threshold Selection
A threshold in the decision rule determines whether a pixel in the test image corresponds to a target object or not. Due to camera noise and illumination variations, pixels are liable to be classified as a target object even if they do not belong to it; a high value of the threshold maintains a low number of false detections. On the other hand, objects presenting a low contrast with respect to the target risk being eliminated if the threshold is too high.
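The trade-off can be sketched for a background-subtraction decision rule, where a pixel is declared foreground when its difference from the background model exceeds a threshold tau; the pixel values below are hypothetical:

```python
import numpy as np

background = np.array([100.0, 100.0, 100.0, 100.0])
# two noisy background pixels, one low-contrast object pixel, one high-contrast
frame = np.array([101.0, 104.0, 115.0, 180.0])
diff = np.abs(frame - background)

print(diff > 5.0)    # low tau: noise rejected, low-contrast object still kept
print(diff > 30.0)   # high tau: low-contrast object eliminated along with noise
```

Raising tau suppresses false detections caused by noise, but a tau above the object's contrast eliminates it, which is exactly the risk described above.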
Referring to FIG. 7, the first step of a typical object segmentation algorithm 600 is video acquisition 601 followed by image conversion 604. Data of a current frame 602 is applied to a foreground segmentation step 606 that receives background model data 603 as a second input. The foreground segmentation step 606 has an output, a foreground pixel map 607 of the current frame that is fed to a noise removal step 608. The noise removal step 608 has an output 609 that is an enhanced foreground pixel map, and this is input to a connected component analysis stage 610 whose output is a set of labelled foreground regions 611. The set of labelled foreground regions 611 is fed to a further post-processing stage 612, having an output 613 that is fed to an object and feature extraction stage 614 and to an update feedback path to a background modelling stage 620, that is fed with data 605 from the image conversion stage 604 during an initialization phase.
The algorithm starts by acquiring a number (N) of training images and converting them into a convenient image representation, e.g. a specific color space such as normalized RGB. As noted above, normalized RGB can be used to reduce the effect of illumination change. The second step is to build the background model 603 for the target object. A threshold selection module has the role of choosing the appropriate thresholds (r) to use in later comparisons between a current frame and the background model.
Test images are then acquired and analyzed to determine whether each pixel in the test image corresponds to the background model 603 or not. The noise removal stage 608 filters out noise before the final decision is made. Finally, to update the background model, model maintenance is performed.
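The flow above can be sketched under simplifying assumptions: the background model is taken as the per-pixel mean of N grayscale training frames, foreground segmentation thresholds the absolute difference, and model maintenance is a running average. Function names, the choice of mean model and the update rate alpha are illustrative assumptions; noise removal and connected component analysis are elided:

```python
import numpy as np

def build_background(training_frames):
    """Background model: per-pixel mean over N grayscale training frames."""
    return np.mean(training_frames, axis=0)

def segment(frame, background, tau):
    """Foreground pixel map: True where |frame - background| > tau."""
    return np.abs(frame - background) > tau

def maintain(background, frame, alpha=0.05):
    """Model maintenance: exponential running average of the background."""
    return (1 - alpha) * background + alpha * frame

rng = np.random.default_rng(1)
train = 100.0 + rng.normal(0, 1.0, size=(10, 8, 8))   # N=10 noisy background frames
bg = build_background(train)

test_frame = bg.copy()
test_frame[2:4, 2:4] += 50.0                 # a 2 x 2 foreground object appears
mask = segment(test_frame, bg, tau=20.0)
print(mask.sum())                            # the 4 object pixels
bg = maintain(bg, test_frame)                # update feedback to the model
```

In a full implementation the mask would then pass through noise removal (e.g. morphological filtering) and connected component analysis to yield labelled foreground regions, as in FIG. 7.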
The goal of the foreground object segmentation algorithm is to determine which pixels belong to the foreground, even before classifying the remaining changed pixels into different foreground classes.
Object segmentation is a well-known desideratum. Conventional techniques may be computationally intensive and may not be capable of implementation in real time.
A device for and a method of image processing that can enable segmentation to be carried out in a less computationally-intensive fashion would be desirable. Such a device could use an assessment of reflectivity to assist in this aim.
In order to extract the reflectivity of different parts of an image, an illumination estimation algorithm is required. Several approaches have been used to build illumination estimation algorithms, in contexts other than that of image analysis.
Referring to FIG. 8, the output of a camera depends on three factors:
1. Illuminant (light source) (E) 801
2. Spectral response characteristics (Q) of camera (sensors) 840
3. Reflectivity (S) of the object 830
For an RGB camera, the R, G and B outputs are related to the illuminant, the camera response and the reflectivity by Equation (1):

R = w_d ∫ E(λ)·S(λ)·Q_R(λ) dλ + w̃_s ∫ E(λ)·Q_R(λ) dλ

G = w_d ∫ E(λ)·S(λ)·Q_G(λ) dλ + w̃_s ∫ E(λ)·Q_G(λ) dλ

B = w_d ∫ E(λ)·S(λ)·Q_B(λ) dλ + w̃_s ∫ E(λ)·Q_B(λ) dλ  (Eq. 1)

where λ is the wavelength (the visible range is approximately from 400 nm to 700 nm), E(λ) is the spectral intensity of radiation of the illuminant, S(λ) is the spectral reflectivity of the object, Q_R(λ) is the spectral response characteristic of the red camera sensor, Q_G(λ) is the spectral response characteristic of the green camera sensor, Q_B(λ) is the spectral response characteristic of the blue camera sensor, w_d is the geometrical parameter for the diffuse reflection 820 component, and w̃_s is the geometrical parameter for the specular reflection 810 component.
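Equation (1) can be evaluated numerically for the diffuse term alone (w̃_s = 0); the spectra below, a flat illuminant, a reflectance rising toward red, and Gaussian sensor responses, are purely hypothetical choices for illustration:

```python
import numpy as np

lam = np.linspace(400.0, 700.0, 301)                   # wavelength grid, nm
E = np.ones_like(lam)                                  # flat illuminant E(lambda)
S = np.clip((lam - 500.0) / 200.0, 0.0, 1.0)           # reflectance rising toward red
Q = {c: np.exp(-((lam - mu) / 30.0) ** 2)              # assumed Gaussian sensor responses
     for c, mu in (("R", 600.0), ("G", 540.0), ("B", 460.0))}

w_d = 1.0
dlam = lam[1] - lam[0]
rgb = {c: w_d * float(np.sum(E * S * Q[c]) * dlam) for c in "RGB"}
print(rgb)   # R > G > B for this reddish surface
```

As expected from the integrals, the red channel dominates because the reflectance S(λ) overlaps most strongly with Q_R(λ).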
The reflectivity of an object is a measure of the amount of light reflected by an object, or radiance, relative to the amount of incident light shone on the object, or irradiance, and is indicative of the reflectance or intrinsic brightness of an object.
The reflectivity of an object can be used as a signature of the object. Hence it may be used to segment the object with respect to the remainder of an image. The reflectivity of an object is composed of diffuse (Lambertian, body) and specular (surface) components.
The described embodiments deal with diffuse-reflecting materials.
Humans have the ability to separate the illumination power spectral distribution from the surface reflectance function when judging object appearance; such ability is called color constancy.