A video image (which will be understood to encompass frozen images such as facsimile images, in addition to moving images) will in general include at least one object which is of interest and a "background" of lesser interest (and hence of lesser importance).
In a variety of applications it is often desirable to analyse the image, e.g. to detect the presence or absence, or the position, of a particular object of interest.
In an image transmission system, improved picture quality might be achieved if data relating to important parts of the scene, i.e. objects of interest, is coded using relatively more bits than data relating to unimportant (i.e. background) parts. For example, in a videophone system a typical image comprises a head and shoulders against a background, and the face area of the head is visually the most important; it is thus desirable to be able to distinguish the head area from the shoulders and background so that the head can be processed at a higher refresh rate than the rest, conveying the impression of smooth head motion. The ability to locate a head within a head and shoulders scene can thus be used to modify the spatial allocation of video data, enabling a degree of visual importance to be attributed to blocks within the data.
Also, if the position of an object is accurately tracked with time it will be possible to predict its motion, thus allowing "motion compensated" DPCM.
One way of identifying different regions of an image is to utilise the method proposed by Nagao (M. Nagao, "Picture recognition and data structure", Graphic Languages, ed. Nake and Rosenfeld, 1972). This method has been used in a videophone type system, on an image of a head and shoulders against a background. Some success was achieved in determining the sides of the head when the subject was clean shaven, but very little success was achieved in other cases, so this method is not considered reliable enough to serve as the basis of an area-identifying method.
Conventional coders, for instance hybrid discrete cosine transform coders, use no "scene content" information to code the data within the scene, so each part of the scene is operated on as if it has the same visual importance as every other part.
Other image analysis applications are manifold (for example, in automated manufacturing systems).
It is also known to code video images for transmission using Vector Quantisation (VQ). In VQ coding, the image is represented initially by an array of digital data corresponding to the image frame. Blocks of array points ("sub-arrays") are compared with vectors from a codebook, and the best-matching vector is selected using a "least squares" difference criterion. A code designating this vector is then transmitted to represent the sub-array. At the receiving end the indicated vector is selected from an identical codebook and displayed.
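The VQ coding steps described above can be illustrated with a minimal Python sketch. It is not taken from any particular coder: the function names (`split_into_blocks`, `vq_encode`, `vq_decode`), the 2x2 sub-array size, the three-entry codebook and the pixel values are all assumptions chosen only to keep the example small.

```python
def split_into_blocks(image, size):
    """Split a 2-D list of pixel values into flattened size x size
    sub-arrays, taken in raster order."""
    rows, cols = len(image), len(image[0])
    blocks = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            blocks.append([image[r + i][c + j]
                           for i in range(size) for j in range(size)])
    return blocks


def squared_error(a, b):
    """Sum of squared differences (the "least squares" criterion)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_encode(blocks, codebook):
    """For each sub-array, return the index of the best-matching vector."""
    return [min(range(len(codebook)),
                key=lambda k: squared_error(block, codebook[k]))
            for block in blocks]


def vq_decode(indices, codebook):
    """Receiver side: look each index up in an identical codebook."""
    return [codebook[k] for k in indices]


# Hypothetical codebook of flattened 2x2 vectors (illustrative values only).
codebook = [
    [0, 0, 0, 0],          # dark block
    [255, 255, 255, 255],  # bright block
    [0, 255, 0, 255],      # vertical edge
]

# Hypothetical 4x4 image frame.
image = [
    [10, 5, 250, 240],
    [0, 12, 255, 251],
    [3, 250, 0, 0],
    [8, 245, 2, 1],
]

blocks = split_into_blocks(image, 2)
indices = vq_encode(blocks, codebook)      # codes that would be transmitted
reconstructed = vq_decode(indices, codebook)
```

Only the list `indices` would need to be transmitted; the receiver rebuilds an approximation of each sub-array from its own copy of the codebook.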
The underlying principle of the invention, however, is to use VQ as an identification (e.g. object location) method. The extent of the various aspects of the invention is defined in the claims appended hereto.
The different areas of a video image, when vector quantised (VQ), can be operated on differently provided each entry in the VQ codebook has an associated flag indicating which area that entry represents. So in the example of the videophone two different flag entries are required, one for the head and the other for the remainder of the scene.
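As a rough illustration of the flagged codebook idea, the following Python sketch attaches a flag to each codebook entry and reads off the flag of the best-matching entry for every sub-array. The flag labels, the function names (`best_entry`, `classify_blocks`) and all numeric values are hypothetical; a real codebook would be derived from training images.

```python
HEAD, BACKGROUND = "head", "background"

# Hypothetical flagged codebook: each entry pairs a flattened 2x2 vector
# with a flag saying which area of the scene that entry represents.
flagged_codebook = [
    ([200, 190, 195, 205], HEAD),
    ([120, 125, 118, 122], HEAD),
    ([20, 25, 22, 18], BACKGROUND),
    ([60, 60, 60, 60], BACKGROUND),
]


def best_entry(block, codebook):
    """Index of the entry whose vector has the least squared error."""
    def error(k):
        return sum((x - y) ** 2 for x, y in zip(block, codebook[k][0]))
    return min(range(len(codebook)), key=error)


def classify_blocks(blocks, codebook):
    """Quantise each sub-array and return (indices, flags)."""
    indices = [best_entry(block, codebook) for block in blocks]
    flags = [codebook[k][1] for k in indices]
    return indices, flags


# Hypothetical sub-arrays taken from a head and shoulders image.
blocks = [
    [198, 192, 190, 200],
    [22, 24, 20, 19],
    [118, 130, 120, 121],
    [58, 63, 61, 59],
]

indices, flags = classify_blocks(blocks, flagged_codebook)
```

The resulting `flags` list marks which sub-arrays were matched by head entries, so that those blocks could, for instance, be given more bits or a higher update rate than the rest of the scene.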