The present invention is concerned with image processing and, more particularly, with image recognition in a broad sense. Here, recognition means that the image is processed to produce a result that makes a statement about the image. This can be useful in a number of different contexts.
For example, if the image is of a single object, it may be desired to identify the object as a specific one of a number of similar objects: the recognition of a human face as that of a particular person whose picture is stored in a reference database would fall into this category. Alternatively, it may be desired to identify an image as containing pictures of one or more objects and to classify it according to the nature of those objects. Such recognition could thus facilitate the automated indexing or retrieval of images from a database, particularly where a large database (or a large number of databases, as in the case of internet searches) is involved. Recognition may be applied not only to still pictures but also to moving pictures. Indeed, the increasing availability of audio-visual material has identified a need to monitor material transmitted on television channels or via video-on-demand systems, perhaps to verify that a transmitted film corresponds to the one actually requested.
Recognisers currently exist for various mid-range features of images, for example the presence of vertical structures, skin regions or faces, and whether the photograph was taken indoors or outdoors. One could envisage large networks of such recognisers working in combination to make higher-level statements about an image. For example, vertical structures with skin tones may be seen as good evidence for people, especially if a face can be found in the vicinity.
Given such a system, we could create a list of objects and features for every image. However, even with this information we would have difficulty describing what the subject of the image was. Consider a newspaper picture editor looking for pictures of birds. He queries a large image database for birds. However, included in the results is a photograph of a couple sitting in a Parisian café; in the distance we can just make out a small bird on the branch of a tree. This picture clearly does not satisfy the query as the editor intended it. Most human descriptions of this picture would omit the bird, because it seems unimportant. If we could judge the relative importance of each feature, we could describe the extent to which it was the subject of the image. In the case of the Parisian café, the bird would then be judged to be quite unimportant.