In the field of computer vision one distinctive and complex task is image analysis and object recognition. A number of methods have been developed to address this task, which include: angular radial transform (ART); scale invariant feature transformation (SIFT); principal component analysis-scale invariant feature transformation (PCA-SIFT); and gradient location and orientation histogram (GLOH).
The ART method disclosed in patent publication no. US 2002/0010704 generates a descriptor for a complete image. This method has the disadvantage of not being able to account for occlusion or objects within an image. To overcome this disadvantage, a number of methods such as SIFT have been developed which include processing image data to identify a number of highly distinctive features or keypoints in an image.
The method of scale invariant feature transformation (SIFT) disclosed in U.S. Pat. No. 6,711,293 is a well-known technique that comprises two distinct parts: (1) keypoint detection, which identifies visually distinctive locations in a digital image; and (2) generation of a keypoint descriptor, which characterises a region or patch around the keypoint.
Generation of the keypoint descriptor in the SIFT method includes the step of orientation assignment, which uses histograms to determine the dominant orientation(s). When there are two or more equally dominant orientations in an image region around the keypoint, which is the case in almost any image region containing complex imagery, additional keypoints and descriptors must be generated for each of the additional dominant orientations detected.
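By way of illustration only, histogram-based orientation assignment of the kind described above may be sketched as follows. This is a minimal sketch and not the implementation of the cited patent: the function name dominant_orientations, the 36-bin histogram and the peak_ratio threshold of 0.8 are assumptions introduced for this example, and refinements such as Gaussian weighting of the patch and peak interpolation are omitted.

```python
import numpy as np

def dominant_orientations(patch, num_bins=36, peak_ratio=0.8):
    """Return bin-centre angles (degrees) of the dominant gradient orientations.

    num_bins and peak_ratio are illustrative assumptions, not values taken
    from the text above.
    """
    patch = np.asarray(patch, dtype=float)
    # Finite-difference gradients along rows (y) and columns (x).
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0

    # Magnitude-weighted histogram of gradient orientations.
    bin_width = 360.0 / num_bins
    bins = np.floor(angle / bin_width).astype(int) % num_bins
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(),
                       minlength=num_bins)
    if hist.max() == 0:
        return []  # flat patch: no orientation can be assigned

    # Keep every circular local maximum that is close to the global maximum.
    left = np.roll(hist, 1)
    right = np.roll(hist, -1)
    is_peak = (hist > left) & (hist >= right) & (hist >= peak_ratio * hist.max())
    return [(i + 0.5) * bin_width for i in np.flatnonzero(is_peak)]
```

Each angle returned beyond the first corresponds to an additional keypoint and descriptor that would have to be generated, which is the source of the extra processing noted above.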
In addition, to generate the keypoint descriptor, it is necessary to generate a vector descriptor for each of the keypoints generated; this step also uses histograms. A problem with the SIFT method is therefore the large amount of processing that is required to generate a keypoint descriptor that is invariant to rotation.
Alternative methods that have been developed from SIFT are PCA-SIFT and GLOH. PCA-SIFT uses steps similar to those of SIFT; however, instead of using weighted histograms to form the descriptor, PCA-SIFT uses principal component analysis (PCA). An example of PCA-SIFT may be found in the publication entitled “PCA-SIFT: A More Distinctive Representation for Local Image Descriptors” by Yan Ke and Rahul Sukthankar. GLOH is a further alternative that is again based on the SIFT method, but utilises more spatial regions in the histograms. The resultant descriptor, which has a higher dimension than that created using SIFT, is then reduced using PCA.
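By way of illustration only, the PCA reduction step mentioned above may be sketched as follows. This is a minimal sketch under stated assumptions: the helper names fit_pca and project are hypothetical, the input is random placeholder data rather than real descriptors, and the dimensions 128 and 36 are chosen purely for the example.

```python
import numpy as np

def fit_pca(descriptors, num_components):
    """Learn a PCA basis from an (n_samples, n_dims) array of descriptors."""
    data = np.asarray(descriptors, dtype=float)
    mean = data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:num_components]

def project(descriptors, mean, components):
    """Project descriptors onto the retained principal directions."""
    return (np.asarray(descriptors, dtype=float) - mean) @ components.T

# Placeholder data standing in for high-dimensional descriptors (assumption).
rng = np.random.default_rng(0)
high_dim = rng.normal(size=(200, 128))
mean, components = fit_pca(high_dim, 36)
reduced = project(high_dim, mean, components)  # shape (200, 36)
```

In practice the basis would be learned once from a training set of descriptors and then reused to project descriptors from new images.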
A problem with all of the above methods is that, whilst they may be apt at object detection in still images, they require a large number of keypoints, and they were not created to tackle the problems encountered in object detection in a series of images, such as in video applications.
The problem with object detection in video, and for example video on the internet, is that video has a number of varying attributes that must be considered when performing computational analysis of several consecutive images to detect an object; for example: 1) bit rate and encoding method, and therefore quality and video compression artefacts; 2) resolution; and 3) colour balance.
Many of these effects are not very noticeable to a human observer, because current video encoding systems rely on the persistence of vision of the human visual system; they do, however, raise a significant problem for computational algorithmic analysis of the video frames.
An example of the problems that may occur in video, in respect of estimating keypoint motion from an image frame to a consecutive image frame, is shown in FIGS. 1a, 1b and 1c. Referring to FIG. 1a there are shown three consecutive frames 10a, 10b and 10c in time having the same keypoint identified as 12a, 12b and 12c, respectively, in the different frames. A line A-A passing through the centre of the keypoint 12a in the first frame 10a passes through the corresponding pixel in frames 10b and 10c; the keypoint in frames 10b and 10c, however, deviates from the line A-A. FIG. 1a therefore illustrates how the pixel location of a keypoint may move from one frame to the next frame in video.
Referring to FIG. 1b there are shown three consecutive frames 14a, 14b and 14c and two possible motion paths 18, 19 that an image patch 16a may take. In a first possible motion path 18 it can be seen that the image patches 16b and 16c in respective second and third frames 14b and 14c are located higher in each of the respective frames than image patches 16b′ and 16c′ shown in the second possible motion path 19.
Referring to FIG. 1c there are illustrated three consecutive frames 20a, 20b and 20c. In the middle frame 20b there is an image patch 22 having a keypoint 24. The keypoints 24′ and 24″ on respective first and third frames 20a and 20c on either side of the middle frame 20b show a possible motion path of the keypoint 24 along a bi-conical surface.
Referring to FIG. 2 there is shown an example of three consecutive images, running from left to right of FIG. 2, which together form a video sequence illustrating fine variability. These images show the degree of variation in each of the frames. The feature labelled 26 has been marked in each of the consecutive frames. It can be seen that the feature 26 near the right hand side of the images appears to move towards the lower right hand corner over the consecutive frames.
The problems in accurately identifying an object in consecutive video images that can arise from movement of a keypoint location in one frame compared with another frame are not addressed by the prior art.
It is an object of the present invention to provide improvements in relation to image and video processing.