An image stream, such as is found in television and digital video applications, consists of a time-ordered series of individual images, or frames. The images are often two-dimensional images of a three-dimensional scene, but any number of dimensions can, in principle, be ascribed to an image. For example, a one-dimensional image might be a slice of a two-dimensional image, or it might be a section of a soundtrack applicable to the frame. A three-dimensional image may be an image of a scene in which all three spatial dimensions are represented explicitly. More dimensions could be added, for example, by imaging the x, y and z motions or accelerations. The depth dimension can also be represented by combining frames taken from different viewpoints to provide a stereoscopic or holographic view of a scene. The present invention can be applied generally to all of these examples, but is not limited to them.
In some applications, it is necessary to determine how the scene represented in an image stream changes from one frame to the next, or between images taken at the same time from different points of view, as in stereoscopic projection. This may be the case, for example, where there is a requirement to measure the integrity of the image stream for quality-control purposes, or for the efficient application of a data compression algorithm. In stereoscopic projection, the depth, which is related to the horizontal separation (disparity) of the left-hand and right-hand images, must be monitored and controlled within limits set by viewing comfort and health considerations. As the scene itself changes, or as the camera moves in translation, pan, tilt or zoom, so one frame in a stream changes with respect to those on either side of it. It is usually assumed that any such changes are slow compared to the frame rate. It is then likely that views of the same physical object appear in adjacent frames, giving the possibility that its position may be tracked from frame to frame and used as part of a monitoring or quality-assurance process applied to the image stream.
Identifying an object, or a region of interest, which can be tracked from frame to frame, is not trivial. Whereas the human eye and brain can carry out this task with relative ease (if not speed), a computational algorithm suffers from the disadvantage that it can easily recognize only simple shapes such as edges, lines or corners, and these may not be present in a particular set of frames. There are nevertheless many algorithms known in the art which perform the task with varying levels of success. US 2011/0026763 to Diggins teaches how low-bandwidth audio-visual content signatures can be generated from audio-video data streams and used for monitoring purposes. Knee, in GB 2474281, describes how image features may be identified from local data maxima in a frame. The present invention provides a relatively simple and robust method for finding points of interest in an image, which is also fast enough to be used in real-time applications.
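By way of illustration only, the general idea of locating candidate points of interest at local data maxima in a frame might be sketched as follows. This is a generic sketch and is not the method of GB 2474281 or of the present invention. The function name `local_maxima`, the `threshold` parameter, the eight-neighbour comparison and the representation of a frame as a list of rows of sample values are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def local_maxima(frame: Sequence[Sequence[float]],
                 threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Return (row, col) positions whose value is at least `threshold`
    and strictly greater than all eight neighbouring values.

    Hypothetical helper: border samples are skipped because they do
    not have a full set of eight neighbours.
    """
    rows = len(frame)
    cols = len(frame[0]) if rows else 0
    points: List[Tuple[int, int]] = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            value = frame[r][c]
            if value < threshold:
                continue
            # Collect the eight neighbours surrounding (r, c).
            neighbours = [
                frame[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            ]
            if all(value > n for n in neighbours):
                points.append((r, c))
    return points


# A small synthetic frame with a single peak at row 2, column 2.
example_frame = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 1, 0],
    [0, 2, 9, 2, 0],
    [0, 1, 2, 1, 0],
    [0, 0, 0, 0, 0],
]
print(local_maxima(example_frame))
```

Points found in this way in one frame could, in principle, be compared with those found in an adjacent frame to follow a feature from frame to frame, subject to the assumption that the scene changes slowly relative to the frame rate.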