Processing of dynamic image scenes is often carried out by a human operator visually monitoring a transmitted video feed. This can be tedious, and hence prone to potentially high levels of human error, as well as time-consuming and expensive.
Hence automatic image processing is desirable, either to fully detect events or items of interest, or to provide intermediate data (meta-data) that alerts a human operator and/or further automatic processing to a possible detection of an event or item of interest.
However, although conventional automatic image processing techniques are available for use with sequences of images obtained from static viewpoints, such techniques cannot readily be applied to image data comprising a temporal sequence of images of a scene obtained from differing viewpoints, due to the image transformations that arise from the images in the sequence being from differing viewpoints. Examples of such image data include image data from a mobile platform, such as an unmanned aerial vehicle (UAV), and image data from a statically positioned camera that sweeps over a changing field of view.