In the field of video image capture, a continuous scene in space and time is converted into an electronic representation best described as a grid of discrete picture elements, each having a number of properties including color, brightness and location (x, y) on the grid. Hereinafter the grid is referred to as a raster image and the picture elements as pixels. A raster of pixels is referred to as a frame, and a video stream is hereinafter defined as a series of frames such that, when displayed rapidly in succession, they create the illusion of a moving image. This method forms the basis of digital video and is well-known in the art.
Different formats of video may have different spatial resolutions; for example, NTSC has 640×480 pixels, PAL has 768×576, and so forth. These factors limit the size or spatial features of objects that can be visually detected in an image. These limitations apply equally to the art of still image photography.
In the art of video photography the issue of resolution is further affected by the rate at which bitmap images may be captured by the camera, hereinafter defined as the frame rate. The frame rate limits the temporal resolution of the video image. Different formats of video may have different temporal resolutions; for example, NTSC has 30 frames per second, PAL has 25, and so forth.
Limitations in the temporal and spatial resolution of images create perception errors in the illusion of sight created by displaying bitmap images as a video stream. Rapid dynamic events which occur faster than the frame rate of video cameras are either not visible or captured incorrectly in the recorded video sequences. This problem is often evident in sports videos, where it is impossible to see the full motion or behavior of a fast-moving ball, for example.
There are two typical visual effects in video sequences caused by very fast motion. The more common of the two is motion blur, which is caused by the exposure time of the camera. The camera integrates light coming from the scene for the entire length of the exposure time to generate a frame (bitmap image). As a result of motion during this exposure time, fast-moving objects produce a noticeable blur along their trajectory, often resulting in distorted or unrecognizable object shapes. The faster the movement of the object, the stronger this effect.
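The integration described above may be illustrated by a minimal numerical sketch. This is a hypothetical 1-D illustration only (the function name, grid width, and sample counts are illustrative assumptions, not part of any prior-art system): light from a moving point is accumulated at the object's instantaneous position at each sub-frame instant of the exposure, so a fast-moving point smears into a streak while a static point remains a single-pixel spot.

```python
import numpy as np

def expose_frame(positions, width=32):
    """Integrate a moving point over one exposure: each sub-frame
    sample deposits light at the object's instantaneous pixel."""
    frame = np.zeros(width)
    for x in positions:
        frame[int(x) % width] += 1.0
    return frame / len(positions)

# A point crossing 12 pixels during one exposure leaves a streak...
fast = expose_frame(np.linspace(4, 16, 50))
# ...while a static point remains a sharp single-pixel spot.
slow = expose_frame(np.full(50, 4.0))

print(np.count_nonzero(fast))  # many pixels receive light (a blur streak)
print(np.count_nonzero(slow))  # only one pixel receives light
```

As the sketch shows, the recorded frame is an average over the object's whole sub-frame trajectory, which is why no amount of post-hoc sharpening of a single frame can recover the object's instantaneous shape.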
Previous methods in the art for reducing motion blur require prior segmentation of moving objects and the estimation of their motions. Such motion analysis may be impossible in the presence of severe shape distortions, and is meaningless for reducing motion blur in the presence of motion aliasing (discussed below). There is thus an unmet need for a system and method to increase the temporal resolution of video streams using information from multiple video sequences, without the need to separate static and dynamic scene components or estimate their motions.
The second visual effect caused by the frame rate of the camera is a temporal phenomenon referred to as motion aliasing. Motion-based (temporal) aliasing occurs when the trajectory generated by a fast-moving object is characterized by a frequency greater than the frame rate of the camera. When this happens, the high temporal frequencies are “folded” into the low temporal frequencies, resulting in a distorted or even false trajectory of the moving object. This effect is best illustrated by the phenomenon known as the “wagon wheel effect,” which is well-known in the art: a wheel rotates at a high frequency, but beyond a certain speed it appears to rotate in the wrong direction, or even not to rotate at all.
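The folding of high temporal frequencies into low ones can be sketched with simple modular arithmetic. In this hypothetical illustration (the function name and the 30 fps default are assumptions for the example), the true per-frame rotation angle of a wheel is wrapped into the range (-180°, 180°], which is all the camera can distinguish; rotation rates near or above the frame rate therefore appear reversed or frozen.

```python
def apparent_step(rot_hz, frame_rate=30.0):
    """Per-frame rotation angle as perceived by the camera: the true
    step is folded ("aliased") into the range (-180, 180] degrees."""
    true_step = 360.0 * rot_hz / frame_rate   # degrees per frame
    return (true_step + 180.0) % 360.0 - 180.0

print(apparent_step(5))    # 60.0  -> slow forward rotation, as expected
print(apparent_step(28))   # -24.0 -> the wheel appears to spin backwards
print(apparent_step(30))   # 0.0   -> the wheel appears frozen
```

A wheel spinning at 28 revolutions per second, filmed at 30 frames per second, advances 336° between frames; the camera cannot distinguish this from a 24° step backwards, which is the wagon wheel effect in miniature.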
Playing a video that suffers from motion aliasing in slow motion does not remedy the phenomenon, even when this is done using the sophisticated temporal interpolation methods for increasing frame rate which exist in the art. This is because the information contained in a single video sequence is insufficient to recover the missing information of very fast dynamic events, due to the slow and mistimed sampling and the blur.
Traditional super-resolution in the art is image-based and purely spatial. Methods exist for increasing the spatial resolution of images by combining information from a plurality of low-resolution images obtained at sub-pixel displacements. These methods, however, assume static scenes and do not address the limited temporal resolution observed in dynamic scenes. While spatial and temporal resolution are different in nature, they remain inter-related in the field of video, and this creates the option of tradeoffs between space and time. There is as yet no super-resolution system available in the art which enables the generation of different output-sequence resolutions from the same set of input sequences, where a large increase in temporal resolution may be obtained at the expense of spatial resolution, or vice versa.
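The principle of combining sub-pixel-displaced low-resolution images can be sketched in an idealized 1-D form. This is only an illustrative assumption-laden toy (it ignores optical blur, noise, and the registration step that real super-resolution methods in the art must perform): two samplings of the same static scene, offset by half a low-resolution pixel, jointly contain the full high-resolution signal and can simply be interleaved.

```python
import numpy as np

# Idealized 1-D "scene" at high resolution (ground truth).
scene = np.arange(16, dtype=float)

low_a = scene[0::2]   # camera A samples the even high-res positions
low_b = scene[1::2]   # camera B is displaced by one high-res pixel

# Because the displacement is known and sub-pixel (relative to the
# low-res grid), interleaving the two samplings recovers the scene.
recovered = np.empty_like(scene)
recovered[0::2] = low_a
recovered[1::2] = low_b

print(np.array_equal(recovered, scene))  # True
```

The sketch also makes the limitation plain: the interleaving is only valid because the scene is identical in both samplings, which is precisely the static-scene assumption that fails for dynamic events.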
Known image-based methods in the art become even less useful with the advent of inputs of differing space-time resolutions. In traditional image-based super-resolution there is no incentive to combine input images of different resolutions, since a high-resolution image subsumes the information contained in a low-resolution image. This underscores the need in the industry for a system and method able to utilize the complementary information provided by different cameras: combining the information obtained by high-quality still cameras (with very high spatial resolution) with the information obtained by video cameras (which have low spatial resolution but high temporal resolution) to create an improved video sequence of high spatial and temporal resolution.