The spatial resolution of conventional video cameras is steadily increasing, e.g., from 1 to 20 megapixels or more. However, the temporal resolution of most conventional video cameras remains limited, e.g., to 30 to 60 frames per second (fps).
High speed video cameras are technically challenging and expensive because they require high bandwidth, high light efficiency, and high throughput. Usually, high speed cameras have limited memory and a dedicated bus that directly connects the memory to the sensor. The memory size therefore limits the frame rate that can be sustained for a given recording duration.
For example, in the Photron FastCam SA5, which is one of the most powerful high speed cameras, that amounts to a maximum of three seconds of video at 7500 fps with a 1 megapixel resolution. High speed cameras also need specialized sensors with high light sensitivity, as well as image intensifiers, so that each acquired frame is above the noise level and can be processed subsequently. The FastCam SA5 can reach a frame rate of about 100,000 fps at a spatial resolution of 320×192 pixels, at a cost of about $300,000.
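The figures quoted above can be checked with a short back-of-the-envelope calculation. The sketch below is illustrative only: it assumes that 1 megapixel means 1,000,000 pixels and that frames are stored raw and uncompressed, and it treats the bit depth as a free parameter because the text does not state it. The function names are hypothetical.

```python
# Rough throughput and buffer arithmetic for the quoted operating points.
# Assumptions (not from the text): 1 megapixel = 1,000,000 pixels,
# raw uncompressed storage, and a caller-supplied bit depth.

def pixel_rate(fps, pixels_per_frame):
    """Number of pixels the sensor must read out per second."""
    return fps * pixels_per_frame


def buffer_bytes(fps, pixels_per_frame, seconds, bits_per_pixel):
    """Raw storage, in bytes, needed to hold `seconds` of video."""
    total_bits = fps * pixels_per_frame * seconds * bits_per_pixel
    return total_bits // 8


full_res_rate = pixel_rate(7500, 1_000_000)         # 7,500,000,000 pixels/s
reduced_res_rate = pixel_rate(100_000, 320 * 192)   # 6,144,000,000 pixels/s
three_seconds_8bit = buffer_bytes(7500, 1_000_000, 3, 8)  # 22,500,000,000 bytes

print(full_res_rate, reduced_res_rate, three_seconds_8bit)
```

Under these assumptions, both operating points require a comparable pixel throughput of several billion pixels per second, which is consistent with the readout bandwidth and the memory size, rather than the sensor resolution alone, bounding the achievable frame rate.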
Single and Multi Frame Spatial Super-Resolution
Methods for generating spatial super-resolution from multiple frames are well known. Unfortunately, there are fundamental limits to those methods. The limits can be overcome when additional prior information is available, either in the form of examples of matching pairs of high-resolution and low-resolution frames, or by modeling frames as being compressible in an appropriate transform basis.
Temporal Super-Resolution
Spatio-temporal super-resolution can be obtained from videos acquired by multiple cameras with staggered exposures to sense dynamic events without motion blur and temporal aliasing. A dense camera array can also generate a very high speed video by converting a collection of 30 fps cameras into an equivalent virtual camera with thousands of frames per second. While those systems demonstrate that multiple cameras can be used for temporal super-resolution, they are expensive, require multiple cameras with accurate synchronization, and scale only linearly with the number of cameras.
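The staggered-exposure idea can be illustrated with a minimal timing sketch. It assumes ideal, perfectly synchronized cameras whose start times are offset by equal fractions of one frame period; the function names are hypothetical and the code models only timestamps, not image content.

```python
from fractions import Fraction
import heapq


def staggered_timestamps(num_cameras, fps, frames_per_camera):
    """Capture times for each camera; camera c starts c/(num_cameras*fps) s late."""
    period = Fraction(1, fps)
    offset = period / num_cameras
    return [
        [c * offset + k * period for k in range(frames_per_camera)]
        for c in range(num_cameras)
    ]


def interleave(streams):
    """Merge the per-camera streams into one list of (time, camera) pairs."""
    tagged = [[(t, cam) for t in stream] for cam, stream in enumerate(streams)]
    return list(heapq.merge(*tagged))


def effective_fps(num_cameras, fps):
    """Frame rate of the virtual camera; it grows only linearly with camera count."""
    return num_cameras * fps


merged = interleave(staggered_timestamps(4, 30, 2))
print(effective_fps(4, 30), [cam for _, cam in merged])
```

In this idealized model, four 30 fps cameras yield a virtual 120 fps stream, and reaching thousands of frames per second would require on the order of a hundred such cameras, which illustrates the linear scaling noted above.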
Video Interpolation
Because the frame rates of cameras and display devices can differ, there has always been a need to resample and interpolate the acquired frames for display purposes. Several techniques for such software ‘frame-rate conversion’ are known.
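As a point of reference, the simplest form of frame-rate conversion is linear blending of the two nearest source frames. The sketch below assumes that each frame is a flat list of pixel intensities and that both frame rates are integers; it is not motion compensated, and the function name is hypothetical.

```python
def convert_frame_rate(frames, src_fps, dst_fps):
    """Resample `frames` from src_fps to dst_fps by linear temporal blending."""
    if len(frames) < 2:
        raise ValueError("need at least two frames to interpolate")
    # Number of output frames that fit within the span of the source video.
    n_out = (len(frames) - 1) * dst_fps // src_fps + 1
    out = []
    for i in range(n_out):
        pos = i * src_fps / dst_fps           # output time in source-frame units
        k = min(int(pos), len(frames) - 2)    # index of the earlier source frame
        w = pos - k                           # blend weight of the later frame
        out.append([(1 - w) * a + w * b
                    for a, b in zip(frames[k], frames[k + 1])])
    return out


source = [[0.0, 100.0], [10.0, 80.0], [20.0, 60.0]]
print(convert_frame_rate(source, 30, 60))
```

Blending of this kind only averages intensities, so fast-moving objects appear as ghosted double images rather than at their true intermediate positions.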
Motion Deblurring
When a video of high speed motion is acquired by a low frame-rate camera, one can either obtain sharp but noisy and aliased images using short exposure durations, or acquire blurred images using longer exposure durations. Significant progress has been made on the problem of deblurring by incorporating spatial regularization terms within the deconvolution framework.
Compressive Sensing of Videos
Compressive sensing can be used to compressively acquire videos by assuming that multiple random linear measurements are available at each time instant, either using a snapshot imager, or by stacking consecutive measurements from a single pixel camera (SPC). Given such a sequence of compressive measurements, prior models about the transform domain sparsity of the video are used to reconstruct the videos.
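A toy example may help make the measurement model concrete. The sketch below assumes a random Gaussian measurement matrix, noiseless measurements, and a frame that is exactly 1-sparse in the pixel basis itself (that is, the sparsifying transform is taken to be the identity). Recovery uses a single greedy matching step, which corresponds to the first iteration of orthogonal matching pursuit; this is a stand-in for illustration and not the reconstruction used by any particular system. All names and sizes are hypothetical.

```python
import random


def measure(phi, x):
    """Compressive measurements y = phi @ x, with fewer rows than unknowns."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def recover_one_sparse(phi, y):
    """Pick the column best aligned with y and fit its amplitude by least squares."""
    n = len(phi[0])
    best_j, best_score, best_amp = 0, -1.0, 0.0
    for j in range(n):
        col = [row[j] for row in phi]
        dot = sum(c * v for c, v in zip(col, y))
        norm_sq = sum(c * c for c in col)
        score = abs(dot) / norm_sq ** 0.5     # normalized correlation with y
        if score > best_score:
            best_j, best_score, best_amp = j, score, dot / norm_sq
    x_hat = [0.0] * n
    x_hat[best_j] = best_amp
    return x_hat


rng = random.Random(0)                        # fixed seed for repeatability
n, m = 64, 8                                  # 64 unknowns, only 8 measurements
phi = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
x = [0.0] * n
x[17] = 3.0                                   # the single nonzero entry
y = measure(phi, x)
x_hat = recover_one_sparse(phi, y)
```

For an exactly 1-sparse signal and noiseless measurements, this step identifies the correct entry whenever no two columns of the measurement matrix are parallel. Real video frames are only approximately sparse, and only in a suitable transform basis, so practical reconstructions rely on iterative sparse solvers rather than a single greedy step.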
For videos with slowly changing dynamics, fewer measurements are needed for subsequent frames after the first frame is acquired. Video acquisition by compressively sampling each frame is possible if the motion and appearance can be iteratively estimated by using a motion compensated wavelet basis to sparsely represent the spatio-temporal pixel volume, i.e., voxels. Such methods, while very attractive in principle, have achieved only moderate success, mainly because the temporal structure of videos is not explicitly modeled, and because the hardware architectures for these methods are cumbersome, expensive, or both.