Increasingly powerful computation, large storage, and expanding transmission bandwidths have enabled a wide variety of applications in the video and imaging market, which provide modern users with all kinds of enriched visual experiences. The advent of inexpensive digital cameras has enabled many sensing systems that incorporate large numbers of cameras, such as the Stanford multi-camera array, part of the Stanford Immersive Television Project. By using clusters of inexpensive cameras to capture dynamic real-world scenes, users can be provided with a large variety of immersive experiences, such as digital refocusing and synthetic large aperture, that otherwise are either impossible or obtainable only through expensive, high-end professional devices.
Images taken from multiple cameras in a camera array can be sequenced together to provide these immersive experiences without requiring specialized equipment. Smooth view switching and zooming are in high demand in practice for rich viewing experiences in television and movie production. Switching frames across multiple cameras over time creates the effect of a single camera moving around. The dolly effect, the freeze time (or bullet time) effect, and the long panorama effect, for instance, are among the most popular, and can be widely seen in movies, broadcasts, and documentary videos.
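As a minimal illustrative sketch of how frames might be sequenced across cameras, the following Python functions build the order of (camera, frame) pairs for a freeze time sweep and for a sweep in which time advances as the view switches. The function names, the `frames_per_camera` parameter, and the assumptions that the cameras are time-synchronized and indexed in their spatial order are hypothetical and are not taken from the text above.

```python
def freeze_time_path(num_cameras, frame_index):
    """Freeze time (bullet time) sweep: every camera contributes the
    same time instant, so the scene appears frozen while the view moves.
    Assumes cameras are synchronized and indexed in spatial order."""
    return [(cam, frame_index) for cam in range(num_cameras)]


def moving_camera_path(num_cameras, start_frame, frames_per_camera=1):
    """View switching over time: the view steps from camera to camera
    while the frame index keeps advancing, imitating one moving camera."""
    path = []
    frame = start_frame
    for cam in range(num_cameras):
        for _ in range(frames_per_camera):
            path.append((cam, frame))
            frame += 1
    return path


if __name__ == "__main__":
    # Each pair is (camera index, frame index) in playback order.
    print(freeze_time_path(4, 120))
    print(moving_camera_path(4, 120, frames_per_camera=2))
```

Each returned pair names which camera's frame would be shown at that point in the output sequence; how smooth the resulting transition looks would still depend on camera spacing and alignment, which this sketch does not model.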
Another way of creating smooth view switching effects is through light field rendering (or image-based rendering), which can synthesize different views of a scene. However, light field rendering can only obtain reasonable results when a dense sampling is available, i.e., when a large number of cameras are placed densely together so that their fields-of-view overlap substantially, since oversampling is required to counter undesirable aliasing effects in the outputs. Also, light field rendering requires much more computation to create synthetic views than direct view switching, which makes it difficult to use for real-time or online video production. The costs, in terms of both the large number of cameras and the large amount of computational power required, are generally not realistic either for live broadcasting or for small consumer groups.