The use of separate video cameras for high-speed video capture and 3-dimensional scene or object reconstruction is known in the art. General information on multi-camera systems and associated image processing can be found in U.S. Pat. Nos. 5,475,422; 5,517,236; 5,675,377; 5,710,875; 6,084,979; 6,198,852.
U.S. Pat. No. 5,157,499 to Oguma et al. discloses a high-speed video camera using solid-state sensors. This system does not scale to a large number of cameras, which limits the highest achievable system frame rate. In addition, due to the small number of cameras, it suffers from rolling shutter distortion when capturing fast-moving objects in the scene. This distortion cannot be corrected without leaving visible seams in the image.
E. Shechtman, et al., “Increasing space-time resolution in video”, European Conference on Computer Vision (ECCV), May 2002, teach the use of multiple cameras and alignment to a plane, but they do not have control over triggering. As a result, they cannot guarantee a constant frame rate for the final sequence. More importantly, their formulation relies on overlapping exposure times, resulting in blurry images. They attempt to remove the blur (and increase the temporal resolution) by a regularized deconvolution. This operation is known to be ill-conditioned; it is computationally intensive and only marginally successful.
Some known methods for controlling trigger timing are discussed in U.S. Pat. No. 6,331,871, which teaches simultaneous triggering, and in U.S. Pat. Nos. 6,154,251 and 5,659,233, which also teach simultaneous capture. Other approaches to triggering synchronized camera arrays with arbitrary offsets all use cameras with dedicated synchronization inputs. Some commercial technologies exist for creating phase-shifted copies of triggering or timing signals for different cameras. These do not, however, scale well to large numbers of cameras. Moreover, they require a different synchronization signal for each camera.
Still other prior art methods use still cameras with a configurable delay between triggers. These methods, as practiced, e.g., by Manex Entertainment, apparently use a trigger signal that propagates from one camera to the next after some delay. Because of the delay between cameras, the trigger must run serially through the entire array of cameras, and as it does so, each delay adds some timing uncertainty in addition to the propagation time for the signal. More importantly, this system works only for still cameras. Video cameras usually run at constant frame rates, and the Manex system cannot guarantee a constant frame rate, especially with large numbers of cameras and arbitrary delays. The time from one frame acquisition to the next is at least the time for the delay signal to traverse the entire camera array. More generally, these methods and systems are not designed to allow the user to derive arbitrary space-time trajectories after filming just once. They are designed only to capture one space-time arc, or to simulate one high-speed moving camera.
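The timing limits of such a serially propagated trigger can be illustrated with a small calculation. The following is only a sketch under stated assumptions, not a description of the Manex system: the function names, the uniform per-hop delay, and the model of independent zero-mean jitter at each hop are all hypothetical and chosen purely for illustration.

```python
import math


def serial_trigger_times_us(num_cameras, hop_delay_us):
    """Nominal trigger time (microseconds) of each camera in a serial chain.

    Assumption: every hop adds the same configurable delay.
    """
    return [k * hop_delay_us for k in range(num_cameras)]


def min_frame_period_us(num_cameras, hop_delay_us):
    """Lower bound on the time between two acquisitions by the same camera.

    The trigger has to reach the last camera in the chain before the
    next pass through the array can begin.
    """
    return (num_cameras - 1) * hop_delay_us


def accumulated_jitter_std_us(hop_index, per_hop_jitter_std_us):
    """Timing uncertainty at the camera reached after `hop_index` hops.

    Assumption: per-hop jitter is independent and zero-mean, so the
    variances add and the standard deviation grows with sqrt(hops).
    """
    return per_hop_jitter_std_us * math.sqrt(hop_index)


if __name__ == "__main__":
    cameras, delay_us = 100, 1000  # hypothetical: 100 cameras, 1 ms per hop
    period_us = min_frame_period_us(cameras, delay_us)
    print("minimum frame period (us):", period_us)
    print("maximum per-camera frame rate (fps):", 1_000_000 / period_us)
    print("jitter std at last camera (us):",
          accumulated_jitter_std_us(cameras - 1, 2.0))
```

Under these assumed numbers, the per-camera frame rate is bounded by the chain traversal time rather than by the cameras themselves, and the timing uncertainty is largest at the end of the chain.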
Much work has been done on view interpolation for scenes at a single instant in time. For some general prior art teachings on view interpolation the reader is referred to U.S. Pat. Nos. 6,097,394 and 6,831,643.
Multiple cameras are used to capture frames from a finite number of viewing positions. For a static (non-moving) scene, one could also take several images from different viewing positions using the same camera. View interpolation methods create new visual outputs or pictures of the scene that appear to have been taken from virtual viewpoints other than the actual viewpoints of the cameras. These methods are often extended trivially to video by ensuring that all of the cameras are synchronized to trigger frame capture simultaneously such that all frames or images at a given instant in time represent the same “frozen” scene.
Spatiotemporal view interpolation methods create new images of a scene from virtual viewing positions or viewpoints and at times other than the actual times and positions corresponding to the captured frames or images. For example, using a synchronized 30 frames per second (fps) camera array (triggered simultaneously), view interpolation methods could be used to create images from viewpoints other than the viewpoint of any of the cameras, but at the same time instants as the captured images. Spatiotemporal view interpolation methods can create images at new times, too. Thus, in principle, videos can be created that appear to come from video cameras with higher frame rates than those actually used to capture the frames.
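The temporal bookkeeping behind this idea can be sketched as follows. This is a minimal illustration only, assuming a simultaneously triggered array whose frames are captured at times n / fps; the function name and the linear weighting are hypothetical. Actual spatiotemporal view interpolation must also account for scene geometry and motion, which this sketch does not attempt.

```python
import math
from fractions import Fraction


def bracketing_frames(t_virtual, fps):
    """Locate a virtual time between captured frames.

    Assumption: frame n is captured at time n / fps seconds.
    Returns (index_before, index_after, weight_after), where
    weight_after in [0, 1) is the fractional position of t_virtual
    between the two captured frames.
    """
    position = Fraction(t_virtual) * fps     # time measured in frame periods
    index_before = math.floor(position)      # last captured frame at or before t
    weight_after = position - index_before   # 0 means exactly on a captured frame
    return index_before, index_before + 1, weight_after


if __name__ == "__main__":
    # A virtual 120 fps output derived from a 30 fps array: each virtual
    # frame falls at a quarter-step between two captured frames.
    for k in range(5):
        print(k, bracketing_frames(Fraction(k, 120), 30))
```

Exact fractions are used here only to keep the frame indices free of floating-point rounding; any consistent time base would serve the same purpose.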
In U.S. Pat. No. 6,738,533 Shum, et al. describe a minimum sampling rate and a minimum sampling curve for image-based rendering. However, Shum provides no teaching on temporal sampling for image-based rendering.
Additional work on spatio-temporal view interpolation is described by S. Vedula, et al., “Spatio-temporal view interpolation”, Eurographics Workshop on Rendering, 2002, pp. 65-75 and R. Carceroni and K. Kutulakos, “Multi-view scene capture by surfel sampling: From video streams to nonrigid 3d motion, shape & reflectance”, International Conference on Computer Vision 2001. Both of these use cameras synchronized to trigger simultaneously, and both deduce structural and reflectance models for the scene. Even for static scenes, this is a challenging vision problem that requires sophisticated computational methods that incorporate data from all or many images. For spatiotemporal view interpolation, these methods must also infer the motion of their scene models across time.
In view of the shortcomings of the prior art, it would be an advance in the art to provide an apparatus and method for deliberately and precisely controlling camera arrays so that scenes are captured in a format that is well-suited for processing. In particular, it would be desirable to provide frame capture staging for camera arrays so as to permit spatio-temporal interpolation and processing of new visual outputs with minimal computational load.