High-speed videography using camera arrays has been proposed by Shechtman et al., “Increasing Space-Time Resolution in Video,” European Conference on Computer Vision (ECCV), May 2002, and Wilburn et al., “High-Speed Videography using a Dense Camera Array,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2004.
The Wilburn et al. paper discloses that creating a single high-speed video sequence involves aligning the cameras in the array to a reference viewpoint, and notes that this is a difficult task. Accordingly, the Wilburn et al. paper proposes the simplifying assumption that the imaged scene lies within a shallow depth of a single object plane. The Wilburn et al. paper notes that this assumption only holds for scenes that are either relatively flat or sufficiently far from the array relative to the camera spacing. Where the assumption does not hold, the Wilburn et al. paper notes that objects off the focal plane remain sharp but appear to move from frame to frame of the aligned images due to alignment errors.
The alignment errors are a function of the incorrect estimation of depth for objects within the scene that do not lie on the single object plane. Binocular viewing of a scene creates two slightly different images of the scene due to the different fields of view of the two eyes. These differences are referred to as binocular disparity (or parallax). Shifts due to parallax can be corrected given knowledge of the depth of the object and the baseline between the cameras that image the scene. When all objects are assumed to be on the same plane, alignment errors result for objects that do not lie on the plane. The Wilburn et al. paper proposes minimizing the alignment errors by capturing image data sequentially using spatially adjacent cameras. In this way, the maximum alignment error is constrained.
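The dependence of the alignment error on depth and baseline can be illustrated with a minimal sketch. It assumes an idealized pair of parallel pinhole cameras, for which the parallax shift in pixels is the focal length (in pixels) multiplied by the baseline and divided by the depth. The function names and all numeric values below are hypothetical and chosen only for illustration; they are not taken from the Wilburn et al. paper.

```python
def disparity_px(focal_px, baseline_m, depth_m):
    """Parallax shift, in pixels, of a point at depth_m between two
    parallel pinhole cameras separated by baseline_m (assumed model)."""
    return focal_px * baseline_m / depth_m


def alignment_error_px(focal_px, baseline_m, depth_m, plane_depth_m):
    """Residual shift left after aligning a point at depth_m as if it
    lay on the assumed object plane at plane_depth_m."""
    true_shift = disparity_px(focal_px, baseline_m, depth_m)
    assumed_shift = disparity_px(focal_px, baseline_m, plane_depth_m)
    return true_shift - assumed_shift


if __name__ == "__main__":
    focal_px = 1000.0      # hypothetical focal length in pixels
    plane_depth_m = 10.0   # hypothetical depth of the assumed object plane
    object_depth_m = 5.0   # hypothetical object in front of that plane
    # Compare a short baseline (adjacent cameras) with a long one.
    for baseline_m in (0.05, 0.5):
        err = alignment_error_px(focal_px, baseline_m,
                                 object_depth_m, plane_depth_m)
        print(f"baseline {baseline_m} m: alignment error {err:.1f} px")
```

Under this model the error is zero for objects on the assumed plane and grows linearly with the baseline, which is consistent with the observation that sequencing spatially adjacent cameras bounds the maximum error between consecutive frames.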
The camera array described in the Wilburn et al. paper utilizes inexpensive CMOS sensors that have electronic rolling shutters. A snap-shot shutter starts and stops light integration for every pixel in a sensor at the same time. Sample and hold circuitry is then utilized to enable sequential readout. An electronic rolling shutter exposes each row just before it is read out, which eliminates the need for sample and hold circuitry. The Wilburn et al. paper identifies that a disadvantage of using sensors with rolling shutters for high-speed video capture is that the rolling shutter can distort the shape of fast-moving objects. Effectively, pixels near the bottom of a frame start and stop integration of light almost a frame later than pixels from the top of the frame.
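The row-dependent timing of a rolling shutter can be sketched as follows. This is a simplified model that assumes rows begin integration at evenly spaced times spanning one frame period, and that an object moves horizontally at constant speed. The function names, sensor size, frame rate, and object speed are hypothetical illustration values, not parameters of the sensors used by Wilburn et al.

```python
def row_start_times(num_rows, frame_period_s, rolling=True):
    """Time at which each row begins integrating light.

    With a snap-shot shutter every row starts at time 0. With a rolling
    shutter, row r is assumed to start r line-times after row 0, where
    one line-time is frame_period_s / num_rows.
    """
    if not rolling:
        return [0.0] * num_rows
    line_time_s = frame_period_s / num_rows
    return [row * line_time_s for row in range(num_rows)]


def horizontal_skew_px(speed_px_per_s, num_rows, frame_period_s):
    """Horizontal offset between the top and bottom rows of an object
    that spans the full frame height and moves at speed_px_per_s."""
    times = row_start_times(num_rows, frame_period_s, rolling=True)
    return speed_px_per_s * (times[-1] - times[0])


if __name__ == "__main__":
    num_rows = 480               # hypothetical sensor height in rows
    frame_period_s = 1.0 / 30.0  # hypothetical 30 frames per second
    times = row_start_times(num_rows, frame_period_s)
    delay_frames = times[-1] / frame_period_s
    print(f"last row starts {delay_frames:.3f} frame periods after the first")
    skew = horizontal_skew_px(3000.0, num_rows, frame_period_s)
    print(f"skew for an object moving at 3000 px/s: {skew:.1f} px")
```

In this model the last row starts (num_rows - 1) / num_rows of a frame period after the first, which corresponds to the "almost a frame later" behavior described above, and the resulting skew scales with object speed.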