1. Field of the Invention
The invention relates to digital pan-tilt-zoom (PTZ) effects achieved with multiple cameras, using digital image processing to interpolate and transform separate images into seamless, selectable images that would otherwise be acquired by mechanical PTZ setups, and more specifically to such systems that employ planar transforms for high speed.
2. Background
The prior art offers a multitude of ways to combine images of a scene into a single wide-angle image. One system is Apple Corporation's Quick Time VR, which was adapted to create panoramas in a virtual reality world. The Apple system utilizes a camera to shoot a panorama based on multiple images taken as the camera is rotated around a point, the frames of the photos overlapping slightly. Software "stitches" the individual photos together to make a 360 degree view. The resulting panorama image is a cylindrical projection.
It is also known to transform images so that certain input pixels of an input image transform to a portion of an output image, as is shown in U.S. Pat. No. 5,067,019 to Juday, et al. The transformation may generate a zoom-in and zoom-out effect. U.S. Pat. No. 5,185,667 to Zimmermann describes a system providing perspective-corrected views of selected portions of a hemispherical view. This device inputs an image from a fish-eye lens and produces a circular image of an entire hemispherical field of view, which is mathematically corrected to remove distortion. U.S. Pat. No. 5,313,306 to Kuban et al. shows a system that is capable of perspective and distortion correction of images taken with a wide-angle camera. It provides pan, tilt, and zoom without mechanical movement. U.S. Pat. No. 5,359,363, also to Kuban et al., shows a system with perspective and distortion corrected views of a selected portion of a field of view. U.S. Pat. No. 5,657,073 describes a method of processing multiple streams of digital or analog video, each capturing a particular or unique field of view, and transforming these images into a single panoramic or panospheric output.
Although the prior art supplies motivation for some digital effects corresponding to elements of a PTZ camera system, there remains a need for efficient systems that provide full PTZ functionality based on digital processing. Because image distortion correction and connection are computationally intensive, there is a great need for methodologies that ease this burden so as to allow high frame rates and low cost, such as for video-conferencing systems.
An array of video cameras produces images that are processed to form the functional equivalent of a PTZ camera. An offline pre-calibration procedure is used to create a two-dimensional mosaic of the observed scene with geometric correction. Any arbitrary intermediate view is generated from the collection of images.
Briefly, an array of fixed digital cameras is mounted on an arrangement to provide piecewise coverage of a panorama or panosphere with overlapping visual fields. The overlap is used for calibration. A two-dimensional mosaic of the observed scene is geometrically and photometrically corrected using equations or lookup tables that are derived offline based on a calibration procedure. The corrections are then applied to the combination of images, and electronic panning, tilting, and zooming of a virtual camera (with pixel interpolation as required) are performed to acquire a selected field of view. The image corrections include lens distortion correction, linear transformation (warping) of images into a single mosaic, and intensity blending at the overlapping regions. The necessary transforms for creating the mosaic are computed offline and the PTZ operations are performed in real time. These steps are described in more detail below.
The input frames are continuously captured by the camera array. Stereo effects are avoided by ensuring that the objects imaged are not very close to the cameras. The separate frames are registered and warped to a common planar mosaic as a panoramic environment map. A portion of the mosaic is then selected using a PTZ control input to a processor and warped into a virtual camera view.
Lens distortion may be corrected by any suitable means. In the preferred embodiment of the invention, wide angle cameras are used which create more lens distortion than long focal length lens systems. Such cameras are desirable because a smaller number of cameras may be used for a given total field of view. It is necessary to correct the distortion introduced by each lens before attempting to register the images as will become clear.
Lens distortion of a point in an image can be decomposed into three components: shift of the image center, radial distortion (also called barrel distortion), and decentering distortion. Radial distortion is the most disturbing one for purposes of frame registration. The others can be ignored, assuming that the image center is close to the lens center and that lens components are orthogonal to the optical axis.
Lens distortion may be compensated for by various image processing techniques. It has been learned that first order geometric radial correction will provide good results and this is discussed below. However, it should be understood that many techniques may be employed within the compass of the invention and the following discussion is not intended to be limiting in this regard.
Radial distortion in most wide-angle cameras pulls image points toward the optical center. This effect is axially symmetric and depends only on the distance from the optical center through a distortion parameter γ. The distortion component may be expressed as:
$$\Delta r = \sum_{i=1}^{\infty} \gamma_{2i+1}\, r^{2i+1}$$
Terms higher than third order can be ignored as their contributions to the distortion are negligible in practice, so the above can be simplified to:
$$\Delta r = \gamma_3 r^3$$
$$x = x_d + \gamma (x_d - x_c)\, r^2$$
$$y = y_d + \gamma (y_d - y_c)\, r^2$$
where $(x_c, y_c)$ is the image center, $(x_d, y_d)$ is the observed (distorted) point, $r^2 = (x_d - x_c)^2 + (y_d - y_c)^2$, and $(x, y)$ is the undistorted point. The above equation models only the cubic term of radial lens distortion, the most significant in practice. For simplicity, it is also assumed that each video frame is distorted with the same lens distortion parameter γ and that both x and y are identically affected by lens distortion. Since this operation involves interpolation of pixel intensities (and/or hues) to undistort the image, its impact on processing time is significant.
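The first-order correction above can be sketched for a single point as follows. This is a minimal illustration, not the implementation of the invention: the function name `undistort_point`, the 640x480 frame size, and the value of gamma are assumptions chosen for the example, and a real gamma would come from calibration.

```python
def undistort_point(xd, yd, xc, yc, gamma):
    """Map an observed (distorted) point to its undistorted position.

    Applies x = xd + gamma * (xd - xc) * r^2 and the matching y equation,
    with r^2 = (xd - xc)^2 + (yd - yc)^2.
    """
    dx = xd - xc
    dy = yd - yc
    r2 = dx * dx + dy * dy
    return xd + gamma * dx * r2, yd + gamma * dy * r2


# Hypothetical values: image center of a 640x480 frame and an illustrative
# distortion parameter (not taken from the source).
center = (320.0, 240.0)
gamma = 1e-7

# The image center is a fixed point of the correction.
print(undistort_point(320.0, 240.0, *center, gamma))  # (320.0, 240.0)

# A point 100 pixels to the right of the center is pushed outward.
print(undistort_point(420.0, 240.0, *center, gamma))
```

Because gamma and the image center are fixed per camera, the corrected position of every pixel could be tabulated once offline, which leaves only the interpolation of pixel values for run time.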
To generate any intermediate arbitrary view, images acquired by the cameras must be registered and merged into a panoramic, spherical, or panospheric map of the composite viewing field. This map is a projection of the scene onto a shape, preferably a simple shape. For a region of interest that completely surrounds the camera system, this shape could be a cube or a sphere. Reprojecting portions of an environment map to create a novel view is dependent on the type of environment map. For a cubic one, the reprojection is linear, requiring merely the display of the visible regions of six texture-mapped squares in the view plane. For a spherical map, non-linear warping must be done. For panoramas or smaller fields of view, cylindrical, hemispherical, or planar environment maps can be used. For a large field with less than 180 degrees of panning, a planar map is preferable when processing time is a significant design issue. A planar map also has the advantage of permitting efficient warping by specialized software, such as the Intel Image Processing Library (IPL).
The planar map is an imaginary projection of the scene on a plane located an arbitrary distance from the cameras. Each lens-distortion-corrected image frame is warped (planar projection transform) onto this plane. The transform can be computed offline for each frame so that the only operation performed in real time is the actual warping. Pixel interpolation is responsible for the major computational burden.
The transform can be derived in two ways: with predefined geometric information on the camera orientations and image-forming properties or, preferably, using predefined registration points in the images themselves. The registration points should be at least four in number and visible in each pair of frames to be registered. The process is described in copending U.S. patent application Ser. No. 09/572,991 filed May 17, 2000 for APPARATUS AND METHOD FOR INDICATING A TARGET BY IMAGE PROCESSING WITHOUT THREE-DIMENSIONAL MODELING, the entirety of which is hereby incorporated by reference as if fully set forth herein.
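As an illustration of why four registration points suffice, a planar projective transform has eight free parameters once its last entry is fixed to 1, and each point correspondence supplies two linear equations. The sketch below solves that 8x8 system directly in plain Python. It is a standard textbook construction offered under that assumption, not the procedure of the copending application, and the names `solve_linear`, `homography_from_points`, and `apply_homography` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def homography_from_points(src, dst):
    """Return the 3x3 transform mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, x, y):
    """Map point (x, y) through h, including the projective division."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical registration points: corners of a unit square in one frame
# and the quadrangle where they are observed in the common plane.
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(10, 20), (110, 25), (105, 130), (5, 120)]
h = homography_from_points(src, dst)
print(apply_homography(h, 0.5, 0.5))  # where the square's center lands
```

With more than four points, a least-squares fit over all correspondences would typically replace the exact solve to reduce sensitivity to noise in any single point.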
To generate a final image, the panoramic or panospheric map must be warped to the frame acquired using the PTZ control signals. For this purpose, the appropriate portion of a frame, selected by the PTZ controller, is warped to a plane normal to the view axis of the virtual camera defined by the PTZ control signals. In other words, the approach is to recover the perspective transform that maps the view's rectangle into the corresponding quadrangle in the mosaic. This is precisely the same type of transformation used to generate the planar mosaic. Note that zoom is obtained by interpolation using standard techniques so that a low resolution image can be mapped to a high resolution signal as the virtual camera is zoomed in. Preferably, the zooming technique employs anti-aliasing to minimize artifacts from interpolation.
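The mapping from the view's rectangle into the mosaic can be sketched as an inverse warp: for every pixel of the virtual view, the transform gives a position in the mosaic, and the pixel value is interpolated there. The code below is a simplified sketch under stated assumptions: the mosaic is a grayscale list of rows at least 2x2 in size, bilinear interpolation stands in for the "standard techniques", no anti-aliasing is applied, and `pan_zoom_transform` covers only a virtual camera whose axis is normal to the mosaic plane. All function names are hypothetical.

```python
def bilinear_sample(image, x, y):
    """Sample a 2D grayscale image (list of rows, at least 2x2) at (x, y)."""
    rows, cols = len(image), len(image[0])
    # Clamp so that the 2x2 neighborhood stays inside the image.
    x = min(max(x, 0.0), cols - 1.0)
    y = min(max(y, 0.0), rows - 1.0)
    x0 = min(int(x), cols - 2)
    y0 = min(int(y), rows - 2)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x0 + 1] * fx
    bottom = image[y0 + 1][x0] * (1 - fx) + image[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def pan_zoom_transform(center_x, center_y, zoom, width, height):
    """3x3 matrix mapping view pixel (u, v) to mosaic coordinates.

    Simplest case only: the quadrangle in the mosaic is an axis-aligned
    rectangle centered at (center_x, center_y) and scaled by 1 / zoom.
    """
    s = 1.0 / zoom
    return [[s, 0.0, center_x - s * (width - 1) / 2.0],
            [0.0, s, center_y - s * (height - 1) / 2.0],
            [0.0, 0.0, 1.0]]


def render_view(mosaic, h, width, height):
    """Inverse warp: look up each view pixel's source position in the mosaic."""
    view = []
    for v in range(height):
        row = []
        for u in range(width):
            w = h[2][0] * u + h[2][1] * v + h[2][2]
            x = (h[0][0] * u + h[0][1] * v + h[0][2]) / w
            y = (h[1][0] * u + h[1][1] * v + h[1][2]) / w
            row.append(bilinear_sample(mosaic, x, y))
        view.append(row)
    return view


# Hypothetical 8x6 mosaic holding a linear intensity ramp.
mosaic = [[float(x + 10 * y) for x in range(8)] for y in range(6)]

# A 4x4 virtual view centered at (3.5, 2.5) with 2x zoom.
view = render_view(mosaic, pan_zoom_transform(3.5, 2.5, 2.0, 4, 4), 4, 4)
print(view[0][0])  # 20.25
```

Because `render_view` accepts any 3x3 matrix, a full perspective transform for a tilted virtual camera could be passed in place of the pan-and-zoom matrix without changing the sampling loop.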
To match up the frames, overlapping regions may be intensity and/or color blended pixel by pixel. This is made possible by the fact that warping the overlapping regions to a common map causes the pixels corresponding to a particular scene portion substantially to coincide. Thus, the hue and intensity of each pixel from an overlapping region can be blended or averaged, borrowing scene information from each image. The averaging may be pixel by pixel or, alternatively, an averaging kernel larger than a pixel may be used to accommodate imperfect registration between images. Various techniques are possible, but a graded blend may be used, in which the weighting of the hue and/or intensity contributing to the average is biased in proportion to the distance of the pixel to the corresponding image center. The result is a smoothing effect. Before the smoothing effect is applied, however, a global adjustment (for the entire frame) of hue and alpha data may be made to compensate for global differences in the cameras, lighting differences in the respective fields of view, etc.
The blending operation weights the influence of one frame's domains (again, a domain can be a pixel or a larger unit) by how far the transition is from the centerline of the overlap (i.e., by the distance of each pixel from its image's boundary). The closer a domain is to the image boundary, the lower its contribution to the properties of the overlapping region. If several frames overlap, the properties of the resulting domains are computed from the properties $P_k$ and distances $d_k$ of the overlapping frames, indexed by k, as:
$$P = \frac{\sum_{k=0}^{N} d_k \cdot P_k}{\sum_{k=0}^{N} d_k}$$
This method has the advantage of dealing with local intensity differences, which are the most conspicuous to a human observer. It is also general in that it works without any assumptions about the shape of the overlapping area.
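The distance-weighted average can be illustrated for a single domain as follows. This is a minimal sketch: the function name `blend` and the sample intensities and distances are assumptions made for the example, and a color image would apply the same computation to each channel.

```python
def blend(properties, distances):
    """Distance-weighted average P = sum(d_k * P_k) / sum(d_k).

    properties: one value (e.g. intensity) per overlapping frame.
    distances: distance of the domain from each frame's own boundary.
    """
    total = sum(distances)
    if total == 0:
        raise ValueError("at least one distance must be positive")
    return sum(d * p for d, p in zip(distances, properties)) / total


# Hypothetical overlap of two frames: the pixel lies 30 units inside the
# first frame's boundary and 10 units inside the second frame's boundary.
print(blend([100.0, 140.0], [30.0, 10.0]))  # 110.0
```

The frame in which the pixel lies deeper dominates the result, so the contribution of each frame fades smoothly to zero at its own boundary and no visible seam is introduced there.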
The invention will be described in connection with certain preferred embodiments, with reference to the following illustrative figures so that it may be more fully understood. With reference to the figures, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.