1. Field of Invention
The present invention is related to a method of generating a light transport matrix T in a projector-camera system where the resolution of the camera is not greater than that of the projector.
2. Description of the Related Art
When projectors and cameras are combined, hybrid devices and systems that are capable of both projecting and capturing light are born. This emerging class of imaging devices and systems is known in the research community as projector-camera systems. Typically, images captured by one or more cameras are used to estimate attributes of display environments, such as the geometric shapes of projection surfaces. The projectors in these projector-camera systems then adapt their projected images so as to compensate for shape irregularities in the projection surfaces to improve the resultant imagery. In other words, by using a camera, a projector can “see” distortions in a projected image, and then adjust its projected image so as to reduce the observed distortions.
In order to achieve this, the camera and projector need to be calibrated to each other's imaging parameters so as to assure that any observed image distortion is due to irregularities in the projection environment (i.e. surface irregularities), and not due to distortions inherent to the projector or camera, or due to their relative orientation to each other.
Thus, a key problem that builders of projector-camera systems and devices need to solve is the determination of the internal imaging parameters of each device (i.e. intrinsic parameters) and the determination of the geometric relationship between all projectors and cameras in the system (i.e. extrinsic parameters). This problem is commonly referred to as that of calibrating the system.
Even after a system has been substantially calibrated, however, the issue of adjusting a projection to compensate for distortions in a projected image is not straightforward. Identifying and compensating for projection distortion can be a very computationally intensive operation, which has traditionally limited its use outside specialized fields.
In an effort to better understand the calibration of projector-camera systems, Applicants studied multi-camera imaging systems found in the field of computer vision. Although such multi-camera imaging systems consist of only image photographing devices, and do not include any image projecting devices, a large body of work concerning the calibration of such multi-camera imaging systems exists in the field of computer vision, and it was thought that one might glean some insight from their approach toward calibrating multiple devices, albeit multiple image photographing devices.
A commonly used method in computer vision for calibrating a camera in an imaging system is described in the article “A flexible new technique for camera calibration”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1330-1334, 2000, by Zhengyou Zhang, which is herein incorporated in its entirety by reference. In this method, multiple images of a flat object marked with a number of known feature points (typically forming a grid) are captured by the camera, with the flat object posed at a variety of known angles relative to the camera. The image location of each feature point is extracted, and since the relative location of each feature point is known, the collection of feature point locations can then be used to calibrate the camera. When two or more cameras are present in an imaging system, the intrinsic parameters as well as the geometric relationship between the cameras can be estimated by having all cameras capture an image of the flat object at each pose angle.
Since projectors and cameras are very similar in terms of imaging geometry, it might seem reasonable to postulate that techniques suitable for calibrating cameras in multi-camera imaging systems might be suitable for calibrating projectors in projector-camera systems. However, since all camera calibration techniques require that the camera requiring calibration (i.e. the imaging device being calibrated) capture a number of images, it would appear that camera calibration techniques cannot readily be applied to projectors since projectors cannot capture images.
Therefore, in traditional projector-camera systems, at least two cameras have been needed, in addition to a projector. The two cameras are calibrated first, typically using multi-camera imaging system calibration techniques, to establish a stereo camera pair. More specifically, these systems use a “bootstrapping” procedure to calibrate the two cameras and form the stereo camera pair. As is known in the art, a stereo camera pair can be used to estimate depth (i.e. achieve a pseudo perspective view) to establish a quasi-depth perception of feature points visible to the stereo camera pair. The calibrated stereo camera pair is then used to calibrate the projector. Basically, this quasi-depth perception is used to identify surface depth irregularities of a projection surface, and thereby of an image projected onto the projection surface. In essence, to calibrate the projector using this quasi-depth perception, the projector is first made to project feature points onto a display environment (i.e. the projection surface), which may have an irregular surface. The pre-calibrated stereo camera pair is used to resolve the perspective depth location of the projected points. The projector can then be calibrated to compensate for surface/depth irregularities in the projection surface, as determined by the depth location of the projected points. While this bootstrapping technique is a tested-and-proven calibration method for projector-camera systems, it is not applicable to the calibration of self-contained projector-camera devices, since it requires the use of pre-calibrated, strategically located, external stereo camera pairs, and thus requires much operator intervention.
Of related interest is a technique called dual photography proposed by Sen et al. in the article “Dual Photography”, Proceedings ACM SIGGRAPH, 2005, which is herein incorporated by reference in its entirety. Dual photography makes use of Helmholtz reciprocity to use images captured with real cameras to synthesize pseudo images (i.e. dual images) that simulate images “as seen” (or effectively “captured”) by projectors. That is, the pseudo image simulates a captured image as “viewed” by a projector, and thus represents what a projector-captured image would be if a projector could capture images. This approach might permit a projector to be treated as a pseudo camera, and thus might eliminate some of the difficulties associated with the calibration of projectors.
Helmholtz reciprocity is based on the idea that the flow of light can be effectively reversed without altering its light transport properties. Helmholtz reciprocity has been used in many computer graphics applications to reduce computational complexity. In computer graphics literature, this reciprocity is typically summarized by an equation describing the symmetry of the radiance transfer between incoming (ωi) and outgoing (ωo) directions as fr(ωi→ωo)=fr(ωo→ωi), where fr represents the bidirectional reflectance distribution function (BRDF) of a surface.
Thus, dual photography ideally takes advantage of this dual nature (i.e. duality relationship) of a projected image and a captured image to simulate one from the other. As is described in more detail below, dual photography (and more precisely Helmholtz reciprocity) requires the capturing of the light transport property between a camera and a projector. More specifically, dual photography requires determination of the light transport property (i.e. light transport coefficient) relating an emitted light ray to a captured light ray.
When dealing with a digital camera and a digital projector, however, dual photography requires capturing a separate light transport coefficient relating each projector pixel (i.e. every emitted light ray) to each and every camera pixel (i.e. every light sensor that captures part of the emitted light ray), at the resolution of both devices. Since a digital projector and a digital camera can each have millions of pixels, the acquisition, storage, and manipulation of multitudes of light transport coefficients can place real practical limitations on its use. Thus, although in theory dual photography would appear to offer great benefits, in practice it is severely limited by its impractical requirements: extremely large amounts of computer memory (both archival, disk-type memory and active, solid-state memory), extensive computational processing power, and much time and user intervention to set up equipment and to emit and capture multitudes of light rays for every projection environment in which the projector-camera system is to be used.
A clearer understanding of dual photography may be obtained with reference to FIGS. 1A and 1B. In FIG. 1A, a “primal configuration” (i.e. a configuration of real, physical devices prior to any duality transformations) includes a real digital projector 11, a real projected image 13, and a real digital camera 15. Light is emitted from real projector 11 and captured by real camera 15. A coefficient relating each projected light ray (from each projector pixel e within real projector 11) to a correspondingly captured light ray (captured at each camera sensor pixel g within real camera 15) is called a light transport coefficient. Using the light transport coefficient, it is possible to determine the characteristics of the projected light ray from the captured light ray.
In the present example, real projector 11 is preferably a digital projector having a projector pixel array 17 symbolically shown as a dotted box and comprised of s rows and r columns of individual projector pixels e. Each projector pixel e may be the source of a separately emitted light ray. The size of projector pixel array 17 depends on the resolution of real projector 11. For example, a VGA resolution may consist of an array of 640 by 480 pixels (i.e. 307,200 projector pixels e), an SVGA resolution may have an array of 800 by 600 pixels (i.e. 480,000 projector pixels e), an XGA resolution may have an array of 1024 by 768 pixels (i.e. 786,432 projector pixels e), an SXGA resolution may have an array of 1280 by 1024 pixels (i.e. 1,310,720 projector pixels e), and so on, with greater resolution projectors requiring a greater number of individual projector pixels e.
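Each pixel count is simply the product of the array width and height. A minimal Python sketch of that arithmetic follows; the names RESOLUTIONS and pixel_count are illustrative choices for this sketch and do not come from the original disclosure.

```python
# Illustrative only: RESOLUTIONS and pixel_count are hypothetical names.
# Each entry maps a common display resolution label to (width, height).
RESOLUTIONS = {
    "VGA": (640, 480),
    "SVGA": (800, 600),
    "XGA": (1024, 768),
    "SXGA": (1280, 1024),
}


def pixel_count(width, height):
    """Return the number of individual pixels in a width-by-height array."""
    return width * height


for name, (width, height) in RESOLUTIONS.items():
    print(f"{name}: {width} x {height} = {pixel_count(width, height):,} pixels")
```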
Similarly, real camera 15 is a digital camera having a camera sensor pixel array 19 symbolically shown as a dotted box and comprised of u rows and u columns of individual camera pixels g. Each camera pixel g may receive, i.e. capture, part of an emitted light ray. The size of camera sensor pixel array 19 again depends on the resolution of real camera 15. However, it is common for real camera 15 to have a resolution of 4 MegaPixels (i.e. 4,194,304 camera pixels g), or greater.
Since each camera pixel g within camera sensor pixel array 19 may capture part of an individually emitted light ray from a distinct projector pixel e, and since each discrete projector pixel e may emit a separate light ray, a multitude of light transport coefficients are needed to relate each discrete projector pixel e to each and every camera pixel g. In other words, a light ray emitted from a single projector pixel e may cover the entirety of camera sensor pixel array 19, and each camera pixel g will therefore capture a different amount of the emitted light ray. Consequently, each discrete camera pixel g will have a different light transport coefficient indicating how much of the individually emitted light ray it received. If camera sensor pixel array 19 has 4,194,304 individual camera pixels g (i.e. has a 4 MegaPixel resolution), then each individual projector pixel e will require a separate set of 4,194,304 individual light transport coefficients to relate it to camera sensor pixel array 19. Therefore, millions of separately determined sets of light transport coefficients (one set per projector pixel e) will be needed to relate the entirety of projector pixel array 17 to camera sensor pixel array 19 and establish a duality relationship between real projector 11 and real camera 15.
Since in the present example, each discrete projector pixel e requires a separate set of 4,194,304 individually determined light transport coefficients to relate it to real camera 15, and since real projector 11 may have millions of discrete projector pixels e, it is beneficial to view each set of light transport coefficients as a separate array of light transport coefficients and to collect these separate arrays into a single light transport matrix (T). Each array of light transport coefficients constitutes a separate column within light transport matrix T. Thus, each column in T constitutes a set of light transport coefficients corresponding to a separate projector pixel e.
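To give a sense of scale, the size of T can be estimated as the number of camera pixels (rows) multiplied by the number of projector pixels (columns). The sketch below assumes, purely for illustration, a 1024 by 768 projector, the 4,194,304-pixel camera of the present example, and 4 bytes of storage per coefficient. The projector size and the storage figure are assumptions of this sketch, and the function name is hypothetical.

```python
def transport_matrix_size(camera_pixels, projector_pixels, bytes_per_coeff=4):
    """Return (number of coefficients, storage in bytes) for a dense T.

    T has one row per camera pixel g and one column per projector pixel e.
    bytes_per_coeff=4 is an assumed storage size, not a value from the text.
    """
    entries = camera_pixels * projector_pixels
    return entries, entries * bytes_per_coeff


camera_pixels = 4_194_304        # 4 MegaPixel camera from the example
projector_pixels = 1024 * 768    # assumed projector resolution

entries, n_bytes = transport_matrix_size(camera_pixels, projector_pixels)
print(f"{entries:,} coefficients, about {n_bytes / 2**40:.0f} TiB")
```

Under these assumptions a dense T would hold roughly 3.3 trillion coefficients and occupy about 12 TiB, which illustrates why direct storage and manipulation of T is impractical.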
Since in the present example, real projector 11 is a digital projector having an array of individual light projector pixels e and real camera 15 is a digital camera having an array of individual camera pixels g, a light transport matrix T will be used to define the duality relationship between real projector 11 and real camera 15. In the following discussion, matrix element Tge identifies an individual light transport coefficient (within light transport matrix T) relating an individual, real projector pixel e to an individual, real camera pixel g.
A real image, as captured by real camera 15, is comprised of all the light rays individually captured by each camera pixel g within camera sensor pixel array 19. It is therefore helpful to organize a real captured image, as determined by camera sensor pixel array 19, into a real-image capture matrix, C. Similarly, it is beneficial to organize a real projected image, as constructed by activation of the individual projector pixels e within projector pixel array 17, into a real-image projection matrix, P. Using this notation, a real captured image (as defined by real-image capture matrix C) may be related to a real projected image (as defined by real-image projection matrix P) by the light transport matrix T according to the relationship, C=TP.
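The relationship C=TP can be illustrated with a toy example. The sketch below assumes that the projected and captured images are flattened into one-dimensional lists, and it uses a made-up transport matrix with three camera pixels and two projector pixels; the values and the helper name mat_vec are hypothetical. It also shows that lighting a single projector pixel yields the corresponding column of T as the captured image.

```python
def mat_vec(T, p):
    """Return c = T p for a matrix T (a list of rows) and a vector p."""
    return [sum(t * x for t, x in zip(row, p)) for row in T]


# Hypothetical transport matrix: 3 camera pixels (rows g) by
# 2 projector pixels (columns e). Entry T[g][e] plays the role of Tge.
T = [
    [0.5, 0.0],
    [0.25, 0.25],
    [0.0, 0.75],
]

# Projected image with only projector pixel 0 turned on.
p = [1.0, 0.0]

# Captured image: equals column 0 of T.
c = mat_vec(T, p)
print(c)

# Projected image with both projector pixels on: the columns add up.
c_all = mat_vec(T, [1.0, 1.0])
print(c_all)
```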
The duality transformation, i.e. dual configuration, of the system of FIG. 1A is shown in FIG. 1B. In this dual configuration, real projector 11 of FIG. 1A is transformed into a virtual camera 11″ and real camera 15 of FIG. 1A is transformed into a virtual projector 15″. It is to be understood that virtual camera 11″ and virtual projector 15″ represent the dual counterparts of real projector 11 and real camera 15, respectively, and are not real devices themselves. That is, virtual camera 11″ is a mathematical representation of how a hypothetical camera (i.e. virtual camera 11″) would behave to capture a hypothetically projected dual image 13″, which is similar to real image 13 projected by real projector 11 of FIG. 1A. Similarly, virtual projector 15″ is a mathematical representation of how a hypothetical projector (i.e. virtual projector 15″) would behave to project hypothetical dual image 13″ that substantially matches real image 13, as captured by real camera 15 (of FIG. 1A). Thus, the positions of the real projector 11 and real camera 15 of FIG. 1A are interchanged in FIG. 1B as virtual camera 11″ and virtual projector 15″.
It should be noted that the pixel resolution of the real devices carries forward to their counterpart virtual devices (i.e. dual devices). Therefore, virtual camera 11″ has a virtual camera sensor pixel array 17″ consisting of s rows and r columns to match the resolution of projector pixel array 17 of real projector 11. Similarly, virtual projector 15″ has a virtual projector pixel array 19″ consisting of u rows and u columns to match the resolution of camera sensor pixel array 19 of real camera 15.
If one assumes that dual light transport matrix T″ is the light transport matrix in this dual configuration such that a dual-image capture matrix C″ (which defines dual image 13″ as captured by virtual camera 11″) relates to a dual-image projection matrix P″ (which defines dual image 13″ as projected by virtual projector 15″) as C″=T″P″, then T″eg would be an individual dual light transport coefficient relating an individual virtual projector pixel g″ to an individual virtual camera pixel e″.
Helmholtz reciprocity specifies that the pixel-to-pixel transport coefficient is equal in both directions (i.e. from real projector 11 to real camera 15, and from virtual projector 15″ to virtual camera 11″). That is, T″eg=Tge, which means T″=T^T (i.e. dual light transport matrix T″ is equivalent to the result of the mathematical matrix transpose operation on real light transport matrix T). Therefore, given light transport matrix T, one can use T^T to synthesize the dual, or virtual, images that would be acquired in the dual configuration.
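As a rough illustration of this use of the transpose, the sketch below forms the dual matrix from a made-up transport matrix and applies it to a dual projection vector. It assumes images are flattened into one-dimensional lists; the matrix values and the helper names transpose and mat_vec are hypothetical and serve only to show the index swap T″eg=Tge.

```python
def transpose(M):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def mat_vec(M, v):
    """Return M v for a matrix M (a list of rows) and a vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Hypothetical real transport matrix: 3 camera pixels by 2 projector pixels.
T = [
    [0.5, 0.0],
    [0.25, 0.25],
    [0.0, 0.75],
]

# Dual transport matrix: 2 virtual camera pixels by 3 virtual projector pixels.
T_dual = transpose(T)

# Dual projection: virtual projector pixels 0 and 2 are lit.
p_dual = [1.0, 0.0, 1.0]

# Dual image as "seen" by the virtual camera (one value per real projector pixel).
c_dual = mat_vec(T_dual, p_dual)
print(c_dual)
```

Note that the dual image has as many entries as the real projector has pixels, consistent with the resolution of each real device carrying over to its virtual counterpart.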
Thus, the light transport matrix T permits one to create images that appear to be captured by a projector, with a camera acting as a second projector. However, as is explained above, the high complexity involved in generating and manipulating light transport matrix T has heretofore greatly limited its application, particularly in the field of calibrating projector-camera systems.
Another problem associated with projector-camera systems is how to compensate for light diffusing objects that may obstruct a projector's line of sight. Of related interest are issues of whether projector-camera systems can be used to achieve more complex images than typical. For example, can such systems combine multiple images from multiple projectors to create a single composite image? Alternatively, can one generate “3-D” images, or other visual effects that previously required more complex equipment and more complex projection setups? Also, can one make better use of the camera in a projector-camera system so that the camera can be an active part of an image creation process? Furthermore, what are the implications of using a low resolution, inexpensive camera in such projector-camera systems?
Previous works [Raskar et al. 1998; Underkoffler and Ishii 1998] put forth the concept of intelligent illumination and showed how projectors could be used to enhance workplace interaction and serve as novel tools for problem solving. The projector-camera community has since solved many of the technical challenges in intelligent projectors. In particular, significant advances have been made in automatic mosaicing of multiple projectors [Chen et al. 2002; Yang et al. 2001; Raij et al. 2003; Sukthankar et al. 2001; Raskar et al. 2003].
[Raskar et al. 2001] demonstrated projection onto complex objects. Using previously created 3D models of the objects, multiple projectors could add virtual texture and animation to real physical objects with non-trivial complicated shapes.
[Fujii et al. 2005] proposed a method that modified the appearance of objects in real time using a co-axial projector-camera system. [Grossberg et al. 2004] incorporated a piecewise polynomial 3D model to allow a non-co-axial projector-camera system to perform view projection.
Projector-camera systems have also been used to extract depth maps [Zhang and Nayar 2006], and space-time-multiplexed illumination has been proposed as a means for recovering depth edges [Raskar et al. 2004].
As will be explained more fully below, the present invention addresses the problem of how to determine what a projector needs to project in order to create a desired image by using the inverse of the light transport matrix, and its application will further simplify the calibration of projector-camera systems.