In early computer vision and computer graphics approaches, environments were rendered visually by means of meshes constructed of polygons with individual properties of color and reflectance. In order to construct more photorealistic renderings, researchers soon began to explore ways to use the pixel values of actual images, giving rise to the field of image-based rendering. Some researchers explored ways to interpolate between images, while others explored methods for capturing and rendering the light field.
Circa 1996, Cohen et al. devised a light field representation (U.S. Pat. No. 6,009,188, hereinafter the “lumigraph” representation) that utilizes a cube surrounding an object. Each face of the cube has an associated set of points. The light field is sampled on rays defined by pairs of points, one on a given face of the cube and one on the opposite face. This was one of the first approaches to seek a digital representation of the light field across a surface in free space, rather than on the surface of the object. Furthermore, the lumigraph eliminated one of the redundant dimensions of the light field by noting that, in the absence of occlusions, the radiance along a given light ray is constant.
Shortly thereafter, Levoy et al. extended this idea in their own representation (U.S. Pat. No. 6,097,394, hereinafter the “light slab” representation). While retaining the reduction of the redundant dimension, they showed that the pairs of planes whose combinations of points define the light rays of their representation need not be parallel, and may be arranged in any fashion. Furthermore, they noted that by placing one of the two planes at the horizon (i.e., at infinity), the light field may be represented by a set of orthographic images.
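To make the two-plane idea concrete, the following minimal sketch maps a ray, given by an origin and a direction, to coordinates (u, v, s, t) by intersecting it with two planes. For simplicity it assumes parallel planes at z = 0 and z = 1, which is only one of the arrangements the light slab permits; the function names and plane placement are illustrative assumptions, not taken from either patent. The second function imitates placing one plane at the horizon: (s, t) then reduce to the slopes of the ray direction, so all rays sharing the same (s, t) are parallel and, taken together, form an orthographic image.

```python
def two_plane_coords(origin, direction, z_uv=0.0, z_st=1.0):
    """Return (u, v, s, t): where a ray crosses the planes z = z_uv and z = z_st."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    if dz == 0.0:
        raise ValueError("ray is parallel to the parametrization planes")
    a = (z_uv - oz) / dz  # ray parameter at the (u, v) plane
    b = (z_st - oz) / dz  # ray parameter at the (s, t) plane
    return (ox + a * dx, oy + a * dy, ox + b * dx, oy + b * dy)


def slab_at_infinity(origin, direction, z_uv=0.0):
    """Variant with the (s, t) plane pushed to the horizon: (s, t) become slopes.

    Rays with equal (s, t) are parallel, so fixing (s, t) and varying (u, v)
    gathers an orthographic image.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    if dz == 0.0:
        raise ValueError("ray is parallel to the parametrization planes")
    a = (z_uv - oz) / dz
    return (ox + a * dx, oy + a * dy, dx / dz, dy / dz)


print(two_plane_coords((0.0, 0.0, -1.0), (1.0, 2.0, 1.0)))  # (1.0, 2.0, 2.0, 4.0)
print(slab_at_infinity((0.0, 0.0, -1.0), (1.0, 2.0, 1.0)))  # (1.0, 2.0, 1.0, 2.0)
```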
These approaches overcome some of the limitations of the mesh representation. For example, rather than trying to model the Bidirectional Reflectance Distribution Function (BRDF), which may vary for each polygon of the mesh, the lumigraph and light slab representations implicitly capture the reflectance of surfaces via actual images whose radiance information is interpolated and sorted into appropriate sample bins. Indeed, these representations can readily incorporate actual images of a scene, provided that the vantage point of each image (or at least the transformation between image planes) is known. Furthermore, these approaches do not require knowledge of the geometric structure of the environment.
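As a rough illustration of sorting radiance into sample bins, the sketch below averages scattered radiance samples, each already expressed in (u, v, s, t) coordinates, into the nearest cell of a regular grid. Nearest-neighbor averaging is a simplifying assumption chosen for brevity; the cited representations may use more elaborate interpolation, and the function name and grid spacings here are hypothetical.

```python
from collections import defaultdict


def bin_samples(samples, spacing_uv, spacing_st):
    """Average radiance samples into the nearest (i, j, k, l) grid bin.

    samples: iterable of ((u, v, s, t), radiance) pairs.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (u, v, s, t), radiance in samples:
        # Snap each coordinate to the nearest grid index on its plane.
        key = (round(u / spacing_uv), round(v / spacing_uv),
               round(s / spacing_st), round(t / spacing_st))
        sums[key] += radiance
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


samples = [((0.1, 0.0, 0.0, 0.0), 1.0),
           ((-0.1, 0.0, 0.0, 0.0), 3.0),
           ((1.0, 0.0, 0.0, 0.0), 5.0)]
print(bin_samples(samples, 1.0, 1.0))  # {(0, 0, 0, 0): 2.0, (1, 0, 0, 0): 5.0}
```

The first two samples fall into the same bin and are averaged; in practice one would also need a strategy for bins that receive no samples at all.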
One potential limitation of the 4-dimensional lumigraph and light slab representations is that spatial resolution and angular resolution are inextricably tied to each other. In particular, for a given pair of planes in the lumigraph representation (for example), the sample spacing of the more coarsely sampled plane will largely determine both the spatial and the angular resolution. The spatial resolution will be on the order of this coarser spacing, and the angular resolution will be on the order of the ratio of this coarser spacing to the distance between the planes. In many environments, however, proper visual rendering requires much more spatial resolution than angular resolution, so in these cases it would be desirable to sample coarsely in the angular directions while sampling finely in the spatial directions.
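The coupling described above can be made concrete with a small worked calculation. The sketch assumes, for simplicity, that both planes of a slab are square with the same side length and are sampled at the same spacing, and it uses the small-angle approximation for the angular resolution; the function names and numbers are illustrative only.

```python
def two_plane_resolution(coarse_spacing, plane_separation):
    """Order-of-magnitude spatial and angular resolution of one plane pair."""
    spatial = coarse_spacing  # same units as the spacing
    angular = coarse_spacing / plane_separation  # radians (small-angle approximation)
    return spatial, angular


def sample_count(extent, spacing):
    """Samples in one slab, assuming both planes are extent x extent at equal spacing."""
    per_axis = round(extent / spacing)
    return per_axis ** 4  # u, v, s and t each have per_axis samples


print(two_plane_resolution(1.0, 50.0))  # (1.0, 0.02)
# Halving the spacing doubles spatial resolution but also doubles angular
# resolution, whether or not it is needed, so the sample count grows by 2**4.
print(sample_count(100.0, 0.5) // sample_count(100.0, 1.0))  # 16
```

With a spacing of 1 unit and planes 50 units apart, the angular resolution is about 0.02 rad (roughly 1.1 degrees). Refining only the spatial resolution is not possible without also paying for the finer angular sampling.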
Another potential limitation is the lack of locality of correlated samples, which may significantly limit the effectiveness of compression. For example, the appearance of a given point on a surface may vary only mildly with the angle at which it is viewed, giving rise to a number of correlated samples. However, these correlated samples may be very “far” apart with respect to their parametrization, and many modern compression techniques that depend on local correlations may be unable to take advantage of this correlation.
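One way to see the lack of locality is to compute memory offsets in a 4-dimensional array stored in row-major (u, v, s, t) order. The sketch below assumes, for simplicity, a surface point lying exactly on the (s, t) plane, so that every ray through it shares the same (s, t) bin; the array shape and function name are hypothetical. Reordering the axes as (s, t, u, v) would make this particular point's views contiguous at the expense of its spatial neighbors, and for points lying off both planes the correlated samples generally fall along slanted lines in (u, v, s, t), which no axis ordering keeps adjacent.

```python
def flat_index(i, j, k, l, shape):
    """Row-major offset of sample (i, j, k, l) in a (u, v, s, t) array."""
    _, n_v, n_s, n_t = shape
    return ((i * n_v + j) * n_s + k) * n_t + l


shape = (32, 32, 256, 256)  # hypothetical (u, v, s, t) resolution
# A point lying on the (s, t) plane at bin (100, 100), seen from four
# (u, v) positions, i.e. from four different viewing angles.
views = [flat_index(i, 0, 100, 100, shape) for i in range(4)]
gaps = [b - a for a, b in zip(views, views[1:])]
print(gaps)  # [2097152, 2097152, 2097152]
# Spatially adjacent samples from a single vantage point are only 1 apart.
print(flat_index(0, 0, 100, 101, shape) - flat_index(0, 0, 100, 100, shape))  # 1
```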
Another consequence of this lack of locality with respect to the (u, v, s, t) parametrization is that it will often be necessary to load the entire data structure into memory (or virtual memory). This in turn may limit the scale of the representation due to hardware limitations or performance requirements.
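A back-of-the-envelope estimate shows how quickly the in-memory footprint grows. The sketch assumes a single uncompressed slab with a hypothetical resolution of 32 x 32 samples on the (u, v) plane and 256 x 256 on the (s, t) plane, stored as 8-bit RGB (3 bytes per sample).

```python
import math


def slab_bytes(shape, bytes_per_sample=3):
    """Bytes needed to hold one uncompressed slab (default: 8-bit RGB)."""
    return math.prod(shape) * bytes_per_sample


size = slab_bytes((32, 32, 256, 256))
print(size, size / 2**20)  # 201326592 192.0
```

Even this modest slab occupies 192 MiB, and doubling the resolution along all four axes multiplies that figure by 16.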
It should be noted that some of these limitations apply only to approximately Lambertian environments, i.e., environments in which the appearance of surfaces varies more spatially than with respect to the angle at which they are viewed. Although many environments fit this description, some clearly do not. Furthermore, in the absence of geometric information (e.g., depth field data) about the scene, there is little reason to believe that these limitations can be improved upon (if these characteristics can even rightly be called limitations in this case). From another perspective, however, this provides a compelling reason to require and use such geometric information whenever possible.
This requirement of geometric information may at first glance seem overly restrictive. However, without such information, effective capture of the light field would require a number of samples on the same order of magnitude as the number used in the 4-dimensional representation, and with so many samples it will often be possible to infer the geometric structure of the environment anyway.
On the other hand, when such geometric information is available at the outset, it will often be possible to populate the 4-dimensional representation with finer spatial resolution than would otherwise be possible, by exploiting correlations with respect to the viewing angle of a given point. A gantry apparatus would permit this representation to be captured more or less directly, obviating the need for such geometry-informed interpolation on the capture side. However, rendering would still require interpolation, presumably from the local (nearest) samples, ignoring correlated nonlocal samples in the representation as discussed above.