There is great interest in improving computer graphic systems that use 3D and 2D data to create images. Current uses for visual images in graphic applications demand systems that store extensive image data more compactly, build images with greater control over detail resolution and process images with increased speed and efficiency. Although 3D and 2D graphic systems use different underlying methods for image generation, both share a common difficulty: processing the massive amount of data necessary to generate still images and animated sequences with computational efficiency and convincing realism. Background on both 3D and 2D systems is presented as follows.
3D Data Systems
A 3D object modeling system typically generates a model of an object, terrain or other surface (hereinafter an "object") from input data and uses that model to create a display or reproduction of the object (such as a monitor display or printout). When a 3D object model replicates the entire surface of the object, a 3D graphics system allows a user to output or display images showing any side or face of the object from any vantage point. A user of a 3D graphics system can load a 3D object model into a viewer program and change his or her view of the object by commands to rotate the viewing window around the object or "zoom" close to or away from the object. A 3D graphics system builds more complex scenes by grouping different object models and viewing them together. For example, 3D object models for a chair, a boy, a lamp, and a book can be loaded into a viewer to show a boy sitting in a chair reading a book. As the 3D models contain information to show all sides of the objects in the scene, the user can rotate the viewing window and view the scene from all angles.
Because 3D object modeling systems can access complete three-dimensional information about each object depicted, they facilitate the construction of complex, interactive animated displays, such as those created by simulators and other user choice-based programs. Although 2D image generation systems currently predominate in the display and manipulation of graphic images, the use of 3D modeling systems is perceived as a more efficient way to present graphic information for interactive graphics, animated special effects and other applications and the use of such systems is growing.
3D systems construct object models from 3D spatial data and then use color or other data (called "texture data") to render displays or images of those objects. Spatial data includes 3D X, Y, Z coordinates that describe the physical dimensions, contours and features of the object. The current effort in computer graphics to incorporate more images of real-life objects into applications has fostered improvements in collecting 3D spatial data such as through the use of scanning systems. A scanning system uses a light source (such as a laser) to scan a real-world object and a data collection device (such as a camera) to collect images of the scanning light as it reflects from the object. The scanning system processes the captured scan information to determine a set of measured 3D X, Y, Z coordinate values that describe the object in question. Some scanning systems can easily gather enough raw data to generate several hundred thousand 3D data point coordinates for a full wraparound view of an object. A typical 3D object modeling system processes the 3D point data to create a "wire-frame" model that describes the surface of the object and represents it as a set of interconnected geometric shapes (sometimes called "geometric primitives"), such as a mesh of triangles, quadrangles or more complex polygons. The points can come to a 3D object modeling system either as a set of random points (i.e., a "cloud of points") with no information concerning shape (known as connectivity information) or the points can come with some connectivity information such as information indicating a "hole," for example, the space bounded by the handle of a tea cup.
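As a rough illustration of the wire-frame representation described above, the following sketch stores a mesh as a list of 3D points plus triangles that index into that list, and then derives the unique edges of the wire frame. The tetrahedron data and the names `vertices`, `triangles` and `wireframe_edges` are hypothetical and are not taken from any particular modeling system.

```python
# Hypothetical example data: four measured 3D points (X, Y, Z).
vertices = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]

# Each triangle is a triple of indices into `vertices`; this is the
# connectivity information that a bare cloud of points lacks.
triangles = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]


def wireframe_edges(tris):
    """Return the sorted unique edges used by the triangles."""
    edges = set()
    for a, b, c in tris:
        for u, v in ((a, b), (b, c), (c, a)):
            # Store each edge with the smaller index first so that an
            # edge shared by two triangles is counted only once.
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)


edges = wireframe_edges(triangles)
print(len(vertices), "points,", len(triangles), "faces,", len(edges), "edges")
```

Keeping the points and the connectivity in separate lists means a point shared by several faces is stored only once.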
Typical mesh modeling systems use the spatial data--the 3D X, Y, Z coordinates--either indirectly, in gridded mesh models, or directly, in irregular mesh models. Gridded mesh models superimpose a grid structure as the basic framework for the model surface. The computer connects the grid points to form even-sized geometric shapes that fit within the overall grid structure, determining the X, Y, Z locations for the grid points by interpolating them from collected spatial data points. There are various ways of creating gridded mesh representations, such as those shown in U.S. Pat. No. 4,888,713 to Falk and U.S. Pat. No. 5,257,346 to Hanson. While gridded models provide regular, predictable structures, they are not well-suited for mesh constructions based on an irregular set of data points, such as those generated through laser scanning. The need to interpolate an irregular set of data points into a regular grid structure increases computation time and decreases the overall accuracy of the model. Hence, modeling systems typically create an irregular mesh model, such as an irregular triangulated mesh, to represent a real-world object.
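The interpolation step that gridded models require can be sketched as follows. The text does not say which interpolation method such systems use; inverse-distance weighting is assumed here purely for illustration, and the sample points and the names `idw_height` and `build_grid` are hypothetical.

```python
import math

# Hypothetical irregular samples (x, y, z), such as a scanner might return.
samples = [(0.1, 0.2, 1.0), (0.9, 0.1, 2.0), (0.4, 0.8, 3.0), (0.8, 0.9, 4.0)]


def idw_height(x, y, points, power=2.0):
    """Estimate Z at (x, y) by inverse-distance weighting of the samples."""
    num = 0.0
    den = 0.0
    for px, py, pz in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return pz  # the grid point coincides with a measured sample
        w = 1.0 / d ** power
        num += w * pz
        den += w
    return num / den


def build_grid(points, n):
    """Return an n-by-n grid of interpolated Z values over the unit square."""
    step = 1.0 / (n - 1)
    return [[idw_height(i * step, j * step, points) for i in range(n)]
            for j in range(n)]


grid = build_grid(samples, 5)
```

Every grid value is computed by visiting all of the samples, and unless a grid point happens to coincide with a sample it is an estimate rather than a measurement. This reflects the added computation time and reduced accuracy noted above.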
In addition to using spatial data, 3D mesh modeling systems also use texture data to display and reproduce an object. Texture data is color and pattern information that replicates an object's surface features. Typically, 3D object modeling systems maintain texture data separately from the "wire-frame" mesh and apply the texture data when rendering the surface features. Thus, object modeling systems typically include two distinct and separate processes: first, in a building phase, the system constructs a "wire frame" mesh to represent the object's spatial structure using only 3D X, Y, Z values and, second, during a rendering phase, the system applies the texture data to output a display or reproduction. "Texture mapping" or "texturing" is the part of the rendering phase process that overlays texture data on the geometric faces of a mesh model. The rough face of a brick, the smooth and reflective surface of a mirror and the details of a product label can all be overlaid onto a mesh wire frame model using texture mapping principles.
For models of real-world objects, texture data typically comes from 2D photographic images. The laser scanning systems described above can collect texture data by taking one or more 2D photographic images of the object in an ordinary light setting as they collect laser scan data. Thus, 3D scanning systems both scan an object with a laser to collect spatial data and photograph it to collect color and other surface characteristic information. The laser-collected 3D X, Y, Z coordinate values can be related and linked to specific points (i.e. pixel locations) in the digitized versions of the collected photo images. Commercially available video cameras output frames that can be digitized into a 2D matrix of pixels (e.g. 640×480 pixels in dimension), with each pixel having, for example, a three-byte (24 bit) red, green and blue (R, G, B) color assignment. Storage for each such video frame view then requires approximately 900 K (kilobytes) and the frame will typically be stored as a "bitmap" (such as in TIFF format). A 3D object modeling system will link each mesh face in the generated 3D mesh model to a specific area in the bitmap. The image can be stored as a texture map file and relevant areas of the image can be clipped as texture map elements for use in texture map overlays.
To output a fully-rendered view of the mesh model from a desired perspective, the currently available 3D graphics systems typically overlay corresponding texture map elements on the geometric mesh faces in view. This overlaying procedure presents some complications as the system must rotate and scale each texture map element to fit the image of the wire frame mesh as it appears to the viewer. The widely-followed OpenGL standard, for example, supports the scaling of texture map elements through a technique called "mipmapping". Mipmapping allows the texture map file to contain different-sized versions of each texture map element which the system uses as overlays for different-scaled views of the object.
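The storage cost of keeping several sizes of one texture map element can be estimated with a short sketch. It assumes the conventional mipmap layout, in which each level halves the width and height of the previous one down to 1x1; the 512x512 element size and the name `mip_chain` are hypothetical.

```python
def mip_chain(width, height):
    """List the (width, height) of each mipmap level down to 1x1."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        # Halve each dimension, never going below one texel.
        width = max(1, width // 2)
        height = max(1, height // 2)
        levels.append((width, height))
    return levels


levels = mip_chain(512, 512)  # hypothetical 512x512 texture map element
base_texels = 512 * 512
total_texels = sum(w * h for w, h in levels)
print(len(levels), "levels;", round(total_texels / base_texels, 3), "x the base size")
```

Under this assumption the full chain needs about one third more storage than the largest version alone.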
In addition to the complications presented by the use of the texture data, the use of and demand for 3D modeling is hindered by large storage requirements. Most current systems continue to store both a file of mesh model data and a separate file of bitmap texture map data. Such a configuration imposes a high overhead on the system in terms of the memory needed to access and manipulate the object model. Texturing necessitates that the entire texture map file be loaded into a designated RAM cache, placing great strain on limited RAM resources. For example, a texture map file for a person's head might comprise texture elements from six photographic views of the head--one view for front, back and each side of the head plus a top and bottom view--as well as data necessary to partition the various texture elements and mipmaps. Also, texture has projectability problems. It may be necessary to use multiple textures of the same subject, either for topological reasons or to address projective distortions.
As the photographic images for each view require roughly 900 K of storage, a texture map comprising six views might require on the order of 5 Mb (megabytes). Even when the texture map data is stored in a compressed format, it still must be fully expanded when loaded into RAM for use. When several 3D object models are used for a complex display (such as a figure with background objects--trees and birds, for example), the amount of storage necessary for outputting all the objects in the display can be prohibitively large. The structure and size of the texture map file has also precluded or limited use of 3D applications on communication systems like the Internet, where bandwidth is limited and does not readily facilitate transfer and communication of such substantial object information files.
The use of the texture map file also creates time delays in processing images. Most systems require special graphics hardware for real-time performance. The extra hardware needed increases the cost of the system and, for the Internet, where most users access the system with more limited PC-type computers, such a hardware solution is not currently a viable option. Typically, a PC contains a graphics acceleration device such as a video graphics array (VGA) standard card which assists in "displaying" each image (i.e., rapidly outputting a set of pixel assignments from a window frame buffer to a display monitor). However, on the PC, the tasks of "transformation" (transforming the 3D X, Y, Z coordinates of the object model to "eye-space" coordinates for a particular view, lighting the object accordingly and projecting the image onto a "window space") and "rasterization" (the process of rendering "window-space primitives" such as points, lines and polygons for the particular view and designating detailed pixel color setting information such as texture map information and depth of field calculations) are typically performed by the PC's general-purpose "host" processor. For real-time speed, the current object modeling systems typically need more advanced and more expensive computers employing special graphics hardware to perform the "transformation," "rasterization" and other processes.
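The "transformation" task can be sketched for a single point as follows. This is a minimal illustration under stated assumptions, not the pipeline of any particular system: the view is a rotation about the Y axis followed by a translation, the projection is a simple perspective divide, and lighting and rasterization are omitted. The names `to_eye_space` and `to_window_space` and all numeric values are hypothetical.

```python
import math


def to_eye_space(p, yaw, camera_distance):
    """Rotate a model-space point about the Y axis, then place it in front of the eye."""
    x, y, z = p
    c, s = math.cos(yaw), math.sin(yaw)
    xr = c * x + s * z
    zr = -s * x + c * z
    return (xr, y, zr - camera_distance)  # the eye looks down the -Z axis


def to_window_space(p_eye, focal, width, height):
    """Perspective-project an eye-space point and map it to pixel coordinates."""
    x, y, z = p_eye
    if z >= 0.0:
        raise ValueError("point is at or behind the eye")
    ndc_x = focal * x / -z
    ndc_y = focal * y / -z
    px = (ndc_x + 1.0) * 0.5 * width
    py = (1.0 - ndc_y) * 0.5 * height  # window rows grow downward
    return (px, py)


# Hypothetical values: one vertex at the model origin, no rotation,
# eye 5 units away, 640x480 window.
eye = to_eye_space((0.0, 0.0, 0.0), yaw=0.0, camera_distance=5.0)
pixel = to_window_space(eye, focal=1.0, width=640, height=480)
print(pixel)
```

A host processor without graphics hardware must repeat this arithmetic for every point of the model on every frame, before any rasterization work begins.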
In addition to problems with size requirements and processing delays, current 3D object modeling systems are also hampered by a lack of flexibility in controlling image detail or resolution. Current scanning systems can provide an abundance of data about an object, and 3D object modeling systems typically use all of the data to create a single, very detailed 3D object model. However, in some applications, such as computer games and animated sequences, it is desirable that an object be represented in many different resolutions. For example, an object depicted from a distant viewpoint does not require the same level of detail as an object seen close-up. Moreover, as the available transmission bandwidth of the Internet places limitations on the amount of image detail any one image can carry, it would be desirable for a 3D object modeling system to have the capability to vary the level of resolution in the model and, correspondingly, vary the texture map information. Such a system could display a mesh at many levels of resolution, from low to high, depending on the constraints of the system and the application. There are other systems for meshing which have the ability to optimize and incrementally add and remove points or edges from a mesh construction, such as shown by Hoppe (see, e.g., "Progressive Meshes" (SIGGRAPH 96) and "View-Dependent Refinement of Progressive Meshes" (SIGGRAPH 97)) and others. While such systems can optimize and change resolution, inter alia, they typically require large amounts of processing time to prepare the mesh or do not provide a reliable visual representation of the object when the mesh contains few polygons.
Real limitations in the use of 3D graphics systems arise in part from the use of texture map files and the subsequent coordination of texture data with the spatial data in the mesh model. Therefore, it would be preferable to make such coordination more efficient or to incorporate the texture map data into the mesh model and thus eliminate texture map data as a separate element altogether. A new system and method for modeling 3D objects that eliminates the need for the texture map file, permits more compact storage of the 3D object model and provides a rapid, flexible system to create and vary the resolution of the object model would represent an advance in the art. The reduced storage needs of such a system and its flexibility in specifying resolution would enable the object model to be easily transmitted across a communication system like the Internet and would allow for faster image display and manipulation without advanced hardware.
2D Data Applications
Although 3D object modeling and display systems represent the future in many interactive applications, 2D image display systems continue to have great utility for graphic representations. It would be an advantage to improve the efficiency of such systems, especially in the way they process picture data such as bitmap data. As described above, a bitmap is a 2D array of pixel assignments that when output creates an image or display. The computer "reads" photographs, film and video frame images in bitmap format, and such 2D bitmap images constitute very large data structures.
2D image display systems share with 3D object modeling systems the fundamental problem of data storage. It is not uncommon for a single 2D image to comprise a bitmap matrix of 1,280×1,024 pixels where each pixel has a 3 byte (24 bit) R, G, B color depth. Such an image requires approximately 4 Mb of storage. A typical frame of raw video data digitizes to a computer image 640×480 pixels in dimension. As stated above, if each pixel has a 3 byte color assignment, that single frame requires approximately 900 K of storage memory. As film and video typically operate at 24-30 frames per second to give the impression of movement to the human eye, an animated video sequence operating at 30 frames per second requires roughly 26 Mb of pixel assignment information per second, or 1.6 Gb (gigabytes) per minute. Even with enhanced RAM memory capabilities, the storage requirements of such 2D images can impede the operating capacity of the common PC; processing a single image can be difficult and processing an animated sequence is impossible without special video hardware. The size of these image files makes them unwieldy to manipulate and difficult to transport. For example, a user wishing to download a 2D image from an Internet or other communication system site to a PC typically finds the process slow and cumbersome. Such a constraint limits the use of 2D images in many applications, including the new, interactive Internet web applications.
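The storage figures above follow from simple arithmetic, reproduced in the sketch below. The choice of units is an assumption, since the text does not say whether its megabytes are binary or decimal; the constant and function names are hypothetical.

```python
BYTES_PER_PIXEL = 3  # 24-bit R, G, B color assignment


def bitmap_bytes(width, height):
    """Uncompressed size of a 24-bit bitmap in bytes."""
    return width * height * BYTES_PER_PIXEL


still_bytes = bitmap_bytes(1280, 1024)  # single large still image
frame_bytes = bitmap_bytes(640, 480)    # one digitized video frame
per_second = frame_bytes * 30           # 30 frames per second
per_minute = per_second * 60

MB = 1024 ** 2  # assumption: binary megabytes
print(round(still_bytes / MB, 2), "Mb per still image")
print(frame_bytes / 1024, "K per video frame")
print(round(per_second / MB, 1), "Mb per second of video")
print(round(per_minute / 1e9, 2), "Gb per minute (decimal gigabytes)")
```

These are uncompressed sizes only; file headers and any container overhead are ignored.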
Currently, graphic data compression techniques provide some answer to the impediments posed by 2D bitmap data storage requirements. Such procedures replace raw bitmap data with an encoded replica of the image. Compression techniques are known as either "lossless," meaning that they lose no data in the encoding process, or "lossy," meaning that they discard or lose some of the original bitmap data to achieve a high compression factor. One widely used "lossy" compression standard for still 2D images is the JPEG (Joint Photographic Experts Group) standard. JPEG compresses individual photographs or video frame images following a technique that takes advantage of the image's specific spatial structure. Within the image's color area, JPEG will disregard or homogenize certain pixel information to remove redundant information and thus reduce the overall size of the digitized image for storage and transport. However, to display an image, a compressed JPEG file must be decompressed.
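The difference between "lossless" and "lossy" encoding can be illustrated with a small sketch. This is not the JPEG algorithm, which uses a different, transform-based scheme. It uses Python's standard `zlib` module for the lossless step and, as an assumed stand-in for a lossy step, discards the low four bits of each pixel before encoding. The pixel data is synthetic and hypothetical.

```python
import zlib

# Hypothetical 8-bit grayscale scanline: a gradient with small variations.
pixels = bytes((i * 3 + (i * 7) % 5) % 256 for i in range(1024))

# Lossless: the decoded bytes match the original exactly.
lossless = zlib.compress(pixels)
assert zlib.decompress(lossless) == pixels

# Lossy (illustrative only): drop the low 4 bits of each pixel, then encode.
quantized = bytes(p & 0xF0 for p in pixels)
lossy = zlib.compress(quantized)
restored = zlib.decompress(lossy)

# The discarded bits cannot be recovered, so the restored data differs.
max_error = max(abs(a - b) for a, b in zip(pixels, restored))
print(len(pixels), len(lossless), len(lossy), max_error)
```

In both cases the data must be decoded back to a full-size pixel array before it can be displayed, which is the decompression step noted above.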
JPEG and other similar currently available compression systems possess real advantages for the compression and decompression of image data in certain circumstances. However, there are also drawbacks to these systems. JPEG represents only a method for data reduction; it is a compression process used mainly for storage. Its compression, which occurs on a pixel by pixel basis, goes far in reducing the overall size of the data chunk needed to store an image, but at low resolutions the image quality becomes unacceptable. Moreover, the compression is not continuously dynamic, so details cannot be easily added to or removed from an image. For small memory spaces (such as those needed to transmit files via the Internet in real time) the quality of the image can deteriorate sharply. Further, when a JPEG file is loaded into RAM it must be decoded and expanded before it can be used, which limits some of the compression benefit for real-time applications. (JPEG's "progressive buildup" extension, which outputs a rendering of an image in detail layers, offers some relief for systems which display JPEG files on the fly, but progressive JPEG is time consuming; ultimately, a quality resolution image requires a substantial block of RAM space, and the resolution of the image cannot be dynamically changed.) In addition, although JPEG standard users have some choice in determining the level of compression and the amount of "lossiness," JPEG's flexibility is limited by the way in which it reads and modifies the graphic image.
A system for modeling 2D images that lent an overall structure or model to the image and subsequently compressed data based on structure rather than on individual pixel values would allow greater compaction and more flexibility of use. Such a system would not only reduce the amount of data necessary to store and transmit a 2D image but would also provide other capabilities, such as the ability to vary the resolution quality of the model rapidly and dynamically. Such a system would also permit the data to remain compressed at runtime, thereby facilitating its use in real time applications.