The present invention relates generally to digital manipulation of image signals and more specifically to a system and method for creating 3D models from 2D digital input images.
The depiction of visual images as spatial domain representations displayed by a system on a suitable display device, such as the cathode ray tube (“CRT”) of a television or computer monitor or via film projected on a screen, is well known in the art. However, as such display devices are typically limited to presenting the representations as a two-dimensional spatial representation on the surface of the display device, these spatial domain representations do not include complete visual image information. For example, due to occlusion by a foreground object, image information with respect to a background object may be missing. Likewise, image depth, or parallax, information, discernible as an apparent displacement in the position of an object as seen against a more distant background or other object when the viewpoint is changed, is typically lost in two-dimensional representations.
However, acquisition of such visual images to be represented two-dimensionally is straightforward, using opto-electronic transducing devices such as movie film cameras, video cameras and computer scanning devices to capture a spatial domain representation from a single vantage point, where the source of the visual image is a perceptible image. Likewise, acquisition of two-dimensional images generated on a digital system using a variety of computer software programs, such as word processing, drawing and animation programs, where the source of the visual image is imperceptible, is also straightforward. Therefore, there is a wealth of video images and film images captured and stored as two-dimensional representations as well as infrastructure, including systems and equipment, for such two-dimensional image acquisition.
Regardless of their source, these acquired images may be represented and stored in digital systems as arrays of digital numbers. A digitized image is simply a set of numbers having a direct correspondence with the pixels of the displayed image. For example, a displayed image might consist of 512 by 640 pixels where each pixel is characterized by a range of possible luminous intensities and colors. Film images can also be processed into a matrix of pixels similar to video images.
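By way of illustration only, the correspondence between stored numbers and displayed pixels may be sketched as follows. The sketch assumes a single-channel (grayscale) image of 512 rows by 640 columns with intensities from 0 to 255; the function name `make_image` and the choice of nested lists are hypothetical and are not taken from the present disclosure.

```python
def make_image(height, width, fill=0):
    """Return a height x width array of pixel intensities (illustrative only)."""
    return [[fill for _ in range(width)] for _ in range(height)]


# Assumed layout: 512 rows by 640 columns, one intensity value per pixel.
image = make_image(512, 640)

# Each stored number corresponds directly to one displayed pixel:
# image[row][column] is the intensity of the pixel at that position.
image[10][20] = 255
```

A color image could be held the same way by storing a tuple of channel values, rather than a single number, at each position.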
Processing of digital video images is well known in the art. Traditionally, such prior art digital video processing has been divisible into two major categories. The first prior art category results in a new video image being produced such as through the use of chroma-keying, image compositing and overlaying, rendering, transitions including wipes and fades, and computer generated images including three dimensional computer models and titles. These techniques and their like may be generally categorized as “video generation” techniques, and result in a new two-dimensional spatial domain representation of a video image.
By contrast, the second prior art category processes a video image not to generate a new video image, but rather to discern information therefrom, such as in an effort to recognize objects from within the image. Such processing is often used in robotics, for example, in order for feedback to be provided with respect to the operation of a manipulator. Video processing to differentiate between objects appearing in the image or to otherwise discern information contained within the image may be generally categorized as “machine vision” techniques.
It shall be appreciated that application of neither of the above mentioned techniques produces a resulting image having information beyond that available within the supplied images. As described above, the generation technique simply results in a new spatial domain data set from compositing or manipulating input images. Likewise, the machine vision technique simply produces a data set indicative of the position, movement, etc. of an object appearing in an input image.
Additionally, the above mentioned techniques have been isolated in their application. For example, image generation techniques typically generate a resulting image by mechanically applying a desired generation technique to a selected image or data set. Accordingly, chroma-key video generation simply removes an area of an input image having a particular color associated therewith and replaces this area with a second input image. Likewise, computer generated models and titles merely superimpose the computer generated image over an input signal or over a blank screen. As utilized in the prior art, such mechanical applications of image generation techniques would benefit little from the application of machine vision techniques.
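The mechanical nature of chroma-key generation described above may be sketched as follows. This is a minimal sketch under stated assumptions: pixels are (R, G, B) tuples, both images have the same dimensions, and the function name `chroma_key` and its `tolerance` parameter are hypothetical. It is not a description of any particular prior art product.

```python
def chroma_key(foreground, background, key_color, tolerance=0):
    """Replace foreground pixels matching key_color with background pixels.

    Assumes both images are equally sized lists of rows of (R, G, B) tuples.
    """
    def is_key(pixel):
        # A pixel is keyed out when every channel is within the tolerance.
        return all(abs(c - k) <= tolerance for c, k in zip(pixel, key_color))

    return [
        [bg if is_key(fg) else fg for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]


GREEN = (0, 255, 0)
foreground = [[GREEN, (10, 10, 10)]]
background = [[(1, 2, 3), (4, 5, 6)]]
composited = chroma_key(foreground, background, GREEN)
```

The substitution is decided pixel by pixel from color alone, so nothing about the objects in either image is consulted or learned.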
Similarly, machine vision techniques are typically applied in order to generate data with respect to an object within an image. As such, the applications utilizing machine vision techniques are generally not concerned with the manipulation of an output image. Therefore, applications of machine vision techniques used in the prior art would not benefit from the application of image generation techniques.
However, it may be desired to manipulate an image to produce image information beyond the two-dimensional spatial domain information available by the above described techniques. For example, in stereoscopy, where the sensation of depth obtainable with binocular vision arises from small differences in parallax producing slightly differing images to each eye of a viewer, an image more complex than the simple two-dimensional representation is necessary. However, the above mentioned isolated digital processing techniques are each insufficient to extract information from a digital video image in order to properly manipulate the image to produce a resulting stereoscopic image.
One example of a prior art system is shown in International Application No. PCT/AU96/00820 filed Dec. 20, 1996, which illustrates how to create 3D images from 2D input by elongating and/or moving the existing images. This system does not allow for objects to be “released” from their background and, when viewed, to appear to stick out from the screen.
Therefore, there is a need in the art for a system and method for processing images in order to extract information beyond that directly ascertainable from an image representation.
There is a further need in the art for applying information extracted from image representations to manipulate the image in order to produce a resulting image representation including information beyond that directly available from a source.
There is a still further need in the art to utilize information extracted from a two-dimensional spatial domain representation of an image in order to generate an enhanced image providing robust images such as a stereoscopic three-dimensional representation of the input image.
These and other objects, features and technical advantages are achieved by a system and method which utilizes information with respect to objects within a sequential input image, such as is available from the aforementioned machine vision technique, in order to extract, extrapolate and/or interpolate information about the image not directly presented by a two-dimensional spatial domain representation. This information is then utilized with available image manipulation techniques to produce a resulting image having robust attributes not available through the use of image manipulation techniques alone.
In a preferred embodiment of the present invention, source images, such as the aforementioned two-dimensional spatial domain representations, acquired using standard film or video capturing techniques are converted into enhanced images having the illusion of three dimensions. In an alternative embodiment, the resulting video images are in the form of stereoscopic three-dimensional images. Accordingly, the present invention, utilizing machine vision techniques, operates to dissect the source image into object components. Through this process, the present invention is operable to extract or extrapolate information with respect to objects, and their interrelationship, as contained within the image.
Source images may be provided from any number of sources and may include a single image or a series of related images. For example, a series of slightly different images, which may be displayed in rapid temporal sequence in order to create the perception of smooth motion as in the case of television or film image sequences, may be used. Utilizing the aforementioned machine vision techniques, the present invention operates to interpolate spatial domain information with respect to the objects through reference to temporal domain information. For example, where an object in the foreground of the spatial domain representation moves with respect to other objects throughout the temporal domain, spatial domain information for any particular image may be interpolated from information available in other images in the sequence. The interpolated information is used to fill in the information “missing” when each object of the visual image is separated away from the whole.
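One way the filling in of occluded background from other images in the sequence might be sketched is shown below. The sketch rests on assumptions not stated in the present disclosure: the camera and background are static, a per-frame boolean mask marking the foreground object is already available from a machine vision step, and the function name `fill_background` is hypothetical. A simple nearest-frame copy stands in for whatever interpolation an actual embodiment may use.

```python
def fill_background(frames, masks, t):
    """Recover the background of frame t using other frames in the sequence.

    frames: list of images (lists of rows of pixel values).
    masks:  same shape; True where the foreground object covers the pixel.
    Each pixel is taken from the temporally nearest frame in which it is
    not covered. Pixels covered in every frame are returned as None.
    """
    height, width = len(frames[t]), len(frames[t][0])
    # Visit frame t first, then frames in order of increasing distance in time.
    order = sorted(range(len(frames)), key=lambda i: abs(i - t))
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            value = None
            for i in order:
                if not masks[i][y][x]:
                    value = frames[i][y][x]
                    break
            row.append(value)
        result.append(row)
    return result


# A one-row scene with background 1, 2, 3 and an object (value 9)
# moving left to right across three frames.
frames = [[[9, 2, 3]], [[1, 9, 3]], [[1, 2, 9]]]
masks = [
    [[True, False, False]],
    [[False, True, False]],
    [[False, False, True]],
]
background = fill_background(frames, masks, 1)
```

In the middle frame the object hides the center pixel, which is recovered from the first frame, where the object had not yet reached that position.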
Thereafter, the present invention operates to produce a new image utilizing the above acquired information. Production of this new image may utilize image manipulation techniques, such as rendering, computer modeling, compositing and overlaying, in order to recombine the objects of the source image into a resulting image incorporating the above acquired information.
For example, the present invention may utilize the acquired information to determine that an image object is in the foreground while another object is in the background. From this information, and utilizing the image generation techniques of rendering and overlaying, the present invention may operate to produce a new image including object shadowing from a heretofore nonexistent light source.
Likewise, the present invention may utilize this information in combination with the image generation technique of overlaying to produce two new images wherein the objects of each image are overlaid with slight variations. By presenting one image to each of a viewer's two eyes, the sensation of depth is obtained due to small differences in parallax.
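The overlaying of objects with slight variations may be sketched as follows. This is an illustrative sketch only: it assumes a complete background image, a single foreground object with a boolean mask, and a purely horizontal shift of a whole number of pixels. The names `overlay_shifted`, `stereo_pair` and `disparity` are hypothetical, as is the sign convention, under which a positive disparity shifts the object right in the left-eye image and left in the right-eye image so that it appears in front of the screen.

```python
def overlay_shifted(background, obj, mask, shift):
    """Overlay the masked pixels of obj onto a copy of background,
    displaced horizontally by shift columns."""
    out = [row[:] for row in background]
    width = len(background[0])
    for y, row in enumerate(obj):
        for x, value in enumerate(row):
            nx = x + shift
            # Pixels shifted outside the image are simply dropped.
            if mask[y][x] and 0 <= nx < width:
                out[y][nx] = value
    return out


def stereo_pair(background, obj, mask, disparity):
    """Return (left, right) images with the object offset in opposite directions."""
    left = overlay_shifted(background, obj, mask, disparity)
    right = overlay_shifted(background, obj, mask, -disparity)
    return left, right


scene = [[0, 0, 0, 0, 0]]
obj = [[0, 0, 7, 0, 0]]
mask = [[False, False, True, False, False]]
left, right = stereo_pair(scene, obj, mask, 1)
```

Because the object is displaced in each image, background pixels that it covered in the source become visible, which is why a complete background is assumed as input.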
It shall therefore be appreciated that a technical advantage of the present invention is its ability to extract information beyond that directly ascertainable from a source image.
A further technical advantage is realized in that the information extracted from the source image may be utilized to manipulate the image in order to produce a resulting image representation including information beyond that directly available from the source image.
A still further technical advantage of the present invention is its ability to produce a stereoscopic three-dimensional image from a source image having only a two-dimensional spatial domain representation. As such, the present invention is uniquely able to generate stereoscopic three-dimensional images from a wealth of pre-existing images.
Likewise, an additional technical advantage is realized in that the present invention is uniquely suitable for producing enhanced images, such as the aforementioned stereoscopic images, utilizing common, and typically inexpensive, image capturing equipment.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and the specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.