This invention relates to a method and apparatus for enabling interactive communication with objects on a video display, in which the operator is able to manipulate images of natural objects by combining them, rotating them, zooming in on them, or otherwise acting as though the objects were being physically manipulated in three-dimensional space.
The use of interactive video material is becoming very widespread. From simple computer games to proposed digital television systems, the ability for the viewer or user to interact with objects on the screen is increasingly important. While there are many devices which allow a viewer to move a cursor to different parts of the screen and select objects, in most cases the interactive relationship takes place in two dimensions and the objects being manipulated are computer-generated rather than natural photographic or video quality objects.
The process of three-dimensional computer rendering is well established. This process uses computer simulation to create three-dimensional views of computer models. Computer simulation is accomplished through fairly sophisticated software techniques, such as ray tracing and texture mapping, and requires very expensive computer equipment to render multiple images sequentially to give the illusion of real-time interaction.
An object of this invention is to provide a method and apparatus to permit displaying photo-realistic three-dimensionally projected views of real objects in real scenes and enabling the viewer to manipulate these objects and the scene with several degrees of freedom, such as rotation, zooming, or otherwise “handling” them as though they were physically manipulated.
Another object of this invention is to provide such a system which allows for 360° rotation, including spherical rotation, and which enables the viewer to focus on any aspect of the displayed object quickly, accurately and easily.
Another object of this invention is to provide the ability to separate the acquired object from its background and provide the user with the ability to combine multiple objects over different backgrounds, including motion video backgrounds, and to give the user independent control over the placement and rotation of these objects.
Another object of this invention is to allow the user to similarly manipulate multiple moving objects such as people talking.
Another object of this invention is to allow the user to interact with views of three-dimensional objects, either natural or computer-generated, on a low-cost device such as an off-the-shelf PC.
Other objects, advantages and features of this invention will become more apparent from the following description.
The above objects are accomplished by providing a method and apparatus to acquire multiple views of an object to satisfy all degrees of freedom required by the user. This apparatus may consist of a camera set up on a track around the object. Alternatively, the camera may be stationary and the object may be moved, such as by being placed on a turntable. In any case, views all around the object are obtained. The views from all the required points are digitized, with the objects placed against, or treated as being against, a blue background so that the objects can be separated from the background. The digital images are then compressed using a suitable image compression technology that allows storage of pixel maps of irregular shapes and multiple transparency levels (such as an alpha-channel). Each digital image is then stored in a database along with the information about the position of the camera in relation to the object. Enough images are acquired to allow the required degrees of freedom, such as rotation or zooming.
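The acquisition step described above can be illustrated with a minimal sketch. It is not the implementation of the invention: the names `AcquiredView`, `key_out_blue`, `acquire` and the `dominance` threshold are hypothetical, the matte is binary rather than the multiple transparency levels mentioned above, and compression is omitted entirely.

```python
from dataclasses import dataclass


@dataclass
class AcquiredView:
    # All field names are hypothetical; the source only says that each image
    # is stored with the camera position relative to the object.
    azimuth_deg: float    # camera angle around the object
    elevation_deg: float  # camera angle above the horizontal
    rgba: list            # rows of (r, g, b, a) tuples


def key_out_blue(rgb_rows, dominance=60):
    """Return RGBA rows in which blue-dominant pixels are fully transparent.

    A pixel is treated as background when its blue channel exceeds both
    other channels by at least `dominance` (an assumed threshold).
    """
    rgba_rows = []
    for row in rgb_rows:
        out = []
        for r, g, b in row:
            is_background = b - max(r, g) >= dominance
            out.append((r, g, b, 0 if is_background else 255))
        rgba_rows.append(out)
    return rgba_rows


def acquire(database, azimuth_deg, elevation_deg, rgb_rows):
    """Separate the object from the blue background and record the view."""
    view = AcquiredView(azimuth_deg, elevation_deg, key_out_blue(rgb_rows))
    database.append(view)
    return view
```

For example, `acquire(db, 90.0, 0.0, rows)` would add one view taken from the side of the object to the list `db`; repeating the call at each camera position builds up the set of views needed for rotation.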
For viewing of the acquired images, the digital images along with the position information are called in the right sequence to the decompression device and placed into a digital frame buffer to be displayed on a video monitor. Multiple objects may be decompressed and, using the transparency information from the data, composited over a background. The background itself may also be a decompressed still image or motion video. The user interacts with the digital images via any available input device, such as a mouse, keyboard or a remote control unit. A suitable control unit, such as a computer, interprets the user requests and selects the correct digital images to be decompressed and correctly positioned in the frame buffer, providing the user with the illusion of interacting directly with the stored digital images.
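The compositing of a decompressed object over a background in the frame buffer can be sketched as follows. This is only one plausible reading, assuming 8-bit channels and the standard "over" blending rule; the function name `composite_over` and the list-of-rows frame buffer are assumptions made for illustration.

```python
def composite_over(frame, sprite, left, top):
    """Blend RGBA sprite rows onto RGB frame rows, in place.

    `frame` is a list of rows of (r, g, b) tuples (the background already in
    the frame buffer); `sprite` is a list of rows of (r, g, b, a) tuples with
    a in 0..255. Sprite pixels that fall outside the frame are skipped.
    """
    for dy, sprite_row in enumerate(sprite):
        y = top + dy
        if not 0 <= y < len(frame):
            continue
        for dx, (r, g, b, a) in enumerate(sprite_row):
            x = left + dx
            if not 0 <= x < len(frame[y]):
                continue
            back_r, back_g, back_b = frame[y][x]
            # "Over" rule per channel, with rounding to the nearest integer.
            frame[y][x] = tuple(
                (fore * a + back * (255 - a) + 127) // 255
                for fore, back in ((r, back_r), (g, back_g), (b, back_b))
            )
    return frame
```

Calling `composite_over` once per object, in back-to-front order, would place several objects over the same background; changing `left` and `top` between frames moves an object independently of the others.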
Via the input device, the viewer is able to:
(a) select any of the objects on the screen for further manipulation;
(b) move any of the objects available and arbitrarily place them over any available backgrounds, including motion video backgrounds;
(c) rotate the selected object and see it from any angle that has previously been acquired;
(d) zoom in on any portion of the selected object or the entire scene consisting of multiple objects and the background; and
(e) otherwise manipulate such objects with any degree of freedom provided at the time of acquisition of these objects.
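Because rotation is limited to angles that were previously acquired, a control unit might map each rotation request to the closest stored view. The sketch below assumes views are indexed by azimuth and elevation in degrees; the function names and the tuple layout are hypothetical and not taken from the source.

```python
def angular_gap(a_deg, b_deg):
    """Smallest absolute difference between two angles on a 360-degree circle."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def rotate(current_azimuth_deg, delta_deg):
    """Apply a user rotation request, wrapping around the full circle."""
    return (current_azimuth_deg + delta_deg) % 360.0


def nearest_view(views, azimuth_deg, elevation_deg):
    """Pick the acquired view closest to the requested camera position.

    `views` is a list of (azimuth_deg, elevation_deg, image_id) tuples.
    Azimuth wraps around the circle; elevation is compared directly.
    """
    return min(
        views,
        key=lambda v: angular_gap(v[0], azimuth_deg) ** 2
        + (v[1] - elevation_deg) ** 2,
    )


views = [
    (0.0, 0.0, "front"),
    (90.0, 0.0, "right"),
    (180.0, 0.0, "back"),
    (270.0, 0.0, "left"),
]
selected = nearest_view(views, rotate(350.0, 20.0), 0.0)
```

With only four views, as here, rotation would appear coarse; acquiring views at finer angular steps gives smoother apparent motion at the cost of more stored images.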
Alternate viewing systems may be designed, constraining the ability to manipulate acquired objects to specific times or degrees of freedom and allowing the system to respond to the user requests in ways other than displaying the video images. These viewing systems may allow some of the objects to be manipulated by the user while directly controlling other visible objects, thus creating a “virtual space” for the user. This allows for the creation of interactive training systems capable of providing the user with information about the available object in video or text form, and the creation of games where some of the objects seem to have “intelligence” and act without the user's intervention.
In a simple example, a person at home will be able to purchase a sweater or a dress by looking at images of a model wearing it and being able to see it from all angles. Additionally, if one wanted to study the stitching or other fine details, the interactive display will permit zooming in on the displayed image so that significant detail can also be observed. The viewer can then choose to see the model wearing the selected item in different settings, such as a cocktail party or the beach. The viewer can select from models of different heights and hair colors, for instance, to see how the item looks on different people or even preselect a self image onto which the clothing can be placed.
In a training application, a student may be able to observe a car engine in operation. The student can look at the engine from any angle and zoom in to see different parts of it in motion or slow down or stop the motion of the engine to get a clearer picture. The student may choose an extra degree of freedom provided for by the creator of the application to “explode” the engine and see the parts separate while the motion is going on. Alternatively, the student may be able to crank the engine “by hand” to see how the different moving parts inter-relate. The student may select any of the parts of the engine and examine them individually and possibly get textual or graphic information about that part. In this example, the original views of the engine may be acquired via a camera or prerendered using traditional three-dimensional rendering techniques.
In a video game example, the user may see a ‘virtual’ world and interact with objects and characters that exist in it. In a murder mystery setting, for instance, the player may examine objects in a room by picking them up, move from place to place, and interview the characters. The characters may be actual people, views of which have been acquired from different angles while they talk. The player can ‘walk’ around these characters as they themselves talk and walk around. The player may assume the role of a chair and guide it around.
In another aspect of the invention, the acquisition mechanism removes the background of the object and compresses the images in real time. The information is then transmitted via a communication line such as a high-bandwidth telephone line to another location for decompression. Several streams of information may be fed to multiple locations, and the viewing device can combine objects from different streams for display on a single video monitor. This arrangement allows for a new method of video conferencing, where each participant may place other participants into the same digital setting, such as around a “virtual” conference table on the screen. The participants may then zoom around the table to concentrate on the speaker, or zoom in on other participants to view their reactions. Each of the participants may have their own favorite setting without affecting the others, choosing from any available pre-stored setting and creating the illusion of the meeting taking place virtually anywhere in the world.