1. Field of the Invention
The present invention relates to image processing in a virtual reality space.
2. Related Background Art
The following methods are known which realize simulation of a virtual world in a virtual reality (techniques of providing human sensory organs with information generated by a computer to allow pseudo-experiences of human activities in an imaginary world or in a remote space).
For example, a three-dimensional (3D) position/direction detector (e.g., FASTRAK of 3SPACE Corporation, which measures a 3D position and an Eulerian angle in a real space by magnetic conversion techniques) attached to the head of a player who experiences a virtual reality detects geometrical data. In accordance with this data, a computer calculates an image of a previously input model (3D configuration data of an object) while considering its spatial and geometrical position. This calculated image is displayed on a head mount display, e.g., i-glasses of Virtual-io Corporation, to make the player experience virtual world simulation.
In such a system realizing a virtual reality, an image to be viewed by a player is generally generated by 3D computer graphics (CG) to be described hereinunder.
In 3D CG for forming an image representing a 3D object, two main operations, “modeling” and “rendering”, are generally performed.
Modeling is an operation of supplying a computer with data such as a shape, color, surface property and the like of an object to be displayed as an image. For example, if a human image is to be formed, data such as what surface shape of the human image is, what color of which area of the face is, what light reflectivity is, and the like, is generated and stored in the format usable by the next rendering operation. Such a collection of data is called an object model.
For example, in modeling a cubic shape such as shown in FIG. 17, first a modeling coordinate system is formed which has as its origin, one vertex of the cube. Coordinate data of eight vertexes and surface loop data of the cube are determined in the coordinate system, for example, as shown in FIGS. 18A and 18B. A collection of coordinate data and surface loop data is used as model data of the object.
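As a minimal sketch of the kind of model data described above, the following Python fragment stores a unit cube as a vertex table plus surface loops. The coordinates, the vertex ordering, and the names `CUBE_VERTICES`, `CUBE_FACES` and `face_vertices` are assumptions made for illustration; the actual values of FIGS. 18A and 18B are not reproduced here.

```python
# Hypothetical model data for a unit cube. The origin of the modeling
# coordinate system is placed at one vertex of the cube.
CUBE_VERTICES = [
    (0.0, 0.0, 0.0),  # vertex 0 (origin)
    (1.0, 0.0, 0.0),  # vertex 1
    (1.0, 1.0, 0.0),  # vertex 2
    (0.0, 1.0, 0.0),  # vertex 3
    (0.0, 0.0, 1.0),  # vertex 4
    (1.0, 0.0, 1.0),  # vertex 5
    (1.0, 1.0, 1.0),  # vertex 6
    (0.0, 1.0, 1.0),  # vertex 7
]

# Each surface loop lists the indices of the vertices bounding one face.
CUBE_FACES = [
    (0, 3, 2, 1),  # face in the plane z = 0
    (4, 5, 6, 7),  # face in the plane z = 1
    (0, 1, 5, 4),  # face in the plane y = 0
    (3, 7, 6, 2),  # face in the plane y = 1
    (0, 4, 7, 3),  # face in the plane x = 0
    (1, 2, 6, 5),  # face in the plane x = 1
]


def face_vertices(face_index):
    """Return the coordinates of the vertices bounding one face."""
    return [CUBE_VERTICES[i] for i in CUBE_FACES[face_index]]
```

Keeping the coordinates and the surface loops in separate tables means that each vertex is stored once even though three faces share it.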
Rendering is an operation of generating an image of an object as viewed from a certain position, after the model is formed. In order to perform rendering, therefore, the conditions of a viewpoint and illumination must be considered in addition to the model. The rendering operation is divided into four tasks: “projection conversion”, “shielded surface erasing”, “shading” and “devising of reality”.
With “projection conversion”, the position on the screen of each coordinate value representing a model as viewed from a position of a viewpoint is calculated to convert it into a coordinate value on the screen. FIG. 19 shows four coordinate systems used for the projection conversion. The shape data of an object defined in the modeling coordinate system is first converted into shape data in a world coordinate system (used for the model representing an object). Thereafter, viewing conversion (visual field conversion) is performed to direct a selected camera to one of various directions and take the image of the object. In this case, the data of the object represented in the world coordinate system is converted into the data in a viewpoint coordinate system. For this conversion, a screen (visual field window) is defined in the world coordinate system. This screen is a final projection or picture plane of the object. The coordinate system for defining this screen is called a UVN coordinate system (screen coordinate system). If all objects in front of the viewpoint are drawn, a calculation time may become unnecessarily long and it is therefore necessary in some cases to determine a working area. The working area is called a viewing volume (visual field space). This determination process is called clipping. In the viewing volume, the surface nearest to the camera is called a near or front clipping plane and the surface farthest from the camera is called a far or rear clipping plane. The visual field conversion is performed by moving the screen in one of various directions. After the visual field conversion is performed, a cross point on a picture plane (screen) of a line extending between the viewpoint and each point of the 3D shape of the object in the space is calculated to obtain an image of the object projected upon the screen as shown in FIG. 20.
In this case, however, the image is formed through central projection which has a definite distance between the viewpoint and the picture plane. With this projection conversion, therefore, the data in the viewpoint coordinate system is converted into the data in the UVN coordinate system.
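The chain of conversions described above can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not the method of any particular system: the viewpoint looks along the +z axis without rotation, the screen is a plane at distance `screen_distance` in front of the viewpoint, clipping tests only the near and far planes, and all function names are hypothetical.

```python
def model_to_world(point, translation):
    """Modeling coordinates -> world coordinates.
    Only a translation is applied here (an assumption for brevity)."""
    return tuple(p + t for p, t in zip(point, translation))


def world_to_view(point, eye):
    """World coordinates -> viewpoint coordinates for a viewpoint at `eye`
    looking along +z. A full viewing conversion would also rotate."""
    return tuple(p - e for p, e in zip(point, eye))


def project(point, screen_distance, near, far):
    """Central projection of a viewpoint-coordinate point onto the screen.
    Returns None when the point lies outside the near/far clipping planes."""
    x, y, z = point
    if not (near <= z <= far):
        return None
    # Intersection of the line from the viewpoint to the point with the screen.
    scale = screen_distance / z
    return (x * scale, y * scale)
```

For example, a point at (2, 4, 4) in viewpoint coordinates projects to (1.0, 2.0) on a screen placed 2 units from the viewpoint, because the projection scales x and y by the ratio of the screen distance to the depth.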
Next, the “shielded surface erasing” (commonly called hidden surface removal) is performed to judge which areas of the model can and cannot be viewed from the present viewpoint. Typical algorithms for shielded surface erasing are the Z buffer method and the scan line method. After it is determined by the shielded surface erasing which areas can be viewed, illumination is taken into consideration to judge in what color and at what brightness each area is viewed. The determined color is drawn on the screen or pixels. This process is the shading work.
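The Z buffer method named above can be illustrated with a short Python sketch. It assumes that the scene has already been broken into per-pixel fragments of the form (x, y, depth, color), with a smaller depth meaning nearer to the viewpoint; the function name `zbuffer_render` and the fragment format are assumptions for illustration only.

```python
def zbuffer_render(width, height, fragments, background=0):
    """Keep, for every pixel, the color of the nearest fragment.

    fragments: iterable of (x, y, depth, color); smaller depth is nearer.
    Returns the image as a list of rows.
    """
    depth = [[float("inf")] * width for _ in range(height)]
    image = [[background] * width for _ in range(height)]
    for x, y, z, color in fragments:
        inside = 0 <= x < width and 0 <= y < height
        # A fragment is visible only if it is nearer than what is stored.
        if inside and z < depth[y][x]:
            depth[y][x] = z
            image[y][x] = color
    return image
```

Because each fragment is compared only with the depth already stored at its own pixel, the result does not depend on the order in which the fragments are drawn.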
The “devising of reality” work is generally performed at the end of rendering. This work is performed because an image formed only by the “projection conversion”, “shielded surface erasing” and “shading” differs greatly from a real object and holds little interest for the player. The reason for this is that these processes are performed on the assumption that the surface of an object is an ideal flat plane or a perfectly smooth curved surface capable of being represented by formulas, or that the color of each surface is the same over its whole area. One typical method of avoiding this and making an image more realistic is texture mapping. With texture mapping, a prepared two-dimensional pattern is pasted (mathematically speaking, an image of the pattern is mapped) onto the surface of an object model in a 3D space. This process aims at making an object constituted of monotonous surfaces appear as if it had complicated surfaces. With this process, a simple cubic model can be viewed as a metal object or a stone object.
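As a hedged illustration of the mapping step, the following Python sketch looks up the color of a two-dimensional pattern at a surface position given by normalized coordinates (u, v) in the range 0 to 1. Nearest-neighbour sampling, the (u, v) parameterization and the name `sample_texture` are assumptions chosen for brevity; the text above does not prescribe a sampling scheme.

```python
def sample_texture(texture, u, v):
    """Nearest-neighbour texture lookup.

    texture: list of rows of color values (the prepared 2D pattern).
    u, v: surface coordinates, each assumed to lie in [0, 1].
    """
    height = len(texture)
    width = len(texture[0])
    # Scale to texel indices and clamp so that u == 1 or v == 1 stays in range.
    col = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return texture[row][col]
```

A renderer would call such a lookup for each visible surface point during shading, using the returned value in place of a single uniform surface color.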
After the “projection conversion”, “shielded surface erasing”, “shading” and “devising of reality”, an image of the object in the UVN coordinate system is finally converted into an image in a device coordinate system, which is then displayed on the display device. One rendering process is completed in the above manner. FIG. 21 shows an image (with its background drawn fully in black) obtained by converting the image projected on the screen shown in FIG. 20 into the device coordinate system and displaying it on the display screen. The device coordinate system is used when pixels and dots of an image are displayed, and is assumed to be the same coordinate system as that of the display screen (a and b in FIG. 21 represent the numbers of pixels of the display screen).
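The final conversion into the device coordinate system can be sketched as a linear mapping from screen coordinates to pixel indices. In the Python fragment below, the screen is assumed to be centered on the origin with half extents `half_width` and `half_height`, the device origin is assumed to be the top-left pixel, and a and b are the pixel counts as in FIG. 21; the function name and these conventions are illustrative assumptions.

```python
def screen_to_device(u, v, half_width, half_height, a, b):
    """Map screen coordinates (u, v) to pixel indices on an a x b display.

    The screen spans [-half_width, half_width] horizontally and
    [-half_height, half_height] vertically; v increases upward while
    the pixel row index increases downward.
    """
    px = (u + half_width) / (2 * half_width) * (a - 1)
    py = (half_height - v) / (2 * half_height) * (b - 1)
    return (round(px), round(py))
```

With a = 641 and b = 481, the screen center maps to pixel (320, 240) and the upper-left screen corner maps to pixel (0, 0).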
In forming CG animation by giving motion to an image (CG image) formed by the method described above, two methods are mainly used.
With the first method, an object model is placed in a 3D space, and rendering is carried out each time the illumination condition, the viewpoint condition (position, direction, and angle of view of the viewpoint), the model shape and color, and the like are changed slightly. After a series of animation images is formed, or after each image is rendered, the images are recorded frame by frame (frame-recorded) on a video tape recorder or the like. After all images are recorded, they are reproduced by a reproducing apparatus. With this method, the time required for image rendering may be prolonged within an allowable range (depending on the time required for rendering one image and on the time required for forming all animation images). It is therefore possible to form a high quality image by disposing a number of objects having complicated shapes on the display screen, or by incorporating a rendering process requiring a long calculation time, typically ray tracing. For example, such approaches are used for forming CG images of television advertisements, SF movies and the like.
The second method generates CG animation images by repetitively performing two processes at high speed: a rendering process performed while changing the illumination condition, the viewpoint condition, and the object model shape and color, and a displaying process of displaying the image formed through the rendering process. This method is generally called real time CG rendering. The main feature of this method is the capability of an interactive process of controlling the motion of CG animation images in real time, by directly reflecting a user instruction upon the rendering. However, because practicing this method greatly depends upon the performance of a computer, the amount of object data capable of being displayed on the display screen is limited, and only a simple and high speed rendering process can be used. Therefore, as compared to the first method, the quality of images formed by the second method is generally poor. This second method is used with various virtual reality systems, scientific and technical simulations, flight simulators for practicing airplane piloting, racing games and fighting games at game centers, and the like.
Next, a viewpoint detector will be described.
The present applicant has filed an application for a so-called viewpoint detector for detecting which area a user views in a display screen of a personal computer or in a view finder screen of a video camera or a still camera. The principle of the viewpoint detector will be described.
FIG. 22 is a plan view illustrating the principle of a viewpoint detecting method, and FIG. 23 is a side view illustrating the principle of the viewpoint detecting method. In FIGS. 22 and 23, 906a and 906b represent light sources such as infrared light emitting diodes (IREDs) for emitting infrared rays imperceptible to a user. The light sources 906a and 906b are disposed approximately symmetrically in the x-direction (horizontal direction) relative to an optical axis of a focussing lens 911, and disposed slightly lower (refer to FIG. 23) in the y-direction (vertical direction). The light sources illuminate an eyeball 908 of the user with divergent light. A fraction of the illumination light reflected from the eyeball 908 of the user is focussed on an image sensor 912 by the focussing lens 911.
FIG. 24 is a schematic diagram of an image of an eyeball projected upon the image sensor 912. FIG. 25 is a diagram showing an output intensity of the image sensor 912.
The viewpoint detecting method will be described with reference to FIGS. 22 to 25.
Consider first the horizontal plane. As shown in FIG. 22, light radiated from one light source 906b illuminates the cornea 910 (refer to FIGS. 22 and 23) of the eyeball 908 of a viewer. A cornea reflection image (virtual image) d (refer to FIGS. 22 and 24) formed by infrared rays reflected by the surface of the cornea 910 is converged by the focussing lens 911 and focussed at a position d′ (refer to FIG. 22) of the image sensor 912. Similarly, light radiated from the other light source 906a illuminates the cornea 910 (refer to FIGS. 22 and 23) of the eyeball 908 of the viewer. A cornea reflection image (virtual image) e (refer to FIGS. 22 and 24) formed by infrared rays reflected by the surface of the cornea 910 is converged by the focussing lens 911 and focussed at a position e′ (refer to FIG. 22) of the image sensor 912. Light fluxes reflected from the ends a and b (refer to FIGS. 22 to 24) of the iris 904 are focussed via the focussing lens 911 at positions a′ and b′ (refer to FIGS. 22 and 24) of the image sensor 912 to form the images of the ends a and b. If the rotation angle θ of the optical axis of the eyeball 908 relative to the optical axis of the focussing lens 911 is small, a number of x-coordinate values xa and xb of the ends a and b of the iris 904 can be obtained on the image sensor 912 (x symbols in FIG. 24). The iris center xc is calculated by the least square method applied to a circle. The rotation angle θx of the optical axis of the eyeball 908 is given by:
oc × sin θx = xc − xo    (1)
where xo is the x-coordinate value of the center o of the radius of curvature of the cornea 910.
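The least square fit of a circle to the detected iris edge points, mentioned above as the way the iris center is obtained, can be sketched in Python as follows. The text does not state which formulation is used; this sketch assumes the algebraic form x² + y² + Dx + Ey + F = 0, solves its normal equations by Cramer's rule, and uses hypothetical names throughout.

```python
import math


def _det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_circle(points):
    """Least squares circle through (x, y) edge points.
    Returns (center_x, center_y, radius)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sz = sum(x * x + y * y for x, y in points)
    sxz = sum(x * (x * x + y * y) for x, y in points)
    syz = sum(y * (x * x + y * y) for x, y in points)
    # Normal equations A [D, E, F]^T = rhs of the algebraic fit.
    a = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [-sxz, -syz, -sz]
    det = _det3(a)
    if abs(det) < 1e-12:
        raise ValueError("need at least three non-collinear points")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = rhs[r]
        solution.append(_det3(m) / det)
    d, e, f = solution
    cx, cy = -d / 2.0, -e / 2.0
    return cx, cy, math.sqrt(cx * cx + cy * cy - f)
```

The first two returned values play the role of the iris center coordinates (xc, yc) used in the rotation angle equations.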
If a predetermined correction value δx is applied to the middle point k between the cornea reflection images d and e, the x-coordinate xo of the center of curvature o is given by:
xk = (xd + xe)/2
xo = (xd + xe)/2 + δx    (2)
The correction value δx is determined geometrically by using the apparatus mount conditions, a distance to the eyeball and the like, the calculation method being omitted.
Substituting equation (2) into equation (1), θx is given by:
θx = arcsin[[xc − {(xd + xe)/2 + δx}]/oc]    (3)
The coordinate value of each feature point projected on the image sensor 912 is affixed with ′ (prime) to obtain:
θx = arcsin[[xc′ − {(xd′ + xe′)/2 + δx′}]/oc/β]    (4)
where β is a magnification factor determined from a distance sze between the focussing lens 911 and the eyeball 908, and is obtained in practice as a function of the distance |xd′ − xe′| between the cornea reflection images d and e.
Next, consider the vertical plane shown in FIG. 23. The cornea reflection images d and e formed by the two light sources 906a and 906b are focussed at the same position, and this image is represented by i. The method of calculating the rotation angle θy of the eyeball 908 in the vertical direction is generally the same as for the horizontal plane, except that equation (2) is replaced by:
yo = yi + δy    (5)
where yo is the y-coordinate of the center o of the radius of curvature of the cornea. The correction value δy is determined geometrically by using the apparatus mount conditions, a distance to the eyeball and the like, the calculation method being omitted.
Therefore, the rotation angle θy of the eyeball 908 in the vertical direction is given by:
θy = arcsin[[yc′ − (yi′ + δy′)]/oc/β]    (6)
The position coordinates (xn, yn) on the screen such as a view finder on the horizontal and vertical planes are given by:
xn = m × arcsin[[xc′ − {(xd′ + xe′)/2 + δx′}]/oc/β]    (7)
yn = m × arcsin[[yc′ − (yi′ + δy′)]/oc/β]    (8)
where m is a constant determined by the view finder optical system.
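Equations (7) and (8) can be evaluated directly once the feature points have been located on the image sensor. The Python sketch below is a hedged illustration only: the function and parameter names are hypothetical, all coordinates are the primed sensor-plane values, and oc, β, m and the correction values are assumed to be known constants supplied by the caller.

```python
import math


def rotation_angle(center, reflection, delta, oc, beta):
    """Rotation angle of the eyeball in one plane, following the form of
    equations (4) and (6): arcsin((center - (reflection + delta)) / oc / beta).
    Raises ValueError if the arcsin argument falls outside [-1, 1]."""
    return math.asin((center - (reflection + delta)) / oc / beta)


def gaze_position(xc, yc, xd, xe, yi, delta_x, delta_y, oc, beta, m):
    """Position (xn, yn) on the screen following equations (7) and (8)."""
    # Horizontal plane: the reference is the middle point of the two
    # cornea reflection images d and e.
    theta_x = rotation_angle(xc, (xd + xe) / 2.0, delta_x, oc, beta)
    # Vertical plane: both reflections coincide at the single image i.
    theta_y = rotation_angle(yc, yi, delta_y, oc, beta)
    return m * theta_x, m * theta_y
```

When the iris center coincides with the corrected reference point in both planes, both angles are zero and the computed position is the screen origin.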
As seen from FIGS. 24 and 25, in detecting the iris edges, a rise edge (xb′) and a fall edge (xa′) of an output waveform of the image sensor 912 are used. In detecting the coordinate values of the cornea reflection images d and e, a sharp rise edge (xe′) and a sharp fall edge (xd′) are used.
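One simple way to locate rise and fall edges in a line of sensor output is to look for threshold crossings. The Python sketch below illustrates that idea; the fixed threshold, the sample-index output and the name `find_edges` are assumptions, since the text does not describe how the edges are extracted from the waveform.

```python
def find_edges(signal, threshold):
    """Return (fall_indices, rise_indices) for one line of sensor output.

    A fall edge is recorded at index i when the signal drops from at or
    above the threshold to below it between samples i - 1 and i; a rise
    edge is recorded for the opposite crossing.
    """
    falls, rises = [], []
    for i in range(1, len(signal)):
        prev, cur = signal[i - 1], signal[i]
        if prev >= threshold > cur:
            falls.append(i)
        elif prev < threshold <= cur:
            rises.append(i)
    return falls, rises
```

For a line such as [9, 9, 2, 2, 2, 9, 9] with a threshold of 5, the fall edge is found at index 2 and the rise edge at index 5, bracketing the dark region between them.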
Next, an example of a personal computer system with a viewpoint detecting function will be described.
FIG. 26 is a schematic diagram showing an example of the structure of a personal computer system having a viewpoint detecting function. The personal computer system shown in FIG. 26 is constituted of a personal computer unit 1008, a head mount display 1006 used for a user to view the screen of the personal computer unit 1008, and an external monitor 1009 used for the user or other persons to view the screen of the personal computer unit 1008. The head mount display 1006 is fixed to the position near the eyes of the user, by goggles, an eye glass frame or the like.
The head mount display 1006 is constituted of: a display element 1002 such as a liquid crystal display (LCD); a specific prism 1003 for realizing a magnified observation system; a viewpoint detection circuit 1064 for detecting a viewpoint of the eye 1005 of a viewer; a display circuit 1007 for displaying a personal computer screen on the display element 1002; infrared light emitting diodes 1060 and 1061 for radiating infrared rays toward the eye 1005 of the viewer; focussing lenses 1062a and 1062b for focussing infrared rays; and a photoelectric conversion element (image sensor) 1063 for converting the infrared rays focussed by the focussing lenses 1062a and 1062b into an electric signal. The viewpoint detection circuit 1064 detects a subject point or viewpoint of the viewer on the display element in accordance with the image of the eye 1005 on the photoelectric conversion element 1063.
An optical operation of the observation system of the head mount display 1006 will be described. Light from the display element 1002 is refracted by a third optical action plane c and transmitted. The transmitted light is then totally reflected by a first optical action plane a, and reflected by a second optical action plane b. Thereafter, the light is again refracted by the first optical action plane a and transmitted to have an expansion angle (converging angle, parallel light fluxes) suitable for the dioptric power of the observer, and is incident upon the eye 1005 side of the observer. A line coupling the eye 1005 of the observer and the center of the display element 1002 is used as a reference optical axis. The dioptric power for the observer can be adjusted by moving the display element 1002 in parallel to the optical axis of the prism 1003. In order to realize a telecentric optical system by correcting image characteristics and distortion, the three optical action planes of the prism 1003 are preferably configured as 3D curved surfaces having no rotation symmetry axes. In this example, the curved surfaces are symmetrical only with respect to a plane parallel to the drawing sheet including the reference optical axis.
The optical operation of the viewpoint detecting system of the head mount display 1006 will be described next. Light radiated from the infrared light emitting diodes 1060 (two pieces in the depth direction) for bare eyes and infrared light emitting diodes 1061 (two pieces in the depth direction) for eyes with eye glasses illuminates the viewer eye 1005 along a direction different from the optical axis of the viewpoint detecting system, via openings 1012, 1013, 1014 and 1015 formed in the second optical action plane b. The illumination light is reflected and scattered by the cornea 910 and iris of the viewer. The light reflected by the cornea 910 forms the cornea reflection images d and e, whereas the light scattered by the iris forms the iris image. The light is also focussed on the image sensor 1063 by the focussing lenses 1062a and 1062b via an opening 1010 formed in the second optical action plane b. From an image of the eye 1005 of the viewer obtained by the image sensor 1063, the feature point data can be derived by the viewpoint detection circuit 1064 which is configured to perform the viewpoint detecting principle described previously.
The focussing lens system is configured by the two focussing lenses 1062a and 1062b. The focussing lens 1062b in particular is a wedge-shape lens which allows the focussing lens system to be configured with fewer lenses and is suitable for a compact lens system. By providing a slanted plane of the focussing lens 1062b with a radius of curvature, eccentric aberration generated at the second optical action plane b can be effectively corrected. If the focussing lens system is provided with at least one plane not curved, it is effective for correcting the focussing performance outside of the optical axis. If an aperture of the focussing lens system is disposed near the opening formed in the second optical action plane b, the opening 1010 can be made narrow so that an inside missing of the observation system can be prevented effectively. The opening and the aperture are preferably made coincident. If the opening is set smaller than 2 mm, this opening becomes smaller than the iris of the eye 1005 of the viewer so that the inside missing of the observation system can be prevented more effectively. Light for illuminating the eye 1005 of the viewer is infrared light, which has a low luminous sensitivity. If the focussing lens system is provided with at least one lens for cutting visible light, the viewpoint detection precision can be improved.
FIG. 27 is a diagram showing a side view of the prism 1003. Although the second optical action plane b is provided with a reflection mirror coating, this coating is not formed at the openings for the focussing lenses 1062a and 1062b and the infrared light emitting diodes 1060 and 1061 (opening 1010 for focussing, openings 1012 and 1013 for the infrared light emitting diodes for bare eyes, openings 1014 and 1015 for the infrared light emitting diodes for eyes with eye glasses). As described earlier, these openings 1010, 1012 to 1015 are so small that the view finder optical system is not affected, and the size is preferably 2 mm or smaller.
The openings 1010, 1012 to 1015 are formed in the mirror coating area, and the infrared light emitting diodes 1060 and 1061 as the light illumination sources are disposed on the side opposite to the eye 1005 of the viewer. Therefore, even if the prism 1003 has a high reflectivity to realize a broad visual field, the eye 1005 of the viewer can be properly illuminated at the height level approximate to the eye position.
The infrared light emitting diodes 1060 and 1061 are disposed at different positions for discriminating between bare eyes and eyes with eye glasses. The two infrared light emitting diodes 1060 for bare eyes are disposed right and left symmetrically to the optical axis in a narrow width at the same height slightly lower than the optical axis. On the other hand, the two infrared light emitting diodes 1061 for eyes with eye glasses are disposed right and left symmetrically to the optical axis in a broad width at the same height fairly lower than the optical axis. There are three reasons for this layout. One reason is to illuminate the eye detection area as uniformly as possible in order to ensure good illumination conditions irrespective of the distance to the eyeball. The second reason is to set the infrared light emitting diodes 1060 for bare eyes higher than the diodes 1061 for eyes with eye glasses so that the cornea reflection images d and e are not intercepted by the eyelids. The third reason is to set the infrared light emitting diodes 1061 for eyes with eye glasses more widely spaced than the diodes 1060 in the right and left directions and in height, in order to direct ghost images of the infrared rays reflected by the eye glasses to the peripheral area having less influence upon the viewpoint detection. Discrimination between an eyeball and an eye glass is conducted through calculation of a distance between the eyeball and the prism 1003 by using the distance |xd′ − xe′| between the cornea reflection images d and e.
The viewpoint detection circuit 1064 detects a viewpoint of the viewer on the display element 1002 from an image of the eye 1005 of the viewer on the image sensor element, in accordance with the above-described viewpoint detecting principle.
Next, the personal computer unit will be described.
In FIG. 26, reference numeral 1008 represents the personal computer unit. Reference numeral 1814 represents a central processing unit (CPU) which processes programs and data. Reference numeral 1813 represents a system bus interconnecting system devices. Reference numeral 1818 represents a memory controller for controlling a read-only memory (ROM) 1816 and a random access memory (RAM) 1817. Reference numeral 1812 represents a video graphic controller for controlling display of the contents written in a video RAM 1811 on the display. Reference numeral 1815 represents an accessory device controller for controlling a pointing device or a keyboard. In this example, the accessory device controller 1815 is connected to the viewpoint detection circuit 1064 of the head mount display 1006. Reference numeral 1819 represents an I/O channel for peripheral device control. In this example, the I/O channel 1819 is connected to the display circuit 1007 of the head mount display 1006.
In the personal computer unit constructed as above, viewpoint information of an operator detected with the viewpoint detection circuit 1064 of the head mount display 1006 can be used to scroll the screen or select a menu, in the same manner as the information from the pointing device of the personal computer unit 1008 is used. Since the image on the screen of the personal computer unit can be displayed also on the external monitor 1009, persons other than the operator can see the image on the screen of the personal computer unit. If a single eye head mount display is used, the operator can also see the image on the external monitor 1009.
In the above example, the screen (picture plane) set in the virtual space is a rectangular plane fixed relative to the viewpoint. An image is calculated from the model data of an object through image mapping of one point central projection over the screen. The spatial and geometrical shape of an object viewed by an operator is therefore a simulation of the real world.
FIG. 8A illustrates the relation between a viewpoint, a screen, and three objects laterally disposed in line in the above example. For simplicity of the drawing, the viewpoint is set just above the objects. The rendered image is shown in FIG. 8B. Images rendered in this way may lack artistic expression and offer little entertainment value.
When a viewer wishes to emphasize a particular object, the color or size of the object is changed with a pointing device such as a mouse. In this case, the operator is required to move the mouse, so that the intention of the operator cannot be reflected immediately.
The present invention has been made to solve the above problems. It is a first object of the invention to provide a virtual reality system and method that allow a user to experience simulations of a virtual world having high artistic expression and high entertainment value.
It is a second object of the present invention to provide an image processing system capable of efficiently realizing a highly reliable and sophisticated process.
It is a third object of the present invention to provide a storage medium capable of smoothly controlling the virtual reality system as above.
In order to achieve the above objects, a preferred embodiment of the invention discloses an image processing method comprising: a modeling step of configuring three-dimensional shape data of an object; a viewpoint position detecting step of detecting a viewpoint position of a viewer intending to experience virtual reality; a viewpoint setting step of setting a viewpoint in a three-dimensional space; a screen setting step of setting a screen in a virtual space in accordance with viewpoint position data detected at the viewpoint position detecting step; a screen mapping step of mapping a scene over the screen, which scene is formed by model data of the object viewed at the viewpoint set at the viewpoint setting step while a spatial and geometrical position of the object is taken into consideration; an image generating step of mapping the scene mapped on the screen at the screen mapping step, over a device coordinate system; a video converting step of converting an image generated at the image generating step into a video signal; and a video display step of displaying an image converted at the video converting step.
In order also to achieve the above objects, a preferred embodiment of the invention discloses an image processing system comprising: modeling means for configuring three-dimensional shape data of an object; viewpoint position detecting means for detecting a viewpoint position of a viewer intending to experience virtual reality; viewpoint setting means for setting a viewpoint in a three-dimensional space; screen setting means for setting a screen in a virtual space in accordance with viewpoint position data detected by the viewpoint position detecting means; screen mapping means for mapping a scene over the screen, which scene is formed by model data of the object viewed at the viewpoint set by the viewpoint setting means while a spatial and geometrical position of the object is taken into consideration; image generating means for mapping the scene mapped on the screen by the screen mapping means, over a device coordinate system; video converting means for converting an image generated by the image generating means into a video signal; and video display means for displaying an image converted by the video converting means.
In order also to achieve the above objects, a preferred embodiment of the invention discloses a storage medium storing a program for controlling a virtual reality system realizing a virtual reality, the program comprising: a modeling module for configuring three-dimensional shape data of an object; a viewpoint position detecting module for detecting a viewpoint position of a viewer intending to experience virtual reality; a viewpoint setting module for setting a viewpoint in a three-dimensional space; a screen setting module for setting a screen in a virtual space in accordance with viewpoint position data detected by the viewpoint position detecting module; a screen mapping module for mapping a scene over the screen, which scene is formed by model data of the object viewed at the viewpoint set by the viewpoint setting module while a spatial and geometrical position of the object is taken into consideration; an image generating module for mapping the scene mapped on the screen by the screen mapping module, over a device coordinate system; a video converting module for converting an image generated by the image generating module into a video signal; and a video display module for displaying an image converted by the video converting module.
The other objects and features of the invention will become apparent from the following detailed description of the embodiments when read in conjunction with the accompanying drawings.