In recent years, extensive studies have been made on mixed reality, which aims at seamlessly joining real and virtual spaces. An image display apparatus that presents mixed reality is implemented by superimposing an image of a virtual space (e.g., a virtual object or text information rendered by computer graphics) onto an image of a real space captured by an image sensing device such as a video camera.
Such image display apparatuses are expected to find applications in new fields different from conventional virtual reality, such as surgical assistance in which the state inside a patient's body is superimposed onto the body surface, and mixed reality games in which a player fights virtual enemies that swim in the real space.
A common requirement for these applications involves the precision level of alignment between the real and virtual spaces, and many efforts have been conventionally made in this respect.
The problem of alignment in mixed reality amounts to obtaining the three-dimensional (3D) position and posture of an image sensing device on a world coordinate system set on the real space (to be simply referred to as a world coordinate system hereinafter). As a method of solving this problem, it is common practice to use a 3D position/posture sensor such as a magnetic sensor or an ultrasonic sensor.
In general, the output value of a 3D position/posture sensor indicates the position and posture of a measurement point on a sensor coordinate system which is uniquely defined by the sensor, but is not that of the image sensing device on the world coordinate system. Taking the Polhemus FASTRAK (magnetic sensor) as an example, the position and posture of a receiver on a coordinate system defined by a transmitter are obtained as the sensor output. Therefore, the sensor output value cannot be directly used as the position and posture of the image sensing device on the world coordinate system, and must undergo some calibration processes. More specifically, coordinate transformation that transforms the position and posture of a measurement point into those of the image sensing device, and coordinate transformation that transforms the position and posture on the sensor coordinate system into those on the world coordinate system are required. In this specification, information used to transform the sensor output value into the position and posture of the image sensing device on the world coordinate system will be referred to as calibration information.
FIG. 1 is a block diagram showing the functional arrangement of a general image display apparatus which presents mixed reality.
A display screen 110 and video camera 120 are fixed to a head-mount unit 100. When the user (not shown) wears the head-mount unit 100 so that the display screen 110 is located in front of the user's eye, a scene in front of the user's eye is captured by the video camera 120. Therefore, if the image captured by the video camera 120 is displayed on the display screen 110, the user observes, via the video camera 120 and display screen 110, the scene in front of the eye that he or she would observe with the naked eye if not wearing the head-mount unit 100.
A position/posture sensor 130 is a device for measuring the position and posture of a measurement point, fixed to the head-mount unit 100, on the sensor coordinate system, and comprises, e.g., the Polhemus FASTRAK as a magnetic sensor including a receiver 131, transmitter 133, and sensor controller 132. The receiver 131 is fixed to the head-mount unit 100 as a measurement point, and the sensor controller 132 measures and outputs the position and posture of the receiver 131 on the sensor coordinate system with reference to the position and posture of the transmitter 133.
On the other hand, an arithmetic processing unit 170 comprises a position/posture information transformer 140, memory 150, and image generator 160, and can be implemented by, e.g., a single general-purpose computer. The position/posture information transformer 140 transforms a measurement value input from the position/posture sensor 130 in accordance with calibration information held by the memory 150 so as to calculate the position and posture of the video camera 120 on the world coordinate system, and outputs the calculated position and posture as position/posture information. The image generator 160 generates a virtual image in accordance with the position/posture information input from the position/posture information transformer 140, superimposes that virtual image on an actual image captured by the video camera 120, and outputs the superimposed image. The display screen 110 receives the image from the image generator 160, and displays it. With the above arrangement, the user (not shown) can experience a sensation as if a virtual object were present on the real space in front of the user's eye.
A method of calculating the position and posture of the video camera on the world coordinate system by the position/posture information transformer 140 will be described below using FIG. 2. FIG. 2 is a view for explaining the method of calculating the position and posture of the video camera on the world coordinate system.
In FIG. 2, let MTW be the position and posture of a sensor coordinate system 210 (a coordinate system having the position of the transmitter 133 as an origin) on a world coordinate system 200, MST be the position and posture of the measurement point (i.e., the receiver 131) of the position/posture sensor 130 on the sensor coordinate system 210, MCS be the position and posture of the video camera 120 viewed from the measurement point of the position/posture sensor 130, and MCW be the position and posture of the video camera 120 on the world coordinate system 200. In this specification, the position and posture of object B on coordinate system A are expressed by a viewing transformation matrix MBA (4×4) from coordinate system A to coordinate system B (local coordinate system with reference to object B).
At this time, MCW can be given by:
MCW = MCS·MST·MTW  (A)
In equation (A), MST is the input from the position/posture sensor 130 to the position/posture information transformer 140, MCW is the output from the position/posture information transformer 140 to the image generator 160, and MCS and MTW correspond to calibration information required to transform MST into MCW. The position/posture information transformer 140 calculates MCW based on equation (A) using MST input from the position/posture sensor 130, and MCS and MTW held in the memory 150, and outputs it to the image generator 160.
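As an illustration of equation (A), the following is a minimal sketch in Python using only the standard library. The helper names (viewing_matrix, mat_mul, transform_point, camera_viewing_matrix) and all numeric poses are hypothetical and chosen for illustration only; for simplicity the sketch assumes that each coordinate system is rotated only about the z axis of its parent. It composes the three viewing transformation matrices and confirms that the world-space position of the camera maps to the origin of the camera coordinate system.

```python
import math

def viewing_matrix(yaw_deg, position):
    """4x4 viewing transformation from coordinate system A to system B.

    B is assumed (for this sketch only) to be rotated by yaw_deg about A's
    z axis, with its origin at `position` expressed in A coordinates.
    The matrix maps a point given in A coordinates to B coordinates:
    p_B = R^T (p_A - t).
    """
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    tx, ty, tz = position
    return [
        [c, s, 0.0, -(c * tx + s * ty)],
        [-s, c, 0.0, -(-s * tx + c * ty)],
        [0.0, 0.0, 1.0, -tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def mat_mul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform_point(m, point):
    """Apply a 4x4 matrix to a 3D point (homogeneous coordinate w = 1)."""
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))

def camera_viewing_matrix(m_cs, m_st, m_tw):
    """Equation (A): MCW = MCS . MST . MTW."""
    return mat_mul(m_cs, mat_mul(m_st, m_tw))

# Hypothetical calibration information (would be held in the memory 150).
M_TW = viewing_matrix(90.0, (1.0, 0.0, 0.0))   # world -> sensor coordinate system
M_CS = viewing_matrix(0.0, (0.0, 0.0, 0.1))    # measurement point -> camera

# Hypothetical sensor output: receiver pose on the sensor coordinate system.
M_ST = viewing_matrix(0.0, (0.0, 2.0, 0.0))

M_CW = camera_viewing_matrix(M_CS, M_ST, M_TW)

# With these values the camera sits at (-1, 0, 0.1) on the world coordinate
# system, so that point should map to the camera origin.
camera_origin = transform_point(M_CW, (-1.0, 0.0, 0.1))
```

Because a viewing transformation maps coordinates from the outer system to the inner one, the matrices are applied right to left: MTW first, then MST, then MCS.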
In order to attain accurate alignment between the real and virtual spaces, accurate calibration information must be set in the memory 150 by some means. A virtual image which is accurately aligned in the real space can be displayed only when the accurate calibration information is given.
Note that the form in which the calibration information is held in the memory 150 is not limited to the viewing transformation matrix, and any other form may be adopted as long as the information can define the position and posture of one coordinate system viewed from the other coordinate system. For example, the position and posture may be expressed by a total of six parameters, i.e., three parameters that describe the position, and three parameters that express the posture using Euler angles. Also, the posture may be expressed by four parameters, i.e., a three-valued vector that defines the rotation axis, and a rotation angle about that axis, or may be expressed by three parameters that express the rotation angle by the magnitude of the vector which defines the rotation axis.
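The equivalence of these posture representations can be sketched as follows. This is an illustrative example only, written in Python with the standard library; the function names are hypothetical, and the conversion uses Rodrigues' rotation formula, which is one standard way to obtain a rotation matrix from an axis and an angle. Whether the assembled 4×4 matrix corresponds to a viewing transformation or to its inverse depends on the convention adopted, which the sketch does not fix.

```python
import math

def rotation_from_axis_angle(axis, angle_rad):
    """3x3 rotation matrix from four parameters: an axis vector and an angle."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("rotation axis must be non-zero")
    x, y, z = x / norm, y / norm, z / norm  # only the direction of the axis matters
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    t = 1.0 - c
    # Rodrigues' rotation formula written out element by element.
    return [
        [c + t * x * x, t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, c + t * y * y, t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, c + t * z * z],
    ]

def rotation_from_rotation_vector(rvec):
    """3x3 rotation matrix from three parameters: the vector's direction is
    the rotation axis and its magnitude is the rotation angle in radians."""
    angle = math.sqrt(sum(v * v for v in rvec))
    if angle < 1e-12:  # no rotation: the axis is undefined, return identity
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    return rotation_from_axis_angle(rvec, angle)

def pose_matrix(rotation, position):
    """Assemble a 4x4 rigid transformation from a 3x3 rotation and a position."""
    rows = [list(rotation[i]) + [position[i]] for i in range(3)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]

# The same 90-degree rotation about the z axis in both parameterizations.
R_four = rotation_from_axis_angle((0.0, 0.0, 2.0), math.pi / 2)
R_three = rotation_from_rotation_vector((0.0, 0.0, math.pi / 2))

# Six parameters in total (three for position, three for posture).
M = pose_matrix(R_three, (1.0, 2.0, 3.0))
```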
Furthermore, the position and posture may be expressed by parameters which represent their inverse transformations (e.g., the position and posture of the world coordinate system 200 on the sensor coordinate system 210). In any of these cases, the position and posture of an object on a 3D space have only six degrees of freedom (three degrees of freedom for the position, and three degrees of freedom for the posture). Hence, the unknown parameters required for calibration of this image display apparatus are a total of 12 parameters, i.e., six parameters required for transformation from the world coordinate system to the sensor coordinate system, and six parameters required for transformation from the position and posture of the measurement point to those of the video camera.
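Holding the inverse transformation loses no information, because a rigid transformation can be inverted in closed form. The following minimal sketch (Python, standard library only; the function names and the sample matrix are hypothetical) inverts a 4×4 matrix consisting of a rotation R and a translation t as [R^T | -R^T t], and checks that the product with the original matrix is the identity.

```python
def invert_rigid(m):
    """Inverse of a 4x4 rigid transformation [R | t; 0 0 0 1].

    Valid only when the upper-left 3x3 block is a pure rotation, in which
    case the inverse is [R^T | -R^T t; 0 0 0 1].
    """
    r = [row[:3] for row in m[:3]]
    t = [row[3] for row in m[:3]]
    r_t = [[r[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    new_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def mat_mul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Hypothetical transformation: 90-degree rotation about z plus a translation.
M = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
M_inv = invert_rigid(M)
product = mat_mul(M, M_inv)  # expected to be the 4x4 identity
```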
In one known method for setting the calibration information, the user or operator interactively changes the 12 parameters (or 12 or more equivalent parameters) used to define MCS and MTW stored in the memory 150 via an input means (not shown), and makes adjustments by trial and error until accurate alignment is achieved.
Also, according to a calibration method proposed by Japanese Patent Application No. 2001-050990 (US AA 2002-95265), if one of MCS and MTW is obtained by some method, the remaining unknown parameters can be easily derived using, as a visual cue, a virtual image generated based on position/posture information fixed to a given value.
However, in the former method, since 12 unknown parameters must be adjusted at the same time, adjustment takes much time, and accurate calibration information cannot always be obtained. In the latter method, trial-and-error operations or operations using some calibration tool must be performed to derive the parameters that are treated as known. Hence, these methods still have room for improvement.
The present invention has been made in consideration of the aforementioned problems, and has as its object to easily acquire calibration information required to transform the position and posture of an image sensing device measured by a sensor into those on a world coordinate system without using any special calibration tool.