The present invention relates to a mixed reality presentation apparatus for presenting to a user or operator mixed reality which couples a virtual image generated by computer graphics to the real space. The present invention also relates to an improvement of precise detection of, e.g., head position and/or posture of an operator to which mixed reality is presented.
In recent years, extensive studies have been made about mixed reality (to be abbreviated as "MR" hereinafter) directed to seamless coupling of a real space and virtual space. MR has earned widespread appeal as a technique for enhancing virtual reality (to be abbreviated as "VR" hereinafter), which so far could be experienced only in a situation isolated from the real space, so that the real space and the VR world can coexist.
Applications of MR are expected in new fields qualitatively different from VR used so far, such as a medical assistant use for presenting the state of the patient's body to a doctor as if it were seen through, a work assistant use for displaying the assembling steps of a product on actual parts in a factory, and the like.
These applications commonly require a technique of removing "deviations" between a real space and virtual space. The "deviations" can be classified into a positional deviation, time deviation, and qualitative deviation. Many attempts have been made to remove the positional deviation (i.e., alignment) as the most fundamental requirement among the above deviations.
In case of video-see-through type MR that superposes a virtual object on an image sensed by a video camera, the alignment problem reduces to accurate determination of the three-dimensional position of that video camera.
The alignment problem in case of optical-see-through type MR using a transparent HMD (Head Mount Display) amounts to determination of the three-dimensional position of the user's view point. As a method of measuring such position, a three-dimensional position-azimuth sensor such as a magnetic sensor, ultrasonic wave sensor, gyro, or the like is normally used. However, the precision of such sensors is not sufficient, and their errors produce positional deviations.
On the other hand, in the video-see-through system, a method of direct alignment on an image on the basis of image information without using such sensors may be used. With this method, since positional deviation can be directly processed, alignment can be precisely attained. However, this method suffers other problems, i.e., non-real-time processing, and poor reliability.
In recent years, attempts have been reported to realize precise alignment by using both a position-azimuth sensor and image information, since the two compensate for each other's shortcomings.
As one attempt, "Dynamic Registration Correction in Video-Based Augmented Reality Systems" (Bajura Michael and Ulrich Neumann, IEEE Computer Graphics and Applications 15, 5, pp. 52-60, 1995) (to be referred to as a first reference hereinafter) has proposed a method of correcting a positional deviation arising from magnetic sensor errors using image information in video-see-through MR.
Also, "Superior Augmented Reality Registration by Integrating Landmark Tracking and Magnetic Tracking" (State Andrei et al., Proc. of SIGGRAPH 96, pp. 429-438, 1996) (to be referred to as a second reference hereinafter) has proposed a method which further develops the above method, and compensates for ambiguity of position estimation based on image information. The second reference sets a landmark, the three-dimensional position of which is known, in a real space so as to remove any position deviation on an image caused by sensor errors when a video-see-through MR presentation system is built using only a position-azimuth sensor. This landmark serves as a yardstick for detecting the positional deviation from image information.
If the output from the position-azimuth sensor does not include any errors, a coordinate point (denoted as QI) of the landmark actually observed on the image must agree with a predicted observation coordinate point (denoted as PI) of the landmark, which is calculated from the camera position obtained based on the sensor output, and the three-dimensional position of the landmark.
However, in practice, since the camera position obtained based on the sensor output is not accurate, QI and PI do not agree with each other. The deviation between the predicted observation coordinate PI and the observed landmark coordinate QI represents the positional deviation between the landmark positions in the virtual and real spaces and, hence, the direction and magnitude of the deviation can be calculated by extracting the landmark position from the image.
In this way, by quantitatively measuring the positional deviation on the image, the camera position can be corrected to remove the positional deviation.
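The measurement described above can be illustrated with a minimal sketch. It assumes an ideal pinhole camera with the principal point at the image origin; the function names, the focal length, and the sample coordinates are hypothetical and are not taken from the references.

```python
import math

def project(point_cam, focal_length):
    """Project a 3D point in camera coordinates onto the image plane of an
    ideal pinhole camera (no distortion, principal point at the origin)."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (focal_length * x / z, focal_length * y / z)

def image_deviation(observed_qi, predicted_pi):
    """Return the deviation vector Q_I - P_I, its magnitude, and its
    direction (radians) on the image."""
    dx = observed_qi[0] - predicted_pi[0]
    dy = observed_qi[1] - predicted_pi[1]
    return (dx, dy), math.hypot(dx, dy), math.atan2(dy, dx)

# Hypothetical numbers: the landmark position predicted from the sensor pose
# and the landmark position actually extracted from the image.
p_i = project((0.10, 0.00, 2.0), focal_length=800.0)  # predicted P_I
q_i = (43.0, 4.0)                                     # observed Q_I
offset, magnitude, direction = image_deviation(q_i, p_i)
```

With these sample values the deviation is 3 pixels horizontally and 4 pixels vertically, which is the quantity a correction step would try to drive to zero.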
The simplest alignment method using both a position-azimuth sensor and an image is correction of sensor errors using a single landmark point, and the first reference proposed a method of translating or rotating the camera position in accordance with the positional deviation of the landmark on the image.
FIG. 1 shows the basic concept of positional deviation correction using a single landmark point. In the following description, assume that the internal parameters of a camera are known, and an image is sensed by an ideal image sensing system free from any influences of distortion and the like.
Let C be the view point position of the camera, QI be the observation coordinate position of a landmark on an image, and QC be the landmark position in a real space. Then, the point QI is present on a line lQ that connects the points C and QC. On the other hand, from the camera position given by the position-azimuth sensor, a landmark position PC on the camera coordinate system, and its observation coordinate position PI on the image can be estimated. In the following description, v1 and v2 respectively represent three-dimensional vectors from the point C to the points QI and PI. In this method, positional deviation is corrected by modifying relative positional information between the camera and object so that a corrected predicted observation coordinate position P′I of the landmark agrees with QI (i.e., a corrected predicted landmark position P′C on the camera coordinate system is present on the line lQ).
A case will be examined below wherein the positional deviation of the landmark is corrected by rotating the camera. This correction can be realized by modifying the position information of the camera so that the camera rotates through the angle θ that the two vectors v1 and v2 make with each other. In actual calculations, vectors v1n and v2n obtained by normalizing the above vectors v1 and v2 are used: their outer product v1n × v2n gives the rotation axis, their inner product v1n · v2n gives the cosine of the rotation angle, and the camera is rotated about the point C.
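The rotation described above can be sketched as follows. This is an illustration only, not the implementation of the first reference: the helper names are hypothetical, Rodrigues' rotation formula is used to apply the rotation, and v1 and v2 are assumed not to be antiparallel (both points lie in front of the camera).

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotation_correction(v1, v2):
    """Return (unit axis, angle): axis from v1n x v2n, angle from v1n . v2n."""
    v1n, v2n = normalize(v1), normalize(v2)
    axis = cross(v1n, v2n)
    if math.sqrt(dot(axis, axis)) < 1e-12:
        return (0.0, 0.0, 1.0), 0.0  # vectors already aligned, nothing to do
    cos_angle = max(-1.0, min(1.0, dot(v1n, v2n)))  # guard against rounding
    return normalize(axis), math.acos(cos_angle)

def rotate(v, axis, angle):
    """Rodrigues' formula: rotate v about a unit axis through the origin C."""
    c, s = math.cos(angle), math.sin(angle)
    k_cross_v = cross(axis, v)
    k_dot_v = dot(axis, v)
    return tuple(v[i] * c + k_cross_v[i] * s + axis[i] * k_dot_v * (1.0 - c)
                 for i in range(3))

# Hypothetical vectors from C to the observed point Q_I and predicted point P_I.
v1 = (0.0, 0.0, 1.0)
v2 = (1.0, 0.0, 1.0)
axis, angle = rotation_correction(v1, v2)
# Rotating the camera by (axis, angle) moves the predicted direction, as seen
# from the camera, by the inverse rotation, which brings it onto v1.
corrected = rotate(normalize(v2), axis, -angle)
```

In this example the axis is the y axis and the angle is 45 degrees; after the correction the predicted direction coincides with the observed one.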
A case will be examined below wherein the positional deviation of the landmark is corrected by relatively translating the camera position. This correction can be realized by translating the object position in the virtual world by v = n(v1 − v2). Note that n is a scale factor defined by:

$$ n = \frac{|CP_C|}{|CP_I|} \qquad (1) $$
Note that |AB| is a symbol representing the distance between points A and B. Likewise, correction can be attained by modifying the position information of the camera so that the camera translates by −v. This is because this manipulation is equivalent to relative movement of a virtual object by v.
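A small numerical sketch of the translation correction follows. It assumes that C is the origin of the camera coordinate system and that PI and QI are expressed as three-dimensional points on the image plane; the function names and the sample values are hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def translation_correction(p_c, p_i, q_i):
    """Return v = n (v1 - v2) with n = |C P_C| / |C P_I|.

    C is taken as the origin, so v1 = Q_I and v2 = P_I.
    p_c: predicted landmark position P_C in camera coordinates
    p_i: predicted observation point P_I on the image plane (3D)
    q_i: observed point Q_I on the image plane (3D)
    """
    n = norm(p_c) / norm(p_i)
    return tuple(n * (a - b) for a, b in zip(q_i, p_i))

# Hypothetical values with the image plane at z = 1.
p_c = (0.2, 0.0, 2.0)    # predicted landmark position P_C
p_i = (0.1, 0.0, 1.0)    # its projection P_I
q_i = (0.15, 0.05, 1.0)  # observed landmark Q_I
v = translation_correction(p_c, p_i, q_i)
corrected_p_c = tuple(p + d for p, d in zip(p_c, v))
```

Because P_C equals n times P_I, the corrected position equals n times Q_I, so it lies on the line lQ through C and QI, as the text requires. Translating the camera by −v instead gives the same relative result.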
The above-mentioned two methods two-dimensionally adjust the positional deviation on the landmark but cannot correct the camera position to a three-dimensionally correct position. However, when sensor errors are small, these methods can expect sufficient effects, and the calculation cost required for correction is very small. Hence, these methods are excellent in real-time processing.
However, the above references do not consider any collaborative operations of a plurality of operators, and can only provide a mixed reality presentation system by a sole operator.
Since the methods described in the references need to detect the coordinate of a single landmark within the sensed image, they have the limitation that a specific marker serving as a mark for alignment must always be sensed by the camera; they therefore allow observation within only a limited range.
The above limitation derived from using the single landmark is fatal to construction of mixed reality space shared by a plurality of users or operators.
The present invention has been made in consideration of the conventional problems, and has as its object to provide an apparatus that presents a collaborative operation of a plurality of operators by mixed reality.
In order to achieve the above object, according to the present invention, a mixed reality presentation apparatus which generates a three-dimensional virtual image associated with a collaborative operation to be done by a plurality of operators in a predetermined mixed reality environment, and displays the generated virtual image on see-through display devices respectively attached to the plurality of operators, comprises:
first sensor means for detecting a position of each of actuators which are operated by the plurality of operators and move as the collaborative operation progresses;
second sensor means for detecting a view point position of each of the plurality of operators in an environment of the collaborative operation; and
generation means for generating three-dimensional images for the see-through display devices of the individual operators, the generation means generating a three-dimensional virtual image representing an operation result of the collaborative operation that has progressed according to a change in position of each of the plurality of actuators detected by the first sensor means when viewed from the view point position of each operator detected by the second sensor means, and outputting the generated three-dimensional virtual image to each see-through display device.
Since the first sensor means of the present invention detects the positions of the individual actuators operated by the operators, the positional relationship between the actuators of the operators can be systematically recognized, and mixed reality based on their collaborative operation can be presented without any positional deviation.
In order to track the collaborative operation by all the operators, a camera which covers substantially all the operators within its field of view is preferably used. Hence, according to a preferred aspect of the present invention, the first sensor means comprises:
an image sensing camera which includes a maximum range of the actuator within a field of view thereof, the position of the actuator moving upon operation of the operator; and
image processing means for detecting the position of the actuator by image processing from an image obtained by the camera.
In order to present mixed reality based on the collaborative operation, detection of some operations of the operators suffices. For this reason, according to a preferred aspect of the present invention, when the first sensor means uses a camera, the actuator outputs light having a predetermined wavelength, and the first sensor means comprises a camera which is sensitive to the light having the predetermined wavelength.
According to a preferred aspect of the present invention, the actuator is a mallet operated by a hand of the operator. The mallet can be easily applied to a mixed reality environment such as a game.
According to a preferred aspect of the present invention, the see-through display device comprises an optical transmission type display device.
According to a preferred aspect of the present invention, the second sensor means detects a head position and posture of each operator, and calculates the view point position in accordance with the detected head position and posture.
In order to detect the three-dimensional posture of the head of each operator, a magnetic sensor is preferably used. Therefore, according to a preferred aspect of the present invention, the second sensor means comprises a transmitter for generating an AC magnetic field, and a magnetic sensor attached to the head portion of each operator. With this arrangement, the three-dimensional posture of the head of each operator can be detected in a non-contact manner.
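The calculation of a view point position from the detected head position and posture can be sketched as below. The sketch assumes that the posture is available as a 3x3 rotation matrix and that the eye sits at a fixed offset from the sensor in the head frame; the offset value, the function names, and the yaw-only rotation are hypothetical simplifications.

```python
import math

def rotation_about_y(yaw):
    """Rotation matrix for a head turn of `yaw` radians about the vertical axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return ((c, 0.0, s),
            (0.0, 1.0, 0.0),
            (-s, 0.0, c))

def view_point(head_pos, head_rot, eye_offset):
    """View point = head position + head rotation applied to the eye offset."""
    rotated = tuple(sum(r * o for r, o in zip(row, eye_offset))
                    for row in head_rot)
    return tuple(h + d for h, d in zip(head_pos, rotated))

# Hypothetical sensor reading: head at (1.0, 1.6, 0.0), turned 90 degrees.
# The eye is assumed to sit 0.1 below and 0.1 in front of the sensor.
eye = view_point((1.0, 1.6, 0.0), rotation_about_y(math.pi / 2),
                 (0.0, -0.1, 0.1))
```

A real system would obtain the offset by calibration for each operator rather than assume it.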
According to a preferred aspect of the present invention, the generation means comprises:
storage means for storing a rule of the collaborative operation;
means for generating a virtual image representing a progress result of the collaborative operation in accordance with the rule stored in the storage means in correspondence with detected changes in position of the plurality of actuators; and
means for generating a three-dimensional virtual image for each view point position by transforming a coordinate position for each view point position of each operator detected by the second sensor means.
Similarly, in order to achieve the above object, according to the present invention, a mixed reality presentation apparatus which generates a three-dimensional virtual image associated with a collaborative operation to be done by a plurality of operators in a predetermined mixed reality environment, and displays the generated virtual image on see-through display devices respectively attached to the plurality of operators, comprises:
a camera which includes a plurality of actuators operated by the plurality of operators in the collaborative operation within a field of view thereof;
actuator position detection means for outputting information associated with positions of the actuators on a coordinate system of that environment on the basis of an image sensed by the camera;
sensor means for detecting and outputting a view point position of each of the plurality of operators in the environment of the collaborative operation; and
image generation means for outputting a three-dimensional virtual image of a progress result viewed from the view point position of each operator detected by the sensor means to each see-through display device so as to present the progress result of the collaborative operation that has progressed according to detected changes in position of the actuator to each operator.
The above object is also achieved by a mixed reality presentation apparatus which generates a three-dimensional virtual image associated with a collaborative operation to be done by a plurality of operators in a predetermined mixed reality environment, and displays the generated virtual image on see-through display devices respectively attached to the plurality of operators. This apparatus comprises:
a camera which includes a plurality of actuators operated by the plurality of operators in the collaborative operation within a field of view thereof;
actuator position detection means for outputting information associated with positions of the actuators on a coordinate system of that environment on the basis of an image sensed by the camera;
sensor means for detecting and outputting a view point position of each of the plurality of operators in the environment of the collaborative operation; and
image generation means for outputting a three-dimensional virtual image of a progress result viewed from the view point position of each operator detected by the sensor means to each see-through display device so as to present the progress result of the collaborative operation that has progressed according to detected changes in position of the actuator to each operator.
The above object is also achieved by a mixed reality presentation apparatus which generates a three-dimensional virtual image associated with a collaborative operation to be done by a plurality of operators in a predetermined mixed reality environment, and displays the generated virtual image on see-through display devices respectively attached to the plurality of operators. This apparatus comprises:
a first camera which substantially includes the plurality of operators within a field of view thereof;
a first processor for calculating operation positions of the plurality of operators on the basis of an image obtained by the first camera;
a detection device for detecting a view point position of each operator using a plurality of sensors attached to the plurality of operators;
a plurality of second cameras for sensing front fields of the individual operators, at least one second camera being attached to each of the plurality of operators;
a second processor for calculating information associated with a line of sight of each operator on the basis of each of images from the plurality of second cameras;
a third processor for correcting the view point position of each operator detected by the sensor using the line of sight information from the second processor and outputting the corrected view point position as a position on a coordinate system of the mixed reality environment;
a first image processing device for making the collaborative operation virtually progress on the basis of the operation position of each operator calculated by the first processor, and generating three-dimensional virtual images representing results that have changed along with the progress of the collaborative operation for the plurality of operators; and
a second image processing device for transforming coordinate positions of the three-dimensional virtual images for the individual operators generated by the first image processing device in accordance with the individual corrected view point positions calculated by the third processor, and outputting the coordinate-transformed images to the see-through display devices.
The above object is also achieved by a method of generating a three-dimensional virtual image associated with a collaborative operation to be done within a predetermined mixed reality environment so as to display the image on see-through display devices attached to a plurality of operators in the mixed reality environment. This method comprises:
the image sensing step of sensing a plurality of actuators operated by the plurality of operators by a camera that includes the plurality of operators within a field of view thereof;
the actuator position acquisition step of calculating information associated with positions of the actuators on a coordinate system of the environment on the basis of the image sensed by the camera;
the view point position detection step of detecting a view point position of each of the plurality of operators in the environment of the collaborative operation on the coordinate system of the environment;
the progress step of making the collaborative operation virtually progress in accordance with changes in position of the plurality of actuators calculated in the actuator position acquisition step; and
the image generation step of outputting a three-dimensional virtual image of a progress result in the progress step viewed from the view point position of each operator detected in the view point position detection step to each see-through display device so as to present the progress result in the progress step to each operator.
The above object is also achieved by a mixed reality presentation method for generating a three-dimensional virtual image associated with a collaborative operation to be done by a plurality of operators in a predetermined mixed reality environment, and displaying the generated virtual image on see-through display devices respectively attached to the plurality of operators. This method comprises:
the first image sensing step of capturing an image using a first camera which substantially includes the plurality of operators within a field of view thereof;
the first detection step of detecting operation positions of the plurality of operators on the basis of the image sensed by the first camera;
the second detection step of detecting a view point position of each operator using a plurality of sensors respectively attached to the plurality of operators;
the second image sensing step of sensing a front field of each operator using each of a plurality of second cameras, at least one second camera being attached to each of the plurality of operators;
the line of sight calculation step of calculating information associated with a line of sight of each operator on the basis of each of images obtained from the plurality of second cameras;
the correction step of correcting the view point position of each operator detected by the sensor on the basis of the line of sight information calculated in the line of sight calculation step, and obtaining the corrected view point position as a position on a coordinate system of the mixed reality environment;
the generation step of making the collaborative operation virtually progress on the basis of the operation positions of the individual operators detected in the first detection step, and generating three-dimensional virtual images that represent results of the collaborative operation and are viewed from the view point positions of the plurality of operators; and
the step of transforming coordinate positions of the three-dimensional virtual images for the individual operators generated in the generation step in accordance with the individual corrected view point positions obtained in the correction step, and outputting the coordinate-transformed images to the see-through display devices.
It is another object of the present invention to provide a position posture detection apparatus and method, which can precisely capture an operator who moves across a broad range, and a mixed reality presentation apparatus based on the detected position and posture.
In order to achieve the above object, the present invention provides a position/posture detection apparatus for detecting an operation position of an operator so as to generate a three-dimensional virtual image that represents an operation done by the operator in a predetermined mixed reality environment, comprising:
a position/posture sensor for measuring a three-dimensional position and posture of the operator to output an operator's position and posture signal;
a camera sensing images of a first plurality of markers arranged at known positions in the environment;
detection means for processing an image signal from said camera, tracking a marker of the first plurality of markers, and detecting a coordinate value of the tracked marker in a coordinate system; and
calculation means for calculating a portion-position and -posture representing a position and posture of the operating portion, on the basis of the coordinate value of the tracked marker detected by said detection means and the operator's position and posture signal outputted from the position/posture sensor.
In order to achieve the above object, the present invention provides a position/posture detection method for detecting an operation position of an operator so as to generate a three-dimensional virtual image associated with an operation to be done by the operator in a predetermined mixed reality environment, comprising:
the step of measuring a three-dimensional position and posture of the operator and outputting an operator position/posture signal indicative thereof;
the step of processing an image signal from a camera which captures a plurality of markers arranged in the environment, tracking at least one marker and detecting a coordinate of said at least one marker; and
outputting a head position/posture signal indicative of a position and posture of the head of the operator, on the basis of the coordinate of the tracked marker and the measured operator position/posture signal.
In order to achieve the above object, the present invention provides a position/posture detection apparatus for detecting an operation position of an operator, comprising:
a position/posture sensor for measuring a three-dimensional position and posture of the operator to output an operator's position and posture signal;
a camera sensing images of a first plurality of markers arranged at known positions in the environment;
detection means for processing an image signal from said camera, tracking a marker of the first plurality of markers, and detecting a coordinate value of the tracked marker in a coordinate system; and
correction means for correcting an output signal from the sensor on the basis of the coordinate value of the tracked marker.
In order to achieve the above object, the present invention provides a mixed reality presentation apparatus comprising:
a work table having a first plurality of markers arranged at known positions;
a position/posture sensor attached to an operator to detect a head posture of the operator;
a camera being set to capture at least one of the first plurality of markers within a field of view of the camera;
a detection means for processing an image signal from the camera, tracking a marker from among the first plurality of markers, and detecting a coordinate value of a tracked marker;
calculation means for calculating a position/posture signal representing a position and posture of the operator's view point, on the basis of the coordinate value of the tracked marker detected by said detection means and an operator's head position/posture signal outputted from the position/posture sensor; and
generation means for generating a virtual image for presenting a mixed reality at the view point in accordance with the calculated position/posture signal.
The detection apparatus and method according to the invention as set forth can correct or detect a position and posture of the operator precisely even when the operator moves within a wide range environment, since at least one marker is assured to be captured in the image by the camera.
According to a preferred aspect of the invention, the markers are arranged so that a distance between one marker and another marker of the plurality of markers in a direction crossing in front of the operator is set to be larger as the markers are farther from the operator. This prevents deterioration of precision in identifying a marker.
According to a preferred aspect of the invention, a layout distribution density of the plurality of markers in the environment is set so that the density of markers farther from the operator is lower than the density of markers closer to the operator. This also prevents deterioration of precision in identifying a marker.
According to a preferred aspect of the invention, where a plurality of operators perform a collaborative operation, markers for one operator are of the same representation manner. The markers for one operator have the same color, for example. This makes it easier to discriminate the markers for one operator from those for every other operator.
According to a preferred aspect of the invention, the portion is a view point position of the operator.
According to a preferred aspect of the invention, said detection means uses a marker firstly found within an image obtained by said camera. It is not necessary to keep tracking one marker in the invention. It is enough for any one marker to be found. Using a first found marker makes it easier to search for or track a marker.
According to a preferred aspect of the invention, the detection means searches an image of a present scene for a marker found in an image of a previous scene. This assures continuity in the tracking.
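The two detection strategies above (reusing the marker found in the previous scene, and otherwise taking the first marker found) can be combined in a minimal sketch. The search radius, the function name, and the representation of detections as image coordinates are assumptions made for illustration.

```python
import math

def track_marker(previous_xy, candidates, search_radius):
    """Pick the marker to track in the present image.

    previous_xy: image coordinate of the marker tracked in the previous
                 scene, or None if no marker was tracked.
    candidates:  marker coordinates detected in the present image, in the
                 order in which they were found.
    """
    if not candidates:
        return None
    if previous_xy is not None:
        def distance(c):
            return math.hypot(c[0] - previous_xy[0], c[1] - previous_xy[1])
        nearest = min(candidates, key=distance)
        if distance(nearest) <= search_radius:
            return nearest  # continuity: same marker as in the previous scene
    return candidates[0]    # otherwise fall back to the first marker found
```

For example, `track_marker((100, 100), [(300, 50), (104, 97)], 20)` returns `(104, 97)`, while the same call with `previous_xy=None` returns `(300, 50)`.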
The sensor may be mounted anywhere on the operator. According to a preferred aspect of the invention, the sensor is mounted on the head of the operator. The sensor is then close to the view point of the operator. This facilitates application to an HMD.
According to a preferred aspect of the invention, the first plurality of markers are arranged within the environment so that at least one marker is captured within the field of image of the camera.
Detection of a tracked marker can be made in various coordinate systems. According to a preferred aspect of the invention, said detection means calculates a coordinate of the tracked marker in an image coordinate system. According to another preferred aspect of the invention, said detection means calculates a coordinate of the tracked marker in a camera coordinate system.
According to a preferred aspect of the invention, the first plurality of markers are depicted on a planar table arranged within the environment. This is suitable for a case where the collaborative operation is made on the table.
According to a preferred aspect of the invention, said first plurality of markers are arranged in a three-dimensional manner. This aspect is suitable for a case where markers must be arranged in a three-dimensional manner.
According to a preferred aspect of the invention, the detection means comprises identifying means for identifying a marker to be tracked from among said first plurality of markers.
Similarly, according to a preferred aspect of the invention, the detection means comprises means for selecting, where said detection means detects a second plurality of markers within an image captured by said camera, one marker to be tracked from among said second plurality of markers.
According to a preferred aspect of the invention, the identifying means identifies a marker selected by the selection means in terms of an image coordinate system.
According to a further aspect of the invention, the identifying means comprises:
means for detecting a signal representing a position/posture of the camera;
means for converting three-dimensional coordinates of said first plurality of markers in the world coordinate system into a coordinate value in terms of the image coordinate system, in accordance with the signal representing position/posture of the camera; and
means for identifying a marker to be tracked by comparing the coordinates of the first plurality of markers in the image coordinate system and an image coordinate value of the tracked marker.
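Identification in the image coordinate system, as listed above, can be sketched as follows. The sketch assumes an ideal pinhole camera, a camera posture given as a world-to-camera rotation matrix, and a nearest-neighbour comparison; the names, the focal length, and the marker table are hypothetical.

```python
import math

def world_to_image(point_w, rot_wc, cam_pos, focal_length):
    """Convert a world coordinate into an image coordinate using the camera
    position/posture; returns None if the point is behind the camera."""
    d = [p - c for p, c in zip(point_w, cam_pos)]
    x, y, z = (sum(r * v for r, v in zip(row, d)) for row in rot_wc)
    if z <= 0.0:
        return None
    return (focal_length * x / z, focal_length * y / z)

def identify_marker(observed_xy, markers_w, rot_wc, cam_pos, focal_length):
    """Return (marker id, image distance) of the known marker whose predicted
    image coordinate is closest to the observed coordinate."""
    best_id, best_dist = None, math.inf
    for marker_id, point_w in markers_w.items():
        predicted = world_to_image(point_w, rot_wc, cam_pos, focal_length)
        if predicted is None:
            continue
        dist = math.hypot(observed_xy[0] - predicted[0],
                          observed_xy[1] - predicted[1])
        if dist < best_dist:
            best_id, best_dist = marker_id, dist
    return best_id, best_dist

# Hypothetical setup: camera at the world origin looking along +z.
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
MARKERS = {"A": (-0.5, 0.0, 2.0), "B": (0.0, 0.0, 2.0), "C": (0.5, 0.0, 2.0)}
marker_id, distance = identify_marker((190.0, 5.0), MARKERS, IDENTITY,
                                      (0.0, 0.0, 0.0), 800.0)
```

Here marker "C" is predicted at (200, 0) and is therefore selected. Because the camera posture comes from an imperfect sensor, the comparison is only reliable when the markers are spaced more widely on the image than the expected sensor error.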
According to another aspect of the invention, the identifying means identifies a marker selected by the selection means in terms of a world coordinate system. According to a yet further aspect of the invention, the identifying means comprises:
means for detecting a signal representing a position/posture of the camera;
means for converting a coordinate of the tracked marker in terms of a camera coordinate system into a coordinate value in terms of the world coordinate system; and
selection means for selecting said at least one marker to be tracked by comparing coordinates of the second plurality of markers and coordinates of the first plurality of markers, in terms of the world coordinate system.
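The world-coordinate variant listed above can be sketched in the same spirit. The sketch assumes that the camera supplies three-dimensional marker coordinates in its own coordinate system (for example from plural camera units), that the posture is given as a camera-to-world rotation matrix, and that matching is done by smallest Euclidean distance; all names and values are hypothetical.

```python
import math

def camera_to_world(point_c, rot_cw, cam_pos):
    """Convert a camera-coordinate point into the world coordinate system."""
    rotated = [sum(r * v for r, v in zip(row, point_c)) for row in rot_cw]
    return tuple(a + b for a, b in zip(rotated, cam_pos))

def select_marker(detected_c, known_w, rot_cw, cam_pos):
    """Compare every detected marker with every known marker in world
    coordinates and return (detected index, known marker id, distance) for
    the closest pair, or None if nothing was detected."""
    best = None
    for index, point_c in enumerate(detected_c):
        point_w = camera_to_world(point_c, rot_cw, cam_pos)
        for marker_id, known in known_w.items():
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(point_w, known)))
            if best is None or dist < best[2]:
                best = (index, marker_id, dist)
    return best

# Hypothetical setup: camera at (1, 0, 0) with axes aligned to the world.
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
KNOWN = {"M1": (0.5, 0.0, 2.0), "M2": (1.5, 0.0, 2.0)}
DETECTED = [(-0.48, 0.01, 2.0), (0.6, 0.0, 2.1)]
selected = select_marker(DETECTED, KNOWN, IDENTITY, (1.0, 0.0, 0.0))
```

In this example the first detected marker lands about 0.02 from "M1" in world coordinates, so that pair is selected for tracking.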
Where an image coordinate system is used, according to a yet further aspect of the invention, the operation portion includes a view position of the operator,
said calculation means obtains a position/posture signal at a view point of the operator on the basis of:
said operator position/posture signal, and
a distance difference between an image coordinate value of the tracked marker and a coordinate value of the tracked marker which is converted from a three dimensional coordinate of the marker in the world coordinate system.
Where a world coordinate system is used, according to a yet further aspect of the invention, the operation portion includes a view position of the operator,
said calculation means obtains a position/posture signal at a view point of the operator on the basis of:
said operator position/posture signal, and
a distance difference between a coordinate value of the tracked marker which is converted from the camera coordinate system into the world coordinate system and a three dimensional coordinate of the marker in the world coordinate system.
The camera may comprise plural camera units. This allows a coordinate of a tracked marker to be detected in a camera coordinate system. Thus, an error in the position/posture sensor is corrected in a three-dimensional manner. Further, since the tracked marker is identified in the world coordinate system, the multiple cameras can cope with markers arranged three-dimensionally. Furthermore, precision in identifying a tracked marker is improved compared with that in the image coordinate system.
Other features and advantages of the present invention will be apparent from the following description taken in conjunction with the accompanying drawings, in which like reference characters designate the same or similar parts throughout the figures thereof.