1. Field of the Invention
The present invention relates to a 3D human interface apparatus for facilitating 3D pointing and controlling command inputs by an operator, which utilizes a motion recognition technique for determining the motion of an operator operated object according to changes of the feature points of the object on a plurality of images of the object.
2. Description of the Background Art
There are several conventional propositions for a scheme for obtaining a structure of an object imaged on a plurality of images.
For example, there is a scheme described by S. Ullman in "The interpretation of visual motion", MIT Press, Cambridge, U.S.A., 1991, for determining the structure and the motion of four non-coplanar points on a rigid object from at least three parallel projection images in which the correspondences of these four points are known.
Also, there is a scheme described by H. C. Longuet-Higgins in "A computer algorithm for reconstructing a scene from two projections", Nature, 293, pp. 133-135, 1981, which is a linear calculation scheme for obtaining the structure and the motion of eight corresponding points on two perspective transformed images.
Also, O. D. Faugeras and J. Maybank describe in "Motion from point matches: multiplicity of solutions", IEEE Workshop on Motion, pp. 248-255, 1989, that there are only a finite number of structures and motions that can satisfy the correspondences among five points on two perspective projection images.
Also, Japanese Patent Application Laid Open (Kokai) No. 3-6789 (1991) discloses a scheme for determining a 3D rotational motion from corresponding points on two images first, and then determining a 3D positional relationship with one of the corresponding points as a reference point from the determined rotational motion data.
All of these propositions belong to a type of scheme which sets up equations between the 3D coordinates of the object and the coordinates on the perspective projection images of this object, and solves these equations to obtain the answer.
In this type of scheme, the structure and the motion of the object can be calculated efficiently when an imaging target object is positioned very close to an imaging device such that it appears large in the images.
However, when the imaging target object occupies only a small area of the image, or when the distance between the imaging device and the imaging target object is large, as is the case for images encountered in actual processing, the deformation of the image due to the perspective projection, on the basis of which the motion of the object is to be calculated, becomes small, so that the calculation result becomes unstable. For example, in such a case it becomes difficult to distinguish a parallel displacement in a direction perpendicular to the viewing direction from a rotation around an axis perpendicular to the direction of that parallel displacement.
In addition, when the effect of the perspective projection is small, there arises an ambiguity in the depth, such that it becomes difficult to distinguish a nearby object with poorly defined features from a distant object with well defined features, or to distinguish the motion of a nearby small object from the motion of a distant large object.
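The shrinking of the perspective deformation with distance can be illustrated by a minimal sketch, given here only as an aid to understanding and not as part of any of the cited schemes. It assumes an ideal pinhole camera with unit focal length and a hypothetical cube-shaped object, and compares the perspective projection of the cube corners with a parallel (scaled orthographic) projection that uses a single mean depth. All function names and numerical values are illustrative assumptions.

```python
import math


def perspective(point, focal=1.0):
    """Ideal pinhole projection of a 3D point (x, y, z) with z > 0."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def scaled_parallel(point, mean_depth, focal=1.0):
    """Parallel projection scaled by one common depth for the whole object."""
    x, y, _ = point
    return (focal * x / mean_depth, focal * y / mean_depth)


def max_perspective_deformation(distance, half_size=0.5):
    """Largest relative gap between the two projections over the corners of
    a hypothetical cube of side 2 * half_size centered at depth `distance`."""
    corners = [
        (sx * half_size, sy * half_size, distance + sz * half_size)
        for sx in (-1, 1)
        for sy in (-1, 1)
        for sz in (-1, 1)
    ]
    worst = 0.0
    for corner in corners:
        u, v = perspective(corner)
        uw, vw = scaled_parallel(corner, distance)
        gap = math.hypot(u - uw, v - vw) / math.hypot(uw, vw)
        worst = max(worst, gap)
    return worst


if __name__ == "__main__":
    # Assumed distances, expressed in units of the cube's side length.
    for distance in (2.0, 20.0, 200.0):
        print(distance, max_perspective_deformation(distance))
```

Under these assumptions the relative deformation falls roughly in inverse proportion to the distance, so any quantity estimated from that deformation is computed from an increasingly small signal and becomes correspondingly sensitive to measurement noise.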
On the other hand, there is also a type of scheme described by J. J. Koenderink and A. J. van Doorn in "Affine structure from motion", Journal of the Optical Society of America A, Vol. 8, No. 2, pp. 377-385, 1991, in which the motion of the object is expressed by an affine transformation (linear transformation), according to which the structure of the object is determined. In this scheme, a rough structure of the object can be calculated from two frames of the dynamic images of the object.
However, in the structure of the object calculated by this scheme, the data along the depth direction involve an unknown coefficient proportional to the distance from the camera to the object, so that it is difficult to calculate the motion of the object according to the structure calculated by this scheme.
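A simplified illustration of why the depth data cannot be fixed from two frames alone is sketched below. It is not the formulation of Koenderink and van Doorn; it assumes a pure parallel projection, a rotation about the vertical axis only, and hypothetical point coordinates, and it shows that any assumed rotation angle yields a set of depths that reproduces the same two images exactly. All names and values are illustrative assumptions.

```python
import math


def rotate_and_project(x, z, theta):
    """Rotate the point (x, z) about the vertical axis by theta, then
    project it in parallel onto the horizontal image coordinate."""
    return x * math.cos(theta) + z * math.sin(theta)


def depth_for_assumed_angle(x, x_after, theta):
    """Depth that explains the observed pair (x, x_after) if the rotation
    between the two frames were theta (theta must not be a multiple of pi)."""
    return (x_after - x * math.cos(theta)) / math.sin(theta)


# Hypothetical object points (x, z) and the true, unknown rotation.
TRUE_THETA = math.radians(10.0)
POINTS = [(-1.0, 0.3), (0.5, -0.8), (1.2, 1.5), (0.0, 0.7)]

# What the two frames provide: the image coordinate before and after.
OBSERVED = [(x, rotate_and_project(x, z, TRUE_THETA)) for x, z in POINTS]


def explain(theta):
    """Depths consistent with OBSERVED under an assumed rotation theta."""
    return [depth_for_assumed_angle(x, xa, theta) for x, xa in OBSERVED]


if __name__ == "__main__":
    for degrees in (5.0, 10.0, 20.0):
        print(degrees, explain(math.radians(degrees)))
```

In this sketch each assumed angle produces a different depth for every point, yet all of them agree with the two observed frames, so the depth is determined only up to a transformation that depends on the unknown rotation. Only the assumed angle of 10 degrees returns the depths that were used to generate the observations.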