Recently, various apparatuses and methods have been proposed for use as interfaces between computers, or other electronic equipment, and human operators, i.e., man-machine interfaces. Especially for game machines and operation guidance apparatuses, techniques have been proposed according to which a full or a partial image of the figure of an operator is recorded by a camera, and the intent of the operator is determined based on the recorded image, so that an operation can be performed. Further, an example technique proposed in Japanese Patent Laid-Open No. 2004-78977 includes the use of a host computer for identifying the shape and action of a subject that appears in an image recorded by a CCD camera, and a display device for displaying the shape and the action of the subject identified by the host computer. When an operator facing the CCD camera provides an instruction by moving a hand or hands, the hand movement is displayed on the screen of the display device, and the operator can, by moving the hand, move an arrow cursor icon to point at and select a virtual switch on the display screen. Thus, an input device, such as a mouse, is not required, and the apparatus can be operated very simply.
Another input system has been proposed whereby, for entering an operation, a specific type of bodily gesture is identified by employing an image representing the action or the shape of hands or fingers. FIG. 14 illustrates an example input apparatus that can be employed for a presentation during which a screen is operated in accordance with instructions conveyed via gestures, or for a non-contact kiosk terminal that does not require a touch panel. When an operator facing a large screen performs various operations directed toward a camera installed at the normal position (A), the operation contents are displayed on the large screen. From the image thus obtained, the shape of the operator, or the action performed by the operator, is extracted by employing a method well known in the technical field of the present invention, and is compared with patterns that are, for example, predesignated and stored in a database. The meaning of the shape or of the action is thereby determined and is employed to control the apparatus.
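The comparison of an extracted shape with stored patterns can be illustrated by a minimal sketch. The sketch assumes, purely for illustration, that each shape has already been reduced to a five-element feature vector (one value per finger, 0.0 for folded and 1.0 for extended) and that the closest stored pattern is chosen by Euclidean distance. The names PATTERNS and match_gesture, the feature encoding, and the threshold are hypothetical and are not taken from the cited art.

```python
import math

# Hypothetical pattern database: gesture name -> per-finger extension values.
# The encoding (thumb, index, middle, ring, little) is an assumption.
PATTERNS = {
    "open_hand": (1.0, 1.0, 1.0, 1.0, 1.0),
    "fist": (0.0, 0.0, 0.0, 0.0, 0.0),
    "point": (0.0, 1.0, 0.0, 0.0, 0.0),
}


def match_gesture(features, patterns=PATTERNS, max_distance=0.8):
    """Return the name of the closest stored pattern, or None if none is close.

    max_distance is an assumed rejection threshold; without it, every
    movement would be forced onto some stored gesture.
    """
    best_name, best_dist = None, math.inf
    for name, template in patterns.items():
        dist = math.dist(features, template)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None
```

In this sketch a vector such as (0.1, 0.9, 0.0, 0.1, 0.0) is matched to "point", while an ambiguous, half-closed hand is rejected rather than being assigned to any gesture.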
Meanwhile, techniques for capturing the image of an operator are also employed, for example, for security checks, where, as shown in FIG. 15, a three-dimensional or a stereoscopic camera is employed for recording the image of an operator, so that a three-dimensional image can be reproduced. When a three-dimensional image is reproduced, the stereoscopic actions of the operator can be obtained, and in particular the forward and backward movement of the hands of the operator can be identified, as shown, for example, in FIG. 16. Thus, the types of gestures become more diverse. Furthermore, when an extracted image includes a plurality of people, the positional relationship of these people can be identified based on the three-dimensional image, and only the action of the operator in front need be extracted and employed for entering an instruction for an operation.
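The selection of the operator in front from a three-dimensional image can be sketched as follows. The sketch assumes that a separate detection step, not shown, has already produced for each person a distance from the camera and a distance of the hand from the camera. The names DetectedPerson, select_front_operator, and hand_push_distance are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass
class DetectedPerson:
    """Hypothetical per-person result of an upstream 3D detection step."""
    person_id: int
    body_depth_m: float  # distance of the torso from the camera, in meters
    hand_depth_m: float  # distance of the hand from the camera, in meters


def select_front_operator(people):
    """Return the person closest to the camera, or None if nobody is detected."""
    if not people:
        return None
    return min(people, key=lambda p: p.body_depth_m)


def hand_push_distance(person):
    """Return how far the hand is extended toward the camera, in meters.

    A positive value means the hand is in front of the torso.
    """
    return person.body_depth_m - person.hand_depth_m
```

With three people detected at 3.2 m, 1.8 m, and 2.5 m, the sketch selects the person at 1.8 m, and only the front and rear movement of that person's hand would then be evaluated.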
However, for conventional operations during which gestures are used, specific standard gestures, such as de facto standards, have not been established, and a user cannot identify, at a single glance, the correlation of an action with an available operation, other than an operation during which the index finger is used for pointing at XY coordinates. There are in fact operations, such as "click", "double click" or "drag", for which an instruction is entered by holding the pointer at the same coordinates for a waiting time of several seconds; however, since, for example, the designated waiting time is too long, it is not unusual for smooth operation to be interrupted. Therefore, there is a problem in that no practical method exists whereby an operation, such as clicking or deciding (double click, etc.), is easily and smoothly performed.
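The conventional waiting-time entry described above can be sketched as follows, which also makes its drawback apparent: no click is produced until the full waiting time has elapsed. The class name DwellClickDetector, the two-second waiting time, and the 20-pixel tolerance are assumptions chosen for illustration and are not values taken from any cited apparatus.

```python
import math


class DwellClickDetector:
    """Report a click when the pointer stays near one point long enough."""

    def __init__(self, dwell_seconds=2.0, radius_px=20.0):
        self.dwell_seconds = dwell_seconds  # assumed waiting time
        self.radius_px = radius_px          # assumed tolerance for hand jitter
        self._anchor = None                 # point where the current dwell began
        self._start = None                  # time at which the current dwell began
        self._fired = False                 # True once this dwell produced a click

    def update(self, x, y, t):
        """Feed one pointer sample; return True exactly once per completed dwell."""
        moved_away = (
            self._anchor is None
            or math.hypot(x - self._anchor[0], y - self._anchor[1]) > self.radius_px
        )
        if moved_away:
            # Start a new dwell at the current position.
            self._anchor, self._start, self._fired = (x, y), t, False
            return False
        if not self._fired and t - self._start >= self.dwell_seconds:
            self._fired = True
            return True
        return False
```

Shortening dwell_seconds makes the operation quicker but increases the chance that a pause made out of habit is taken for a click, which is the trade-off underlying the problem stated above.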
Moreover, unlike an input apparatus such as a touch panel that an operator can touch directly, it is difficult for a conventional gesture detection apparatus to read the intent of an operator exactly. Specifically, one problem is that, when an operator has moved in a certain way, it is not easy to determine whether the action indicates input intent or whether the operator moved simply out of habit. As a result, even a simple gesture cannot be identified unless it is performed unnaturally and noticeably. Another problem is that either rules for gestures must be established in advance, or the use of complicated gestures is precluded.
Taking these problems into account, one objective of the present invention is to provide an image recognition apparatus and an operation determination method whereby an operator is first enabled to recognize the condition under which an operation correlated with a specific entry is being performed, and can then use a gesture to enter an operation instruction for the apparatus. As a result, an untrained operator is not required to learn special gestures, and need only move all, or only a part, of the body, and the action can be identified as an operation exactly representing the intent of the operator.