The present invention generally relates to a robot apparatus, a method for controlling the action of the robot apparatus, and an external-force detecting apparatus and method.
Conventionally, knowledge acquisition and language acquisition have been based mainly on the associative memory of visual information and audio information.
"Learning Words from Natural Audio-Visual Input" (by Deb Roy and Alex Pentland) (will be referred to as "Document 1" hereunder) discloses a study of language learning from input speech and input images. The learning method in Document 1 is outlined below.
An image signal and a speech signal (acoustic signal) are supplied to a learning system either simultaneously with each other or at different times. In Document 1, the paired occurrence of an image and a speech supplied in this manner is called an "AV event".
When the image signal and speech signal are thus supplied, image processing is performed to detect a color and a shape from the image signal, while speech processing is performed on the speech signal by a recurrent neural network to make a phonemic analysis. More particularly, the input image is classified into a class (a class for recognition of a specific image, or image recognition class) based on a feature in the image feature space, while the input speech is classified into a class (a class for recognition of a specific sound, or sound recognition class) based on a feature in the sound feature space. The feature space is composed of a plurality of elements as shown in FIG. 1. For the image signal, for example, the feature space is a two-dimensional or multi-dimensional space whose elements are a color-difference signal and a brightness signal. Since the input image maps to a predetermined region of such a feature space, a color can be recognized based on this mapping. In the feature space, the classification is made on the basis of distance in order to recognize a color.
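By way of illustration, distance-based classification in such a feature space may be sketched as follows; the prototype coordinates and the threshold are hypothetical values, not ones taken from Document 1.

```python
import math

# Hypothetical class prototypes in a two-dimensional feature space whose
# elements are a colour-difference value and a brightness value (assumed
# to be normalised to the range 0..1).
PROTOTYPES = {
    "red":  (0.9, 0.5),
    "blue": (0.2, 0.4),
}

def classify(feature, prototypes=PROTOTYPES, threshold=0.35):
    """Assign an input feature to the nearest class by Euclidean distance,
    or to no class when every prototype is farther than the threshold."""
    best_class, best_dist = None, float("inf")
    for name, proto in prototypes.items():
        dist = math.dist(feature, proto)
        if dist < best_dist:
            best_class, best_dist = name, dist
    return best_class if best_dist <= threshold else None

print(classify((0.85, 0.55)))  # near the "red" prototype -> "red"
```

An input far from every prototype returns no class, which is the situation that triggers new-class generation later in this description.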
For recognition of a sound, for example, the continuous recognition HMM (hidden Markov model) method is employed. The continuous recognition HMM method (will be referred to simply as "HMM" hereunder) permits a speech signal to be recognized as a phoneme sequence. Also, the above recurrent neural network is a network in which a signal is fed back to the input layer side.
Based on a correlation concerning concurrence (correlative learning), a classified phoneme is correlated with a stimulus (image) classified by the image processing for the purpose of learning. That is, the name and a description of a thing presented as an image are acquired as a result of the learning.
As shown in FIG. 2, in the above learning, an input image is identified (recognized) according to image classes such as "red thing", "blue thing", . . . , each formed from image information, while an input speech is identified (recognized) according to classes such as uttered "red", "blue", "yellow", . . . , formed from sound information.
Then the image and speech thus classified are correlated with each other by the correlative learning, whereby when "a red thing" is supplied as an input image, a learning system 200 in FIG. 2 can output the phoneme sequence of "red" (uttered) as a result of the correlative learning.
Recently, there has been proposed a robot apparatus which can autonomously behave in response to its surrounding environment (external factors) and internal state (internal factors such as the state of an emotion or instinct). Such a robot apparatus (will be referred to as "robot" hereunder) is designed to interact with human beings and the environment. For example, there have been proposed so-called pet robots and the like, each having the shape of an animal and behaving like that animal.
For example, the capability of having such a robot learn various kinds of information will improve its entertainment value. Especially, the capability of learning actions or behavior will enhance the fun of playing with the robot.
Application of the aforementioned learning method (as in Document 1) to a robot designed to be controllable to act encounters the following problems.
First, the above learning method is not designed to control the robot to act.
As disclosed in Document 1, utterance will create and output an appropriate phoneme sequence when a stored word is created in response to an input signal or when the input signal is judged to be a new signal. However, for interaction with human beings or the environment, the robot is not required to utter an input signal as it is; rather, it is required to act appropriately in response to an input.
Also, when classification is based on distance in the image feature space and the sound feature space, an acquired image and speech will be treated as information near to each other in those feature spaces. However, the robot is required in some cases to act differently in response to such an image and speech. In such cases, the classification has to be done so as to permit appropriate action. The conventional methods cannot accommodate this requirement.
The conventional knowledge or language acquisition system includes mainly the following:
(1) Means for classifying an image signal and generating new classes
(2) Means for classifying an acoustic signal and generating new classes
(3) Means for correlating the results of items (1) and (2) with each other, or learning image and sound in association with each other
Of course, some of the conventional knowledge or language acquisition systems use functions other than the above. But the above three functions are the essential ones for such systems.
The classifications in the above items (1) and (2) include mapping in a feature space, parametric discrimination of a significant signal using prior knowledge, use of probabilistic classification, etc.
Generally, an image can be recognized, for example, by controlling a threshold of a color template for each of colors such as red, blue, green and yellow in the color space, or by determining, for a presented color stimulus, a probability of each color based on the distance between an existing color storage area and the input color in the feature space. For example, for an area already classified as a feature in a feature space as shown in FIG. 1, a probability of the classification is determined from the distance between an area defined by a feature of an input image and the existing feature area. A method using a neural network is also effectively usable for this purpose.
On the other hand, for learning a speech, a phoneme sequence supplied by the HMM through phoneme detection is compared with stored phoneme sequences, and a word is probabilistically recognized based on the result of the comparison.
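Such a comparison of a detected phoneme sequence against stored sequences may be sketched, for example, with an edit-distance matcher; the lexicon below is a hypothetical toy, and a real system would use the HMM likelihoods themselves.

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance between two
    phoneme sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[m][n]

# Hypothetical stored words expressed as phoneme sequences.
LEXICON = {
    "red":  ["r", "e", "d"],
    "blue": ["b", "l", "u"],
}

def recognise_word(phonemes):
    """Pick the stored word whose phoneme sequence is closest to the input."""
    return min(LEXICON, key=lambda w: edit_distance(phonemes, LEXICON[w]))

print(recognise_word(["r", "e", "t"]))  # closest stored word is "red"
```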
The means for generating new classes as in the above items (1) and (2) include the following:
An input signal is evaluated to determine whether it belongs to an existing class. When the input signal is determined to belong to an existing class, it is made to belong to that class and fed back to the classification method. On the other hand, if the input signal is judged not to belong to any class, a new class is generated and learning is performed so that classification can be done based on the input stimulus.
A new class is generated as follows. For example, if an input image is judged not to belong to any existing image class (class of image A, class of image B, . . . ), an existing class (e.g., the class of image A) is divided to generate a new image class as shown in FIG. 3A. If an input sound is judged not to belong to any existing sound class (class of sound α, class of sound β, . . . ), an existing class (e.g., the class of sound β) is divided to generate a new sound class as shown in FIG. 3B.
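This belongs-or-create-new-class procedure may be sketched as follows; the centroid representation and the distance threshold are simplifying assumptions (the description above divides an existing class, whereas this sketch simply appends a new one).

```python
import math

class Classifier:
    """Keeps one prototype (centroid) per class and generates a new class
    when an input is judged not to belong to any existing class."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.prototypes = []          # list of (centroid, sample_count)

    def observe(self, x):
        if self.prototypes:
            # Find the nearest existing class.
            i, (c, n) = min(enumerate(self.prototypes),
                            key=lambda p: math.dist(x, p[1][0]))
            if math.dist(x, c) <= self.threshold:
                # Belongs to class i: fold the sample into the centroid
                # (feedback to the classification method).
                c = tuple((ci * n + xi) / (n + 1) for ci, xi in zip(c, x))
                self.prototypes[i] = (c, n + 1)
                return i
        # No existing class fits: generate a new class from the input.
        self.prototypes.append((tuple(x), 1))
        return len(self.prototypes) - 1
```

Observing (0.1, 0.1) creates class 0; a nearby sample such as (0.12, 0.1) is folded into it, while a distant sample such as (0.9, 0.9) generates class 1.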
Also, the association of an image and a sound as in item (3) is implemented by an associative memory or the like.
A discrimination class for an image is expressed as a vector (will be referred to as "image discrimination vector" hereunder) IC[i] (i=0, 1, . . . , NIC-1), and a discrimination class for a sound is expressed as a vector (will be referred to as "sound discrimination vector" hereunder) SC[j] (j=0, 1, . . . , NSC-1). For an image signal and a sound signal presented (supplied for learning), the probability or evaluation result of each recognition class is set as the respective vector values.
In a self-recalling associative memory, the image recognition vector IC and sound recognition vector SC are combined into a single vector CV given by the following equations (1) and (2):
CV[n]=IC[n] (0≤n&lt;NIC)   (1)
CV[n]=SC[n-NIC] (0≤n-NIC&lt;NSC)   (2)
Note that in the field of the self-recalling associative memory, the so-called Hopfield net proposed by Hopfield is well known.
The above vectors are combined into a single vector as will be described below. On the assumption that the vector CV is a column vector, the self-recalling associative memory is built by adding a matrix ΔW, given by the following equation (3), to a currently stored matrix W:
ΔW=CV×trans(CV)   (3)
Thus, an image stimulus (input image) can be regarded as a class, and a word resulting from speech recognition (e.g., an HMM class) can be associated with that class. When a new image (e.g., a red thing) is presented and a speech "red" is entered, the image class reacts to the red color of the image stimulus to an extent appropriate for the stimulus, that is, for the distance in the feature space, and similarly, the sound class reacts to an appropriate extent for the phoneme sequence of the speech "red". These classes are handled in a correlative matrix by the above equations and stochastically averaged, so that the image and speech classes come to have high values with respect to the same stimulus, namely, a high correlation between them. Thus, when a red image is presented, the HMM class "red" (uttered) is recalled in association with the red image.
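The self-recalling associative memory of equations (1) to (3) may be sketched as follows; the vector sizes and the one-hot class vectors are illustrative assumptions.

```python
# Concatenate the image vector IC and sound vector SC into a single vector
# CV (equations (1) and (2)), then accumulate the outer product
# CV x trans(CV) into the stored matrix W (equation (3)).
N_IC, N_SC = 3, 3
W = [[0.0] * (N_IC + N_SC) for _ in range(N_IC + N_SC)]

def store(ic, sc):
    cv = list(ic) + list(sc)        # CV[n] = IC[n]; CV[n] = SC[n - NIC]
    for i, vi in enumerate(cv):
        for j, vj in enumerate(cv):
            W[i][j] += vi * vj      # W <- W + CV x trans(CV)

def recall_sound(ic):
    """Given only an image vector, recall the associated sound vector."""
    cv = list(ic) + [0.0] * N_SC
    out = [sum(W[i][j] * cv[j] for j in range(len(cv)))
           for i in range(len(cv))]
    return out[N_IC:]               # the sound part of the recalled vector

# Associate image class 0 ("red thing") with sound class 0 (uttered "red").
store([1, 0, 0], [1, 0, 0])
sound = recall_sound([1, 0, 0])
print(sound.index(max(sound)))      # strongest sound class: 0
```

Presenting the image class alone recalls the sound class stored together with it, which is the behavior the text above describes for the red image and the uttered "red".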
On the other hand, "Perceptually Grounded Meaning Creation" (by Luc Steels, ICMAS, Kyoto, 1996) (will be referred to as "Document 2" hereunder) discloses meaning acquisition through an experiment called the "discrimination game". The discrimination game is outlined below.
The "discrimination game" system includes a plurality of sensor channels and feature detectors, not limited to image and sound as in the above. An entity called an "agent" (e.g., a piece of software) tries, by means of the feature detectors, to differentiate a newly presented object from another, already recognized object, namely, it differentiates between the objects based on a feature. If there exists no feature with which a differentiation between the objects can be done, a new feature detector corresponding to the newly presented object is created. If an object has no feature with which it can be differentiated from another object, namely, when no corresponding feature detector is available for the object, the agent is judged to have lost the discrimination game. If an object has a corresponding feature detector, the agent is judged to be the winner of the game.
Then, the entire system works based on the "selectionist" principle. That is, an agent having won the game has a higher probability of survival, while an agent having lost the game will create a new feature detector. However, the new feature detector will only be used in a later game, and it is not known in advance whether the detector will provide a correct result. Thus, an agent capable of better differentiation will survive.
The discrimination game has been outlined above. In other words, the discrimination game may be regarded as a method for creating better feature detectors through natural selection.
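A toy version of such a discrimination game may be sketched as follows; representing feature detectors as thresholds on scalar sensor channels is an assumption for illustration, not the formulation of Document 2.

```python
import random

class Agent:
    """An agent tries to tell a topic object apart from the other objects
    using its feature detectors (here, thresholds on scalar sensor
    channels); when it fails, it creates a new feature detector."""

    def __init__(self):
        self.detectors = []           # each detector: (channel, threshold)

    def _discriminates(self, detector, topic, others):
        ch, th = detector
        side = topic[ch] > th
        # The detector succeeds when it puts the topic on one side of the
        # threshold and every other object on the opposite side.
        return all((o[ch] > th) != side for o in others)

    def play(self, topic, others):
        for d in self.detectors:
            if self._discriminates(d, topic, others):
                return True           # game won: topic distinguished
        # Game lost: create a new detector on a random channel, with a
        # threshold halfway between the topic and one distractor.
        ch = random.randrange(len(topic))
        other = random.choice(others)
        self.detectors.append((ch, (topic[ch] + other[ch]) / 2))
        return False
```

The first game is always lost, since no detectors exist yet; over repeated games, detectors that actually discriminate are kept and reused, which is the selectionist behavior described above.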
Also, "The Spontaneous Self-Organization of An Adaptive Language" (by Luc Steels, in Muggleton, S. (ed.), 1996, Machine Intelligence 15) (will be referred to as "Document 3" hereunder) describes language generation by a "language game" method. The "language game" includes the following three steps:
More specifically, the language game includes a first step of so-called image processing, a second step of word processing related to language processing (actually, however, no speech is recognized; so-called characters are entered instead), and a third step in which an image acquired in the first step is associated with the word. The aforementioned discrimination game has no part equivalent to the second step; it applies only to differentiation effected in an existing feature space.
Also, "Language Acquisition with A Conceptual Structure-Based Speech Input from Perceptual Information" (by Iwasaki and Tamura, Sony Computer Science Laboratory) (will be referred to as "Document 4" hereunder) discloses the acquisition of a grammar using HMM for speech recognition and typical patterns (circular, triangular and other shapes in red, blue and other colors) displayed in colors on a computer monitor for image recognition.
In Document 4, the user clicks with a pointing device, or mouse (with a pointer 212 pointed), on a pattern (an object) on a monitor 210 as shown in FIG. 4 and simultaneously utters "red circle" or the like. The discrimination game theory for color images and HMM-based speech recognition are used to effect the first to third steps of the language game in Document 3 probabilistically.
For generation of a new class, a predetermined verification method is effected. In the method disclosed in Document 4, when it is judged by the verification using HMM for speech recognition that a new class should be generated, the HMM is subdivided to generate the new class.
Further, a pattern 211 (a first object (Obj1)) selected by pointing the cursor thereto and clicking the mouse is moved onto a second object (Obj2) 213 as indicated with an arrow in FIG. 4, and at the same time an uttered speech "mount" is supplied, whereby a movement of a pattern made on the monitor 210 is recognized. The movement thus recognized is classified by the HMM.
As in the foregoing, a variety of techniques for knowledge or language acquisition have been proposed. However, these techniques have the following disadvantages concerning the aspect of action acquisition (action learning) in a robot:
(1) Evaluation of distance in the feature space and of the belongingness of an input signal to a class
(2) Creation and evaluation of action
(3) Sharing of a target object between robot and user (so-called target object sharing)
The above problem (1) is difficult to solve, since evaluation of the belongingness of an input image signal to a class is influenced only by information related to the image signal, by a sound signal supplied at the same time, or by stored information recalled based on these two signals. Note that the belongingness-to-class evaluation is an index indicating to which of the classes an input signal belongs.
Assume here that an image signal has been entered which is considered very near, in the feature space, to the image signals of an existing class. In this case, classes A and B are near to each other in the image feature space as shown in FIG. 5A. However, it is assumed that the image signal thus entered is intended to generate a new class.
On the other hand, suppose that under these conditions a speech signal has been entered as other information on the object corresponding to the image signal, and that the input speech signal is judged to be very far from the existing classes; then a new class for the speech signal will be generated for the object. It is assumed, for example, that as shown in FIG. 5B a class of sound α (the sound class corresponding to the class of image A) and a class of sound β (the sound class corresponding to the class of image B) are mapped differently in the sound feature space, so that a threshold S2 can be set.
Therefore, if the belongingness-to-class evaluation of a sound, made based on the sound feature space, can be reflected in the belongingness-to-class evaluation of an image, it is possible to generate a new class for the image. For example, by reflecting the evaluation made in the sound feature space, a threshold S1 can be set between the classes of images A and B, near to each other, so as to differentiate between the classes as shown in FIG. 5A. That is, by making reference to the belongingness-to-class evaluation of another modality, assignment to a class can be effected more appropriately.
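A minimal sketch of this cross-modal idea, with hypothetical prototype positions: each candidate class is scored by the product of its image-space and sound-space evaluations, so a clear separation in the sound feature space can set apart classes that lie close together in the image feature space.

```python
import math

def evaluation(feature, prototype, scale=0.2):
    # Belongingness-to-class evaluation: nearer in the feature space -> higher.
    return math.exp(-math.dist(feature, prototype) / scale)

def classify_cross_modal(image, sound, image_protos, sound_protos):
    """Score each class by the product of its image-space and sound-space
    evaluations and return the best-scoring class."""
    scores = {name: evaluation(image, image_protos[name]) *
                    evaluation(sound, sound_protos[name])
              for name in image_protos}
    return max(scores, key=scores.get)

# Classes A and B are nearly indistinguishable in the image feature space
# but far apart in the sound feature space (as in FIGS. 5A and 5B).
IMG = {"A": (0.50, 0.50), "B": (0.55, 0.50)}
SND = {"A": (0.1, 0.1), "B": (0.9, 0.9)}
print(classify_cross_modal((0.52, 0.50), (0.88, 0.90), IMG, SND))
```

Here the image (0.52, 0.50) alone is ambiguous between A and B; the accompanying sound, clearly nearer β, tips the decision to class B.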
However, if the classes of the image signals or speech signals are very near to each other, the above is not sufficient to generate a new class for the image or speech. That is, when image classes or sound classes are near to each other in their respective feature spaces as shown in FIGS. 6A and 6B, they cannot be differentiated from each other even if they have quite different features as viewed from a third feature space. The third feature space may be one indicative of a feature of action.
Accordingly, the present invention has an object to overcome the above-mentioned drawbacks of the prior art by providing a robot apparatus and a method for controlling the action of the robot apparatus, adapted to appropriately differentiate objects in their respective feature spaces.
The above problem (2) concerns how to generate new action of the robot apparatus when a signal judged to belong to a new class is supplied to it, and how to evaluate the new action.
With the conventional technique, evaluation of a language creation corresponds to evaluation of generated action. With the technique disclosed in Document 3, an arbitrary phoneme sequence is generated; it will be the name or the like of an object contained in an input signal, possibly an image signal. However, an arbitrary motion series should not be generated when generating action.
For example, even if an arbitrary series of joint angles is generated for a robot apparatus having four legs each with three degrees of freedom, the robot apparatus will not make any meaningful motion. When a language is generated, the phoneme sequence of the language need only be a name of the object. For action, however, how to evaluate the generated action as good or not good becomes a problem.
Also the present invention has another object to overcome the above-mentioned drawbacks of the prior art by providing a robot apparatus and a method for controlling the action of the robot apparatus, adapted to generate appropriate action for an input.
The above-mentioned problem (3) is that of so-called target object sharing (shared attention). This problem is caused by the fact that information perceived by the robot apparatus is highly variable. For example, even when the user or trainer tries to teach the robot apparatus by holding up an orange ball in a direction not towards the image signal input unit (e.g., a CCD camera) of the robot apparatus and uttering "orange ball", if the object within the field of view of the robot apparatus is a pink box, the pink box will be associated with the speech "orange ball".
In Document 4, the pattern 211 on the monitor 210 is designated as the target object by pointing the cursor to the pattern 211 and clicking the mouse. For a robot, however, no such means for pointing to or designating a target object is available. Even if the theories disclosed in Documents 2 and 3 are applied to the robot apparatus, the trainer or user will select at random one of the things in his or her field of view and utter the name of the selected thing from memory in order to direct the robot apparatus's attention towards it as a target object to be recognized. Actually, however, this is not learning by which the robot apparatus can reliably recognize the target object.
Also the present invention has another object to overcome the above-mentioned drawbacks of the prior art by providing a robot apparatus and a method for controlling the action of the robot apparatus, adapted to share a target object (attention sharing) in order to appropriately recognize the target object.
The conventional robot apparatus detects an external force applied to the head or another part thereof via a touch sensor or the like provided at that part, thereby interacting with the user. However, the interaction is limited by the number of sensors provided and their locations.
Accordingly, the present invention has an object to overcome the above-mentioned drawbacks of the prior art by providing a robot apparatus, external force detector and a method for detecting an external force, capable of assuring a higher degree of freedom in interaction with a touch (external force) by the user.
The present invention has another object to provide a robot apparatus and a method for controlling the action of the robot apparatus capable of appropriately recognizing each object in its feature space.
The present invention has another object to provide a robot apparatus and a method for controlling the action of the robot apparatus, capable of generating appropriate action in response to an input.
The present invention has another object to provide a robot apparatus and a method for controlling the action of the robot apparatus, capable of sharing a target object (attention sharing) to appropriately recognize the target object.
The present invention has another object to provide a robot apparatus, external force detector and a method for detecting external force, capable of assuring a higher degree of freedom in interaction with a touch (external force) by the user.
The above object can be attained by providing a robot apparatus including:
means for detecting a touch;
means for detecting information supplied simultaneously with, just before or after the touch detection by the touch detecting means;
means for storing action made correspondingly to the touch detection in association with the input information detected by the input information detecting means; and
means for recalling action from information in the storing means based on newly acquired information to control the robot apparatus to do the action.
In the above robot apparatus, information supplied just before or after the touch detection by the touch detecting means is detected by the input information detecting means, action made in response to the touch and the input information detected by the input information detecting means are stored in association with each other into the storing means, and action is recalled by the action controlling means from information in the storing means based on newly acquired input information to control the robot apparatus to do the action.
Thus, in the above robot apparatus, input information and action made when the input information has been detected are stored in association with each other, and when information identical to the input information is supplied again, corresponding action is reproduced.
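Such storing and recalling means may be sketched as follows; exact-match lookup of the input information is a simplifying assumption (a real implementation would match within a feature space, as described above):

```python
class ActionMemory:
    """Stores the action performed when given input information was
    detected, and recalls that action when the same information recurs."""

    def __init__(self):
        self._memory = {}

    def store(self, input_info, action):
        # Associate the detected input information with the action made
        # correspondingly to the touch detection.
        self._memory[input_info] = action

    def recall(self, input_info):
        # Recall the stored action for newly acquired input information,
        # or None when nothing has been learned for it.
        return self._memory.get(input_info)

memory = ActionMemory()
memory.store("orange ball shown", "kick")
print(memory.recall("orange ball shown"))  # -> kick
```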
Also the above object can be attained by providing a method for controlling the action of a robot apparatus, including the steps of:
detecting a touch made to the robot apparatus;
detecting information supplied simultaneously with or just before or after the touch detection in the touch detecting step;
storing action made in response to the touch detection in the touch detecting step and input information detected in the input information detecting step in association with each other into a storing means; and
recalling action from the information in the storing means based on newly acquired input information to control the robot to do the action.
In the above robot apparatus action controlling method, input information and action made when the input information has been detected are stored in association with each other, and when information identical to the input information is supplied again, corresponding action is reproduced.
Also the above object can be attained by providing a robot apparatus including:
means for detecting input information;
means for storing the input information detected by the input information detecting means and action result information indicative of a result of action made correspondingly to the input information detected by the input information detecting means; and
means for identifying action result information in the storing means based on newly supplied input information to control the robot apparatus to do action based on the action result information.
In the above robot apparatus, action result information indicative of a result of action made correspondingly to the input information detected by the input information detecting means and the input information are stored in association with each other into the storing means, and action result information in the storing means is identified based on newly supplied input information to control the robot apparatus to do action based on the action result information.
Thus in the above robot apparatus, input information and action result information indicative of action made correspondingly to the input information are stored in association with each other, and when identical information is supplied again, past action is recalled based on the action result information corresponding to the input information to control the robot apparatus to do appropriate action.
Also the above object can be attained by providing a method for controlling the action of a robot apparatus, including the steps of:
storing action result information indicative of a result of action made correspondingly to input information detected by an input information detecting means and the input information itself in association with each other into a storing means; and
identifying action result information in the storing means based on newly supplied input information to control the robot apparatus to make action based on the action result information.
By the above robot apparatus action controlling method, the robot apparatus stores input information and action result information indicative of a result of action made based on the input information in association with each other, and when identical input information is supplied again, past action is recalled based on action result information corresponding to the input information to control the robot apparatus to do appropriate action.
Also the above object can be attained by providing a robot apparatus including:
means for detecting input information;
means for detecting a feature of the input information detected by the input information detecting means;
means for classifying the input information based on the detected feature;
means for controlling the robot apparatus to do action based on the input information; and
means for changing the classification of the input information having caused the robot apparatus to do the action based on action result information indicative of a result of the action made by the robot apparatus under the control of the action controlling means.
In the above robot apparatus, a feature of input information detected by the input information detecting means is detected by the feature detecting means, the input information is classified based on the detected feature, the robot apparatus is controlled by the action controlling means to act based on the classification of the input information, and the classification of the input information, having caused the robot apparatus action, is changed based on action result information indicative of a result of the action made by the robot apparatus under the control of the action controlling means.
Thus the above robot apparatus acts correspondingly to the classification of input information and changes the classification based on a result of the action.
Also the above object can be attained by providing a method for controlling the action of a robot apparatus, including the steps of:
detecting a feature of input information detected by an input information detecting means;
classifying the input information based on the feature detected in the feature detecting step;
controlling the robot apparatus to act based on the classification of the input information, made in the information classifying step; and
changing the classification of the input information having caused the robot apparatus to do the action based on action result information indicative of a result of the action made by the robot apparatus controlled in the action controlling step.
By the above robot apparatus action controlling method, the robot apparatus is controlled to act correspondingly to the classification of input information and changes the classification based on a result of the action.
Also, the above object can be attained by providing a robot apparatus including:
means for identifying a target object;
means for storing information on the target object identified by the target object identifying means; and
means for controlling the robot apparatus to act based on information on a newly detected object and information on the target object, stored in the storing means.
The above robot apparatus stores information on a target object identified by the target object identifying means into the storing means, and is controlled by the action controlling means to act based on the information on the newly detected object and information on the target object, stored in the storing means.
Thus the above robot apparatus stores a target object, and when information on an identical object is supplied again, the robot apparatus makes predetermined action.
Also, the above object can be attained by providing a method for controlling the action of a robot apparatus, including the steps of:
identifying a target object;
storing information on the target object identified in the target object identifying step into a storing means; and
controlling the robot apparatus to act based on information on a newly detected object and information on the target object, stored in the storing means.
By the above robot apparatus action controlling method, the robot apparatus stores a target object, and when an identical object is supplied again, the robot apparatus makes predetermined action.
Also the above object can be attained by providing a robot apparatus including:
moving members;
joints to move the moving members;
detecting means for detecting the state of a joint to which an external force is applied via the moving member; and
means for learning the joint state detected by the detecting means and external force in association with each other.
In the above robot apparatus, the state of the joint to which an external force is applied via the moving member can be detected by the detecting means and the joint state detected by the detecting means and external force are learned in association with each other by the learning means. That is, the robot apparatus learns an external force in association with a joint state which varies correspondingly to the external force acting on the moving member.
Also, the above object can be attained by providing an external force detector including:
means for detecting the state of a joint which moves a moving member; and
means for detecting an external force acting on the moving member based on the joint state detected by the joint state detecting means.
In the above external force detector, the state of the joint which moves the moving member is detected by the joint state detecting means and the external force acting on the moving member is detected based on the joint state detected by the joint state detecting means. Namely, the external force detector detects an external force acting on the moving member based on the state of a joint which moves the moving member.
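One way such a detector can work may be sketched under the assumption that the actuated joint behaves like a spring with a known servo stiffness, so that the deviation between the commanded and measured joint angles is proportional to the external torque; the stiffness and threshold values below are hypothetical.

```python
def estimate_external_torque(commanded_angle, measured_angle, stiffness=2.0):
    """Spring model of an actuated joint: the angular deviation caused by
    an external force acting through the moving member is taken as
    proportional to the external torque. `stiffness` (in N*m/rad) is an
    assumed servo gain, not a value from the present description."""
    return stiffness * (commanded_angle - measured_angle)

def detect_external_force(commanded, measured, threshold=0.1):
    """Report an external force when any joint's estimated torque exceeds
    the threshold; also return the per-joint torque estimates."""
    torques = [estimate_external_torque(c, m)
               for c, m in zip(commanded, measured)]
    return any(abs(t) > threshold for t in torques), torques

pushed, torques = detect_external_force([0.0, 0.5], [0.0, 0.3])
print(pushed)  # True: the second joint is displaced by an external force
```

Because the force is inferred from the joint state rather than from a dedicated touch sensor, any moving member can serve as a contact surface, which is the higher degree of freedom in interaction described above.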
Also, the above object can be attained by providing a method for detecting an external force, including the steps of:
detecting the state of a joint which moves a moving member;
detecting an external force acting on the moving member based on the detected state of the joint which moves the moving member.