The present disclosure relates to an information processing device, an information processing method, and a program, and particularly to an information processing device, an information processing method, and a program that enable an agent capable of carrying out actions, such as a robot, to easily learn an object in the environment in which the actions are taken.
In the related art, learning (or recognizing) an object in a certain environment using an image obtained by capturing the environment with a camera requires cutting out the image area of the learning target (or recognition target).
As methods of cutting out an image area of a learning target, there are two approaches: one (Japanese Unexamined Patent Application Publication No. 7-88791) mainly using prior knowledge on the external appearance of the learning target, and one (Japanese Unexamined Patent Application Publication Nos. 5-282275, 7-29081, and 2005-128959) using motions of the target object.
In the approach using prior knowledge on the external appearance, either the object is marked so that it can be specified, or a recognition model is created by learning the learning target (target object) in advance.
In the approach using motions of a target object, only an image area in which motion occurs is extracted, using an image difference, an optical flow, or the like.
However, in the extraction of an image area in which motion occurs, the background (of the image) has to remain still. Thus, when a camera for capturing images is mounted on a robot that can perform various actions, for example, and the background of an image captured by the camera is disturbed because the line of sight of the robot moves, it is difficult to appropriately cut out an area.
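The dependence of motion-based extraction on a still background can be illustrated with a minimal sketch of image differencing. The frame size, the striped background, the threshold value, and the helper names (`motion_mask`, `count_moving`, `make_frame`) are all assumptions introduced here for illustration and do not come from the cited publications.

```python
# Hypothetical illustration: frames are tiny grayscale images stored as lists of rows.

def motion_mask(prev, curr, threshold=30):
    """Mark pixels whose intensity changed by more than `threshold` between frames."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]

def count_moving(mask):
    """Count the pixels flagged as moving."""
    return sum(cell for row in mask for cell in row)

def make_frame(width, height, background, obj_col, obj_row, shift=0):
    """Build a frame with a column-striped background and one bright object pixel.

    `shift` displaces the background horizontally, imitating a moving camera.
    """
    frame = [
        [background[(x + shift) % width] for x in range(width)]
        for _ in range(height)
    ]
    frame[obj_row][obj_col] = 255
    return frame

WIDTH, HEIGHT = 8, 4
BACKGROUND = [0, 100, 0, 100, 0, 100, 0, 100]  # assumed striped texture

# Static camera: only the object moves (column 2 -> column 3).
prev_frame = make_frame(WIDTH, HEIGHT, BACKGROUND, obj_col=2, obj_row=1)
curr_static = make_frame(WIDTH, HEIGHT, BACKGROUND, obj_col=3, obj_row=1)
static_count = count_moving(motion_mask(prev_frame, curr_static))

# Moving camera: the background also shifts by one pixel.
curr_moving = make_frame(WIDTH, HEIGHT, BACKGROUND, obj_col=3, obj_row=1, shift=1)
moving_count = count_moving(motion_mask(prev_frame, curr_moving))

print(static_count, moving_count)
```

Under these assumptions, the static case flags only the 2 pixels that the object left and entered, whereas the shifted case flags all 32 pixels of the frame, so the object can no longer be separated from the background by the difference alone.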
In addition, in an object operation task in which a robot operates an object, when the object that is the operation target is to be discriminated from the hands (of the robot itself) that operate the object, each approach imposes a further requirement. In the approach using prior knowledge on the external appearance, it is necessary to attach labels that distinguish the object from the hands and for the robot to identify the labels. In the approach using motions of the target object, it is necessary to recognize whether or not an image area cut out from an image captured by a camera is an image area of the object.
Furthermore, in recognizing whether or not the image area cut out from the image captured by the camera is an image area of the target object, it is necessary to designate the hands (to give knowledge about the hands) so that the recognition device that performs the recognition can discriminate the hands from the object.
In addition, in the technique disclosed in Japanese Unexamined Patent Application Publication No. 2005-128959, a geometric model is created in advance. The geometric model shows how the robot arms, including the hands, appear in an image captured by a camera, to which position the fingertips (the hands) of the robot arms move in response to each command output to the robot arms, and the like. An object operation is then performed according to the geometric model.
In the technique disclosed in Japanese Unexamined Patent Application Publication No. 2005-128959, since the object operation is performed according to the geometric model as described above, it is necessary to manually modify the geometric model whenever the relative positions of the camera and the robot arms change, the lens of the camera is replaced, the size of the robot arms is changed, or the like.
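Why a fixed geometric model has to be modified after such changes can be sketched with a simple pinhole projection. This is only an illustrative stand-in and is not the model of the cited publication; the class `PinholeCamera`, the function `fingertip_position`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PinholeCamera:
    """Assumed pinhole camera whose axes are aligned with the arm base frame."""
    focal_px: float   # focal length in pixels (changes when the lens is replaced)
    cx: float         # principal point, x
    cy: float         # principal point, y
    offset: tuple     # camera position relative to the arm base, in meters

    def project(self, point):
        """Project a 3-D point given in the arm base frame onto the image plane."""
        x = point[0] - self.offset[0]
        y = point[1] - self.offset[1]
        z = point[2] - self.offset[2]
        if z <= 0:
            raise ValueError("point is not in front of the camera")
        return (self.cx + self.focal_px * x / z, self.cy + self.focal_px * y / z)

def fingertip_position(command):
    """Hypothetical arm model: the command is taken to be the fingertip target itself."""
    return command

command = (0.25, 0.0, 1.0)  # assumed command, meters in the arm base frame

# Geometric model created in advance for the original setup.
model = PinholeCamera(focal_px=500.0, cx=320.0, cy=240.0, offset=(0.0, 0.0, 0.0))
predicted = model.project(fingertip_position(command))

# Actual setups after a change that the stored model does not know about.
new_lens = PinholeCamera(focal_px=800.0, cx=320.0, cy=240.0, offset=(0.0, 0.0, 0.0))
moved_camera = PinholeCamera(focal_px=500.0, cx=320.0, cy=240.0, offset=(0.25, 0.0, 0.0))

actual_new_lens = new_lens.project(fingertip_position(command))
actual_moved = moved_camera.project(fingertip_position(command))

print(predicted, actual_new_lens, actual_moved)
```

With these assumed values, the stored model predicts the fingertip at pixel (445.0, 240.0), while the fingertip actually appears at (520.0, 240.0) after the lens change and at (320.0, 240.0) after the camera is displaced, so the parameters of the model would have to be corrected by hand in each case.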