As is known, a conventional movable robot uses ultrasonic sensors or infrared sensors to sense the distance to obstacles, and a camera to interact with a person.
Because each sensor of a given type has a highly limited sensing range, several sensors of the same type must be mounted around the robot at uniform spacing in order to sense the robot's entire vicinity, which makes the internal configuration and functions of the robot complex.
Alternatively, a stereo camera system, which extracts distance information from a disparity map computed from images taken by two or more cameras mounted apart from one another, may be mounted on the movable robot to sense the distance to an obstacle without using several sensors.
Since the horizontal viewing angle of one camera is two to three times wider than the horizontal detection angle of an ultrasonic sensor, and since a stereo camera is normally mounted on a pan-tilt mechanism of the robot that can be directed in any direction at any moment, a movable robot using a stereo camera does not need ultrasonic or infrared sensors, thus simplifying the internal configuration of the robot and facilitating its maintenance.
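The coverage advantage described above can be sketched with simple arithmetic. The specific angles below (a ~30° ultrasonic detection cone versus a ~70° camera horizontal field of view) are illustrative assumptions, not figures from the text:

```python
import math

def sensors_for_full_coverage(coverage_deg: float, sensor_angle_deg: float) -> int:
    """Minimum number of uniformly spaced sensors whose detection cones
    tile the given angular coverage without gaps."""
    return math.ceil(coverage_deg / sensor_angle_deg)

# Illustrative (assumed) figures for covering the full 360-degree vicinity:
ultrasonic_count = sensors_for_full_coverage(360, 30)  # 12 ultrasonic sensors
camera_count = sensors_for_full_coverage(360, 70)      # 6 fixed cameras
```

A camera on a pan-tilt mechanism reduces the count further, since a single camera can be swept over the whole vicinity.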
However, most commercially available stereo camera systems are adapted to output only a disparity map, in consideration of a connection with a personal computer (“PC”), which requires a high-level module processor to perform further data processing, such as filtering, segmenting, and labeling.
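The labeling step the high-level processor must perform on such a disparity map can be sketched as connected-component labeling over pixels whose disparity exceeds a threshold (large disparity corresponding to nearby objects). This is a minimal pure-Python illustration, not the method of any particular stereo camera system:

```python
from collections import deque

def label_disparity_regions(disparity, min_disparity):
    """Label 4-connected regions of pixels whose disparity exceeds a
    threshold (candidate nearby objects) using breadth-first search."""
    h, w = len(disparity), len(disparity[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if disparity[y][x] >= min_disparity and labels[y][x] == 0:
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and disparity[ny][nx] >= min_disparity
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label

# A toy 3x5 disparity map with two separate near regions:
disp = [
    [0, 9, 9, 0, 0],
    [0, 9, 0, 0, 8],
    [0, 0, 0, 8, 8],
]
labels, count = label_disparity_regions(disp, 5)
# count == 2: one region at the top left, one at the bottom right
```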
Meanwhile, the plurality of ultrasonic sensors or the stereo camera system described above provides only distance information, which does not help determine whether the object at that distance is an obstacle to be avoided by the movable robot or a desired target (e.g., a person) to be served by it.
A conventional movable robot, for example, must generate a map in advance using ultrasonic distance information and compare current distance information against the map to discover a person from the difference, or must perform additional computation to analyze an original camera image to discover a person. Either process makes the calculation in a high-level module processor very complex. In particular, methods for discovering a person in an original image include a method for discovering an area similar to a person's face within the image, and a method for discovering a person based on motion.
Among these, the method for discovering a person's face within the image includes finding an area similar to a face and comparing the pattern of that area with those of other objects to exclude false matches. With a typical publicly available library, even a Pentium 4 processor is insufficient to analyze an image of 320×240 resolution ten times per second. Accordingly, the movable robot is incapable of carrying out such a calculation-intensive function even when equipped with an up-to-date PC.
The method for discovering a person based on motion derives a difference between images taken by one stationary camera at different points in time, and may be classified into a method that derives a difference image between adjacent frames and a method that accumulates a difference image between one reference image and the current frame image. An indoor movable robot using this motion-based method is adequately practical because an indoor moving object is usually a person or a pet. However, a difference image derived on a pixel basis from an original camera image is affected by image noise, which requires an additional calculation process and, in turn, consumes many resources of a high-level module processor that could otherwise be used for other tasks of the movable robot.
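The adjacent-frame variant of this motion-based method can be sketched as a per-pixel absolute difference followed by a threshold, the threshold being the simple noise-suppression step the text alludes to. The grayscale values and threshold below are illustrative assumptions:

```python
def motion_mask(prev_frame, curr_frame, threshold):
    """Per-pixel absolute difference between two grayscale frames,
    thresholded to suppress small sensor-noise fluctuations."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
        for prow, crow in zip(prev_frame, curr_frame)
    ]

prev = [[10, 10, 10],
        [10, 10, 10]]
curr = [[10, 12, 10],   # +2: within the noise margin, ignored
        [10, 80, 10]]   # +70: treated as a moving object
mask = motion_mask(prev, curr, threshold=5)
# mask == [[0, 0, 0], [0, 1, 0]]
```

A more robust system would replace the fixed threshold with filtering over neighborhoods, which is precisely the additional calculation the text says burdens the high-level processor.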
Meanwhile, vision-based navigation, which recognizes obstacles and paths based on images, has been studied in addition to image-based person recognition. However, vision-based navigation also consumes many resources of a high-level module processor when extracting scale-invariant features from pixel data.
As described above, processing an image in the movable robot consumes many resources, such as the resources of the high-level module processor (e.g., the CPU of a main controller) and power. Accordingly, the concept of a network-based robot has recently been introduced: complex image-processing calculation is performed by a networked high-performance remote server, while the movable robot merely sends images over a wireless network and receives only the results.
The network-based movable robot must send images via the network. Typical robot image analysis requires a color image with a resolution of at least 320 (horizontal) × 240 (vertical) pixels, user motion recognition requires a frame rate of at least 15 frames per second, and image-based navigation requires an even higher frame rate, all of which increase the amount of image data to be transferred.
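The transfer amount implied by these figures is easy to quantify. Assuming uncompressed 24-bit color (3 bytes per pixel, an assumption not stated in the text), the figures cited work out as follows:

```python
def raw_bitrate_mbps(width, height, bytes_per_pixel, fps):
    """Uncompressed video bit rate in megabits per second."""
    return width * height * bytes_per_pixel * fps * 8 / 1_000_000

# Figures from the text: 320x240 color images at 15 frames per second,
# assuming 24-bit (3 bytes per pixel) color.
rate = raw_bitrate_mbps(320, 240, 3, 15)
# ~27.6 Mbit/s of raw data -- well beyond what a typical wireless link
# of the period could carry reliably, hence the need for compression.
```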
However, it is virtually impossible for the movable robot to send the original large data as it is, because the robot must use a wireless network, which is less reliable than a wired network. In order to perform a high-level image-recognition function, the network-based movable robot therefore compresses the image data and transmits it, and a remote server decompresses the compressed image data, thereby restoring and processing the image. Here, the image compression uses a video format, such as MPEG-4, H.263, or H.264, in which the difference from a previous frame is coded in block units of 16×16 or 8×8 pixels. The video hardware encoder may be a typical video encoder chip or hardware logic embedded in a processor.
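The block-unit difference coding these formats rely on can be sketched in simplified form: split the frame into fixed-size blocks and identify only the blocks that changed relative to the previous frame, since unchanged blocks need not be re-encoded. This is a toy illustration of the principle, not an implementation of MPEG-4, H.263, or H.264 (which also use motion compensation and transform coding):

```python
def changed_blocks(prev, curr, block=8, threshold=0):
    """Return (row, col) origins of block x block pixel blocks whose
    content differs from the previous frame -- the only blocks a
    difference-coding video format would need to encode."""
    h, w = len(curr), len(curr[0])
    out = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            diff = sum(
                abs(curr[y][x] - prev[y][x])
                for y in range(by, min(by + block, h))
                for x in range(bx, min(bx + block, w))
            )
            if diff > threshold:
                out.append((by, bx))
    return out

# A 16x16 grayscale frame (four 8x8 blocks) where only one block changes:
prev = [[0] * 16 for _ in range(16)]
curr = [row[:] for row in prev]
curr[10][3] = 255  # a change inside the lower-left 8x8 block
blocks = changed_blocks(prev, curr, block=8)
# blocks == [(8, 0)] -- only that one block needs to be transmitted
```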