In recent years, demand for navigating people and robots through unknown environments has been increasing. To meet this demand, it is necessary to autonomously build an environment map and estimate one's own position within it. The problem of autonomous map building and localization is generally called SLAM (Simultaneous Localization And Mapping) and has been widely studied. Furthermore, studies applying SLAM technology to unmanned aerial vehicles and automobiles have been advancing in recent years.
However, general SLAM assumes a static environment. In contrast, the actual environment in which we live is a dynamic environment where many people and objects come and go. Therefore, when an existing SLAM method is applied to the actual environment, moving persons and animals are misrecognized as landmarks, and the accuracy of the SLAM deteriorates.
For such autonomous map building and localization, how accurately feature values of the surrounding environment can be extracted is a key point. In general, methods for extracting local feature values from the environment include affine-invariant feature values (MSER, Harris-Affine, Hessian-Affine, Salient Region, and the like) and feature values invariant to scale change (SIFT: Scale-Invariant Feature Transform, SURF: Speeded-Up Robust Features, and the like).
Further, there are PIRF (Position-Invariant Robust Features), which are feature values obtained by further extracting, from the local feature values extracted from each of the consecutive images by the above-described methods, only those feature values that are robust to changes in shooting position (Patent Literatures 1 to 3). Note that autonomous map building using the PIRF has already been proposed (Patent Literature 4). In the PIRF, feature values are selected under the assumption that static and stable feature values should be present over several frames of the consecutive images. Here, the PIRF refer to local feature values, such as the SIFT and the SURF, that are commonly present across the consecutive images. Because using the PIRF removes dynamic feature values that are not common to the consecutive images, higher accuracy can be achieved even in a dynamic environment than when local feature values such as the SIFT and the SURF are simply used.
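The selection idea described above, keeping only local feature values that remain present over several consecutive frames, can be illustrated with a minimal sketch. This is not the PIRF procedure of the cited literature. It assumes that each frame is given as a list of already-extracted descriptors (short tuples here, in place of real SIFT or SURF vectors), and the names `match`, `stable_features`, and `max_dist` are hypothetical.

```python
from math import dist


def match(desc, candidates, max_dist):
    """Return the index of the nearest candidate within max_dist, or None."""
    best_idx, best_d = None, max_dist
    for i, cand in enumerate(candidates):
        d = dist(desc, cand)
        if d <= best_d:
            best_idx, best_d = i, d
    return best_idx


def stable_features(frames, max_dist=0.5):
    """Keep descriptors of the first frame that can be matched, frame by
    frame, through every later frame in the window (assumed criterion)."""
    stable = []
    for desc in frames[0]:
        current = desc
        for later in frames[1:]:
            idx = match(current, later, max_dist)
            if idx is None:
                break  # the chain is broken: not common to all frames
            current = later[idx]
        else:
            stable.append(desc)  # matched in every frame of the window
    return stable


# Toy window of three consecutive frames with 2-D stand-in descriptors.
frames = [
    [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)],
    [(0.1, 0.0), (5.0, 5.1)],            # third feature is missing here
    [(0.1, 0.1), (5.1, 5.1), (9.0, 1.0)],
]
print(stable_features(frames))
```

In this toy window the third descriptor has no match in the middle frame, so it is discarded even though it reappears later. The window length and the matching threshold are free parameters of the sketch.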
Here, the definition of the dynamic feature value will be described in detail. When there are only static objects in the surrounding environment, for example, it can be considered that neither the shot objects nor their positions change significantly even when consecutively shot images are compared over several frames. Meanwhile, when a moving person or object is shot by a camera, comparing the consecutive images over several frames shows that the person or object disappears from the image or that its position in the image changes. Thus, a moving person or object can be identified by comparing the consecutive images over several frames. Hereinafter, a local feature value whose position changes widely when the local feature values acquired from the consecutive images are compared over several frames is defined as a dynamic feature value.
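As a rough illustration of this definition, the following sketch labels a tracked feature as dynamic when its image position shifts by more than a threshold over the compared frames. The track format, the function name `split_dynamic`, and the pixel threshold `max_shift` are assumptions made for the example. Treating a feature that disappears from a frame (marked `None`) as dynamic is also an assumption, based on the observation above rather than on the definition itself.

```python
from math import dist


def split_dynamic(tracks, max_shift=10.0):
    """Split feature tracks into static and dynamic ones.

    tracks maps a feature id to its image positions over several frames;
    None marks a frame in which the feature was not found.
    """
    static, dynamic = [], []
    for fid, positions in tracks.items():
        if any(p is None for p in positions):
            dynamic.append(fid)  # assumed rule: a lost feature is dynamic
            continue
        # Largest displacement from the position in the first frame.
        shift = max(dist(positions[0], p) for p in positions)
        if shift > max_shift:
            dynamic.append(fid)
        else:
            static.append(fid)
    return static, dynamic


tracks = {
    "wall_corner": [(100, 50), (101, 50), (102, 51)],
    "pedestrian": [(200, 80), (230, 82), (262, 85)],
    "bird": [(10, 10), None, None],
}
print(split_dynamic(tracks))
```

A fixed pixel threshold is only meaningful when the camera itself moves little between the compared frames. Under larger camera motion, the static background also shifts in the image.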
In addition, ICGM (Incremental Center of Gravity Matching) has been proposed as a method of extracting local feature values from the environment (Patent Literature 5). In the ICGM, vectors are defined from the centroid position of a plurality of local feature values to each local feature value, and these vectors are compared between the consecutive images. This makes it possible to distinguish dynamic local feature values originating from dynamic objects from static and stable local feature values originating from static objects.
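The centroid-vector comparison can be sketched as follows. This is a simplified illustration, not the ICGM procedure of Patent Literature 5. It assumes that the features have already been matched between two consecutive images, so that `prev_pts[i]` and `curr_pts[i]` are the same feature, and it computes the centroid from all matched features at once. The function names and the threshold `max_diff` are hypothetical.

```python
from math import dist


def centroid(points):
    """Centroid (center of gravity) of a list of 2-D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def centroid_vectors(points):
    """Vectors from the centroid to each point."""
    cx, cy = centroid(points)
    return [(x - cx, y - cy) for x, y in points]


def classify(prev_pts, curr_pts, max_diff=5.0):
    """Label each matched feature by how much its centroid vector changed
    between the two images (assumed decision rule)."""
    v_prev = centroid_vectors(prev_pts)
    v_curr = centroid_vectors(curr_pts)
    return [
        "dynamic" if dist(a, b) > max_diff else "static"
        for a, b in zip(v_prev, v_curr)
    ]


# Four static features shift by (3, 0) because the camera moved;
# the fifth feature additionally moves by (20, 0) on its own.
prev_pts = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
curr_pts = [(3, 0), (13, 0), (3, 10), (13, 10), (28, 5)]
print(classify(prev_pts, curr_pts))
```

Because every vector is taken relative to the centroid, a shift common to all features, such as one caused by camera translation, cancels out. In this one-shot form, however, the moving feature also pulls the centroid toward itself, so the vectors of the static features change slightly as well (by 4 pixels in the example), and the threshold has to absorb that bias.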