The present invention relates to a feature descriptor encoding apparatus, a feature descriptor encoding method, and a program.
Systems have been proposed that detect a large number of interest points (feature points) in an image and extract feature descriptors (local feature descriptors) from a local region around each feature point, so that an object in the image can be identified in a manner robust to occlusion and to changes in image capturing size and angle. As typical examples of such systems, Patent Document 1 and Non-Patent Document 1 disclose local feature descriptor extracting apparatuses that use a SIFT (Scale Invariant Feature Transform) feature descriptor. As another example, Non-Patent Document 2 discloses a local feature descriptor extracting apparatus that uses a SURF (Speeded Up Robust Features) feature descriptor.
In those local feature descriptor extracting apparatuses, the coordinate values of feature points and the extracted local feature descriptors are outputted with respect to each of a plurality of feature points detected from an image. Thus, a set of coordinate values and local feature descriptors is taken as a feature descriptor representing the entire image. Both the coordinate values and the local feature descriptors of a plurality of feature points are used to match the images.
Encoding is typically performed when saving and transmitting those local feature descriptors. FIG. 7 shows an example of a typical configuration of a feature descriptor encoding apparatus that encodes local feature descriptors. As shown in FIG. 7, the feature descriptor encoding apparatus is provided with a feature point detection unit 200, a local feature descriptor extracting unit 210, a local feature descriptor encoding unit 220, and a coordinate value fixed-length encoding unit 230.
The feature point detection unit 200 detects a large number of interest points (feature points) from an image and outputs coordinate values of the feature points. Two coordinate values, namely, an X coordinate value and a Y coordinate value, are outputted for each feature point. The local feature descriptor extracting unit 210 uses the coordinate values of the detected feature points to extract feature descriptors from a local region centered on each coordinate value, and outputs the extracted feature descriptors as local feature descriptors. For example, when the above-mentioned SIFT feature descriptor is used, the local feature descriptor extracting unit 210 can divide a local region into 4×4=16 blocks, generate a gradient direction histogram quantized in 8 directions with respect to each block, and take the resulting gradient direction histogram of 128 dimensions (16 blocks×8 gradient directions) as the local feature descriptor. The local feature descriptor encoding unit 220 encodes the extracted local feature descriptors. For example, when the above-mentioned SIFT feature descriptor is used, the local feature descriptor encoding unit 220 can encode the 128-dimensional feature descriptor corresponding to one feature point with a total of 128 bytes by encoding each dimension with 1 byte. The coordinate value fixed-length encoding unit 230 encodes the coordinate values of each feature point with a fixed bit length. The coordinate values of one feature point consist of an X coordinate value and a Y coordinate value. Each coordinate value is typically extracted as a floating-point number and is therefore represented, for example, as 4-byte or 8-byte information. For this reason, the coordinate value fixed-length encoding unit 230 encodes the X coordinate value and Y coordinate value corresponding to one feature point with 8 bytes (4 bytes×2) or 16 bytes (8 bytes×2).
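The 4×4-block, 8-direction layout described above can be illustrated with the following minimal Python sketch. It is a simplified illustration and not the SIFT procedure of Patent Document 1 or Non-Patent Document 1: it omits the Gaussian weighting, interpolation, orientation normalization, and vector normalization used there. The function name `gradient_histogram`, the 16×16 patch size used in the example, and the central-difference gradient are assumptions made for illustration only.

```python
import math

BLOCKS = 4       # the local region is divided into 4 x 4 = 16 blocks
DIRECTIONS = 8   # gradient directions are quantized into 8 bins per block


def gradient_histogram(patch):
    """Return a 16 x 8 = 128-dimensional gradient direction histogram.

    `patch` is a square list of rows of intensity values (hypothetical input
    format); its side length must be a multiple of BLOCKS.
    """
    size = len(patch)
    if size % BLOCKS != 0 or any(len(row) != size for row in patch):
        raise ValueError("patch must be square with a side divisible by 4")
    cell = size // BLOCKS
    hist = [0.0] * (BLOCKS * BLOCKS * DIRECTIONS)
    # Border pixels are skipped because central differences need neighbours.
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            if magnitude == 0.0:
                continue
            # Map the gradient angle in [0, 2*pi) onto one of 8 directions.
            angle = math.atan2(dy, dx) % (2 * math.pi)
            direction = int(angle / (2 * math.pi) * DIRECTIONS) % DIRECTIONS
            block = (y // cell) * BLOCKS + (x // cell)
            hist[block * DIRECTIONS + direction] += magnitude
    return hist


# Example: a horizontal intensity ramp has gradients in a single direction.
ramp = [[float(x) for x in range(16)] for _ in range(16)]
descriptor = gradient_histogram(ramp)
```

In this sketch, element `block * 8 + direction` of the returned list holds the accumulated gradient magnitude for one block and one direction, which gives the 128 dimensions referred to above.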
The encoded local feature descriptors outputted by the local feature descriptor encoding unit 220 and the encoded coordinate values outputted by the coordinate value fixed-length encoding unit 230 are together taken as encoded feature descriptors.
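As a rough illustration of the resulting data size, the following Python sketch packs one feature point into the fixed-length form described above: two floating-point coordinate values followed by 128 one-byte descriptor dimensions. The function name `encode_feature`, the little-endian byte order, the placement of the coordinates before the descriptor, and the assumption that each dimension has already been quantized to an integer from 0 to 255 are all illustrative choices that are not specified in the description.

```python
import struct

DESCRIPTOR_DIMENSIONS = 128  # 16 blocks x 8 gradient directions


def encode_feature(x, y, descriptor, double_precision=False):
    """Encode one feature point as fixed-length coordinates plus descriptor.

    `descriptor` is a sequence of 128 integers in the range 0..255
    (assumed to be quantized beforehand, one byte per dimension).
    """
    if len(descriptor) != DESCRIPTOR_DIMENSIONS:
        raise ValueError("descriptor must have 128 dimensions")
    # 4-byte floats give 8 bytes of coordinates; 8-byte doubles give 16 bytes.
    coord_format = "<2d" if double_precision else "<2f"
    coords = struct.pack(coord_format, x, y)
    return coords + bytes(descriptor)


example = encode_feature(12.5, 40.25, [0] * DESCRIPTOR_DIMENSIONS)
example_double = encode_feature(12.5, 40.25, [0] * DESCRIPTOR_DIMENSIONS,
                                double_precision=True)
```

Under these assumptions, one feature point occupies 8 + 128 = 136 bytes with 4-byte coordinate values, or 16 + 128 = 144 bytes with 8-byte coordinate values, so the coordinate values account for roughly 6% to 11% of the encoded size of each feature point.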
Patent Document 1: U.S. Pat. No. 6,711,293
Non-Patent Document 1: David G. Lowe, “Distinctive image features from scale-invariant keypoints” (USA), International Journal of Computer Vision, 60(2), 2004, p. 91-110
Non-Patent Document 2: Herbert Bay, Andreas Ess, Tinne Tuytelaars, Luc Van Gool, “SURF: Speeded Up Robust Features” (USA), Computer Vision and Image Understanding (CVIU), Vol. 110, No. 3, 2008, p. 346-359