1. Field of the Invention
The present invention relates to a data processing device for estimating the pose of a device, a pose estimation system, a pose estimation method, and a program.
2. Description of the Related Art
There are many methods of measuring pitch, roll, and yaw, which are parameters that indicate the attitude of an air vehicle that flies through the air or space or an underwater vehicle that travels through water with respect to a specific coordinate system of the ground surface, ocean surface, or ocean floor, and of measuring the height from the ground surface, ocean surface, or ocean floor or the depth from the ocean surface, including measurement methods that use pose estimation sensors. Small unmanned air vehicles and underwater vehicles, which have seen increased use in recent years, tend to incorporate various sensors that are specially directed to navigation and inspection in addition to pose measurement sensors. In particular, there is an increasing need for mounting cameras, radar, and sonar to obtain images. Sensors that are primarily directed to the capture of such images are referred to as “image sensors.”
However, an increase in the variety of mounted sensors results in greater complexity of the air vehicle or underwater vehicle systems (including systems for measuring the above-described parameters). As a result, not only do the design, fabrication, and maintenance of air vehicle or underwater vehicle systems entail considerable time and effort, but there is also a danger of an increase in the frequency of malfunctions. In addition, the air vehicle and underwater vehicle systems become bulkier and heavier, and further, consume more power. It is therefore desirable, from the standpoint of utility as well as from the standpoints of size and energy efficiency, to decrease the variety of sensors mounted in air vehicle and underwater vehicle systems.
However, when image acquisition is the object of the air vehicle or underwater vehicle, image sensors are indispensable constituent elements. On the other hand, sensors for pose measurement of the air vehicle or underwater vehicle are also indispensable constituent elements for safe and precise navigation. Nevertheless, if the image sensors that are directed to the acquisition of images can also substitute for pose measurement sensors, the pose measurement sensors can be omitted, which can be considered effective for decreasing size and weight. In addition, although the widely used pose measurement sensors that use inertia may suffer a severe drop in accuracy in the presence of vibration, this problem can be circumvented if pose can be measured by the image sensors.
In addition, as methods of estimating the pose of an air vehicle or underwater vehicle that appears in images based on the acquired images, various techniques have been proposed for estimating pose from the appearance of known geographical features or landmarks. However, such techniques cannot be applied in locations in which geographical features or landmarks are difficult to recognize such as over a desert or ice field, in clouds or mist, or in water; in locations that lack beacons or precise topographical maps; or in locations in which geographical features have greatly changed or landmarks have been lost due to, for example, a disaster. Under these circumstances, the use of basic information such as the ground surface or ocean surface as a reference is extremely effective when figuring pose parameters.
However, when a substantially flat surface such as the surface of the earth or a surface having a high degree of symmetry such as a spherical surface is the standard, yaw is extremely difficult to find based on the appearance of the surface. Nevertheless, the other principal pose parameters, i.e., altitude or depth, pitch, and roll, can be found. Yaw is not absolutely necessary when only minimum control of pose is necessary, such as for avoiding collision with the surface of the earth or the floor of the ocean or for avoiding jumping out of the ocean surface.
The position of a plane such as the surface of the earth, the ocean surface, or the ocean floor can be uniquely found if the positions of three points on the plane are known. With a camera, irradiating the plane with, for example, a laser allows the coordinates of a point on the plane to be found based on the principles of triangulation from the positional relation of the laser and camera. However, under noisy conditions such as at night or during bad weather, the use of only three points does not allow sufficient accuracy. In the case of radar or sonar, high accuracy is obtained by combining the data of as many points as possible due to the high level of noise caused by reflections from the ground surface, ocean surface, or ocean floor. For example, a plane is preferably identified by using the reflections of the entire ground surface in radar and by using the reflections of the entire ocean surface in sonar.
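As a minimal sketch of how three known points fix a plane, the following Python fragment computes the plane through three points from the cross product of two edge vectors. The function name plane_from_points and the sample coordinates are hypothetical and serve only as illustration; they are not taken from any of the cited techniques.

```python
def plane_from_points(p1, p2, p3):
    """Return (a, b, c, d) such that a*x + b*y + c*z = d holds on the plane
    through the three points p1, p2, p3 (each an (x, y, z) tuple)."""
    # Two edge vectors lying in the plane.
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # The plane normal is the cross product u x v.
    a = uy * vz - uz * vy
    b = uz * vx - ux * vz
    c = ux * vy - uy * vx
    if a == 0 and b == 0 and c == 0:
        raise ValueError("the three points are collinear and define no plane")
    d = a * p1[0] + b * p1[1] + c * p1[2]
    return a, b, c, d

# Illustrative input: three points on the horizontal plane z = 1.
flat = plane_from_points((0, 0, 1), (1, 0, 1), (0, 1, 1))
```

Because exactly three points leave no redundancy, an error in any one of them tilts the computed plane directly, which is why many more points are combined under noisy conditions.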
As a method of finding, from a multiplicity of points, a surface that is made up from this multiplicity of points, a method can be considered of first fitting a plane by means of the least squares method. This method, however, tends to be pulled toward values that greatly deviate from the average (outliers). In addition, proper fitting cannot be achieved when the ocean surface and ocean floor are simultaneously visible and the reverberations of each intermix, such as in shallow water.
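The sensitivity of the least squares method to strongly deviating values can be sketched with a two-dimensional analogue, in which a straight line y = a*x + b is fitted instead of a plane. The data below are invented for illustration only: ten points on the flat line y = 1, and the same set with one point replaced by an outlier.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Hypothetical data: a flat line, then the same line with a single outlier.
clean = [(float(x), 1.0) for x in range(10)]
noisy = clean[:-1] + [(9.0, 50.0)]

slope_clean, _ = fit_line(clean)   # slope of the undisturbed fit
slope_noisy, _ = fit_line(noisy)   # slope after one point is corrupted
```

In this sketch a single corrupted point out of ten is enough to change the fitted slope from zero to a value greater than two, which illustrates why a plain least-squares fit is unsuitable when noise is severe.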
The use of a three-dimensional Hough transform has been proposed as a method of finding a plane even in conditions of a high degree of noise. For easy understanding in the following explanation, a case is described of finding a straight line or curve by means of a Hough transform for a two-dimensional image having a high level of noise as an example.
In a Hough transform, an image (in this example, a two-dimensional image) is first subjected to binarization. The binarized image is next divided between a “candidate region” that is a region in which the existence of a straight line or a curve is predicted and a “background region” that is a region in which straight lines and curves are not predicted to occur. Next, taking as parameters the coefficients of formulas that express the lines that are to be detected, all combinations of parameters are found for all lines that can pass through each point contained in the candidate region. All combinations of parameters are plotted in parameter space that takes the parameters as axes to find straight lines or curves (hereinbelow referred to as “parameter lines”). Parameter lines are drawn for each point of the candidate region, and the points at which many parameter lines intersect are the parameter combinations that are to be detected.
The following explanation regards the case of straight-line detection, which is widely used in the field of image processing. In the example that is next described, as shown in FIG. 1, for a straight line on an image, the perpendicular distance ρ from the origin to the straight line can be expressed as ρ=x cos θ+y sin θ for any point (x, y) on the straight line, where θ is the angle formed with respect to the x-axis by the vector that has the origin as its starting point and the foot of the perpendicular on the straight line as its endpoint.
Although there are an enormous number of straight lines that can pass through point (xi, yi) located in a candidate region on an image, this enormous number of straight lines becomes one parameter line in the parameter space that takes the above-described angle θ and distance ρ as axes (the θ-ρ plane) and can be expressed by the curve ρ=xi cos θ+yi sin θ. For example, a case is considered in which points A, B, and C located on one straight line exist within a candidate region as shown in FIG. 2. When the parameter lines that correspond to each of points A, B, and C are detected in this state, the point (ρ0, θ0) where the three parameter lines intersect is the parameter combination of the straight line that is to be detected. Therefore, the straight line that is to be detected is expressed by ρ0=x cos θ0+y sin θ0 when the above-described relational expression is applied and distance ρ0 and angle θ0 are used.
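The voting procedure described above can be sketched as follows. This is a minimal illustration rather than any of the cited implementations: the function name hough_peak, the discretization (180 angle steps, a ρ step of one pixel), and the three sample points on the line x + y = 100 are all assumptions made for the example.

```python
import math

def hough_peak(points, n_theta=180, rho_step=1.0):
    """Vote in a discretized (theta, rho) accumulator and return
    (votes, theta_deg, rho) for the cell crossed by the most parameter lines."""
    acc = {}
    for x, y in points:
        # Trace the parameter line rho = x*cos(theta) + y*sin(theta) of this point.
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            cell = (k, round(rho / rho_step))
            acc[cell] = acc.get(cell, 0) + 1
    (k, r), votes = max(acc.items(), key=lambda item: item[1])
    return votes, 180.0 * k / n_theta, r * rho_step

# Hypothetical points A, B, C lying on the straight line x + y = 100.
votes, theta_deg, rho = hough_peak([(0, 100), (50, 50), (100, 0)])
```

For these three points the parameter lines meet in a single accumulator cell at θ0 = 45 degrees, and the reported ρ0 is the center of the cell that contains 100/√2 ≈ 70.7.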
As disclosed in JP-1995-271978A, JP-1997-081755A, JP-1998-096607A, and JP-2003-271975A, various techniques have been proposed for detecting a plane by applying this method to three dimensions.
However, in the techniques described in JP-1995-271978A, JP-1997-081755A, and JP-1998-096607A, straight lines are first found in a plane, and these straight lines are then bundled to find the plane. As a result, these methods have a first problem, which is the complexity of the processing for finding a plane. In addition, the restrictive conditions of a straight line are weaker than those of a plane. As a result, when higher levels of noise components are included, straight lines that are strongly influenced by the noise components are detected, raising the concern of a decrease in processing efficiency.
In the technique disclosed in JP-2003-271975A, the surfaces formed by sets of three points are first found and then used for voting. This case also has a second problem in that the processing for finding a surface is complex. When there are higher levels of noise components, more points are preferably combined together to find a plane. In this case, the concern arises of a marked increase in the amount of processing in the technique disclosed in JP-2003-271975A.
In addition, as a problem common to typical techniques, there is a third problem: when a plane is found, the transformation between the parameters contained in the relational expression that expresses the plane and the pose parameters (for example, depth, altitude, pitch, and roll) that relate to the object of pose estimation is non-linear, whereby the accuracy of the pose parameters cannot be specified in advance. In other words, even when the accuracies (discretization widths) for the parameters of a formula that describes a plane in the parameter space of a Hough transform are each set identically, the accuracy for each pose parameter differs according to the values of the pose parameters and is not fixed at a single value. For example, when the relational expression that indicates coordinate z in the z direction of a point located on a surface is expressed by z=ax+by+c, using slope “a” for the x direction of the surface as a parameter of the Hough transform results in a nonlinear relation between slope “a” and pitch and roll, which are angles, and altitude, which is a length. As a result, implementing a Hough transform by setting slope “a” at fixed intervals does not result in fixed intervals in the pose parameters.
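The non-linearity can be illustrated numerically under a simplifying assumption that is not stated in the text above: the tilt angle of the surface along the x direction is taken to be the arctangent of slope “a”. The exact relation between slope “a” and pitch or roll depends on the coordinate conventions, so the following Python fragment is only a sketch of the effect.

```python
import math

# Uniform discretization of slope "a" in steps of 0.5 (illustrative values).
slopes = [0.0, 0.5, 1.0, 1.5, 2.0]

# Assumed relation: tilt angle along x = atan(a), expressed in degrees.
angles = [math.degrees(math.atan(a)) for a in slopes]

# Width of each angle interval produced by one fixed slope step.
steps = [angles[i + 1] - angles[i] for i in range(len(angles) - 1)]
```

Under this assumption the same slope step of 0.5 corresponds to an angle step of roughly 26.6 degrees near a = 0 but only about 7.1 degrees between a = 1.5 and a = 2.0, so a fixed resolution in slope does not give a fixed resolution in angle.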
In a Hough transform, a fourth problem exists of a danger that the potential for mistakenly identifying a false surface resulting from noise components as the “true surface” that is the surface to be detected changes (increases or decreases) according to the coordinates in the three-dimensional image. The potential for mistakenly identifying a false surface resulting from noise components as the “true surface” changes according to the coordinates in the three-dimensional image because a group of nearby pixels within a three-dimensional image that originally belong to a different surface are, due to their location within the three-dimensional image, recognized as having the same parameters, resulting in an increase in the number of pixels that are detected, and the increased number of pixels may become greater than the total number of detected pixel groups that belong to the true surface and that are located at other coordinates. To facilitate understanding of this type of phenomenon, a case will be explained in which a Hough transform is performed on a two-dimensional image.
A case will here be considered in which a straight line on a plane is detected by a Hough transform. For example, it is assumed that straight lines that pass through the point (x0, y0) are expressed by ρ=x0 cos θ+y0 sin θ. In this case, a straight line that passes through (x0+Δx, y0+Δy) located in the vicinity of the point (x0, y0) is expressed by ρ+Δρ=(x0+Δx)cos(θ+Δθ)+(y0+Δy) sin(θ+Δθ). Still further, when this relational expression is expanded in Δx, Δy, and Δθ and terms of second order or higher are ignored, the result is Δρ≈(Δx+y0Δθ) cos θ+(Δy−x0Δθ)sin θ. This means that even when Δx, Δy, and Δθ are fixed values, Δρ differs according to the point (x0, y0) that represents a position on the image.
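The dependence of Δρ on the position (x0, y0) can be checked numerically. The following Python sketch compares the exact difference with the first-order expression for two positions using the same Δx, Δy, and Δθ; the function names and all numerical values are arbitrary choices made for illustration.

```python
import math

def delta_rho_exact(x0, y0, theta, dx, dy, dtheta):
    """Exact change in rho when (x0, y0, theta) is shifted by (dx, dy, dtheta)."""
    rho = x0 * math.cos(theta) + y0 * math.sin(theta)
    shifted = ((x0 + dx) * math.cos(theta + dtheta)
               + (y0 + dy) * math.sin(theta + dtheta))
    return shifted - rho

def delta_rho_linear(x0, y0, theta, dx, dy, dtheta):
    """First-order expression (dx + y0*dtheta)*cos(theta) + (dy - x0*dtheta)*sin(theta)."""
    return ((dx + y0 * dtheta) * math.cos(theta)
            + (dy - x0 * dtheta) * math.sin(theta))

# Identical small offsets applied at two different image positions.
theta, dx, dy, dtheta = math.radians(30), 0.01, 0.01, 0.001
near = delta_rho_linear(1.0, 2.0, theta, dx, dy, dtheta)
far = delta_rho_linear(100.0, 200.0, theta, dx, dy, dtheta)
```

With these values Δρ is roughly 0.015 at the position close to the origin and roughly 0.137 at the distant position, although Δx, Δy, and Δθ are unchanged.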
This problem will next be explained by means of a specific example. As shown in FIG. 3a, a case is described in which there are two lines to be detected, line α and line β, point A being detected on line α, and point B and point C being detected on line β. In addition, as shown in FIG. 3b, parameter lines in parameter space that correspond to each of points A, B, and C intersect with each other at different coordinates.
On the other hand, in the example shown in FIG. 4a, the objects of detection are the two lines line α′ and line β′, point A′ being detected on line α′ and point B′ and point C′ being detected on line β′. The relative positions between each of points A′, B′, and C′ shown in FIG. 4a are assumed to be identical to the relative positions between each of points A, B, and C shown in FIG. 3a. In this case, the parameter line that corresponds to point A′, the parameter line that corresponds to point B′, and the parameter line that corresponds to point C′ intersect with each other at the same coordinates, as shown in FIG. 4b. In other words, the fourth problem is that, when the positions of each of the detected points differ, the form of intersection of the parameter lines that correspond to each of the points differs. Although a method has been proposed for circumventing this fourth problem regarding straight lines on a two-dimensional image, a method of circumventing the fourth problem for plane detection in a three-dimensional image has yet to be proposed.
In addition, there is a fifth problem that the problem of changes (increase or decrease) in the potential for mistakenly identifying a false surface resulting from noise components as the “true surface” according to the coordinates in a three-dimensional image is not limited to the case of detecting a plane within a three-dimensional image but can also occur when detecting a typical curved surface within a three-dimensional image.
In a typical method for computing area by counting the number of pixels contained in an image, the area included in the surface will be computed differently when the angle of view changes, even when the same surface is viewed from the object of pose estimation. As a result, the area included in the “true surface” after the angle of view has changed will in some cases be smaller than the threshold value for determining “false surfaces” that result from noise components. In such cases, a sixth problem arises in that the “true surface” and a “false surface” resulting from noise components cannot be distinguished, resulting in the danger that the “true surface” cannot be detected after a change of angle. Counting the crossing frequency of parameter surfaces in parameter space (the “cross surface number” that will be described hereinbelow) is equivalent to counting the number of pixels that belong to each surface detected on an image. However, when a surface is counted by pixels, the area of the surface becomes equal to the largest of the areas obtained by projecting this surface upon plane x-y, plane y-z, and plane z-x. In the interest of facilitating understanding, a two-dimensional Hough transform is here considered. Explanation here regards the case of a straight line having a length of 8 pixels, that is, the length of a straight line on a simple plane rather than the area of a surface in three dimensions.
When the length of pixels aligned in the horizontal direction is taken as a reference, the straight line formed by a series of pixels shown in FIG. 5a and the straight line formed of a series of pixels shown in FIG. 5b have the same length (the length of eight pixels). However, the straight line shown in FIG. 5a has a length of approximately 11.3 pixels according to length measured based on Euclidean distance.
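The difference between the two length measures can be reproduced with a short calculation. The sketch below assumes that the longer line is a run of eight pixels along a 45-degree diagonal, which is consistent with the value of approximately 11.3 given above; the function name and the step vectors are illustrative.

```python
import math

def euclidean_length(n_pixels, step):
    """Euclidean length covered by n_pixels pixels when each pixel advances
    the line by step = (dx, dy), measured in pixel widths."""
    return n_pixels * math.hypot(step[0], step[1])

pixel_count = 8                                        # both lines contain 8 pixels
horizontal = euclidean_length(pixel_count, (1, 0))     # pixels side by side
diagonal = euclidean_length(pixel_count, (1, 1))       # pixels along a 45-degree diagonal
```

Both lines count as eight pixels, yet the diagonal line is longer by a factor of √2 when measured by Euclidean distance.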
The sixth problem is next explained taking the example of a surface in a three-dimensional image. The example shown in FIG. 6 compares plane A, which is tilted by angle θ with respect to plane z-x, and plane B, which forms an angle of “0” with respect to plane z-x. In this example, each of planes A and B is assumed to be a region bounded by the sides of a rectangle. When the area of a surface is measured by the number of pixels, and the projection of plane A, which is tilted by angle θ, onto plane z-x matches plane B, the area of plane A is discerned to be equal to the area of plane B. However, when the area included in plane A is calculated based on the length in the vertical direction and the length in the horizontal direction as measured by Euclidean distance, the area of plane A is greater than the area of plane B. The latter area measurement method is appropriate both from the standpoint of visual confirmation by the user and from the standpoint of finding the area based on the actual physical object.
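The relation between the two area measures can be sketched as follows, assuming a rectangle that is tilted about an axis lying in plane z-x so that exactly one of its sides is foreshortened by cos θ in the projection. The function name and the numerical values are illustrative assumptions.

```python
import math

def true_area(projected_area, tilt_deg):
    """Euclidean area of a rectangle whose projection onto the reference plane
    has area projected_area when the rectangle is tilted by tilt_deg degrees."""
    return projected_area / math.cos(math.radians(tilt_deg))

# Plane B lies in the reference plane; plane A projects onto the same 100 pixels.
area_b = true_area(100.0, 0.0)
area_a = true_area(100.0, 60.0)
```

Under this assumption a pixel count of 100 corresponds to a Euclidean area of about 200 for a tilt of 60 degrees, so a threshold applied to the pixel count treats the tilted surface as only half as large as it actually is.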
In a typical Hough transform, these standpoints are not taken into consideration. As a result, although the area of a plane that is to be detected amid excessive noise and the area of a false plane resulting from noise components can normally be clearly distinguished by means of a threshold value, the angle of view may sometimes cause the area of the “true surface” to fall below this threshold value. In other words, even when a threshold value is provided for distinguishing a “true surface” from “false surfaces”, the danger remains that the orientation of the surface may cause the area of the true plane that is to be detected to fall below the threshold value and thus prevent detection of the true plane.