1. Field
This disclosure is generally related to an imaging model and an apparatus and in particular to a foveation-related imaging model and a system based on foveation-related imaging.
2. Description of Related Art
A typical human retina-like image sensor, that is, a fovea vision sensor, is applicable to several uses. Such a space-variant image sensor facilitates observing a wider field-of-view (FOV) with a much smaller amount of data, and observing the central FOV in more detail than other parts of the FOV. Log-polar (LP) mapping is used as a typical model for image representation. This mapping is inspired by an analytic formulation of biological observations of the primate visual system. It has been applied computationally in computer vision and has been used to produce LP vision chips in CCD or CMOS technologies. The LP mapping is not only effective for significant image data reduction, as in the human retina, but is also suitable for generating geometrical rotation- and scale-invariant features because of the mathematical properties of the LP mapping.
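The rotation- and scale-invariance noted above can be illustrated with a minimal Python sketch. It assumes the common form of the LP mapping, rho = ln(r/r0) and phi = atan2(y, x); the function names and the reference radius r0 are hypothetical and are not taken from this disclosure. Under that assumption, scaling a point by s and rotating it by an angle only shifts (rho, phi) by (ln s, angle).

```python
import math

def to_log_polar(x, y, r0=1.0):
    """Map Cartesian (x, y) to log-polar (rho, phi).

    r0 is a hypothetical reference radius (assumption, not from the text).
    """
    r = math.hypot(x, y)
    if r <= 0.0:
        raise ValueError("log-polar mapping is undefined at the origin")
    return math.log(r / r0), math.atan2(y, x)

def scale_and_rotate(x, y, s, angle):
    """Scale a Cartesian point by s and rotate it by angle [rad] about the origin."""
    c, sn = math.cos(angle), math.sin(angle)
    return s * (c * x - sn * y), s * (sn * x + c * y)

# A point before and after scaling by 2 and rotating by 0.3 rad.
rho1, phi1 = to_log_polar(3.0, 4.0)
x2, y2 = scale_and_rotate(3.0, 4.0, 2.0, 0.3)
rho2, phi2 = to_log_polar(x2, y2)

# In LP coordinates the change is a pure shift: (ln 2, 0.3).
shift = (rho2 - rho1, phi2 - phi1)
```

The sketch ignores the wrap-around of phi at ±π, which a full implementation would have to handle.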
Another method to acquire the log-polar image, an optical approach, also exists. This approach usually combines a specially made Wide Angle Foveated (WAF) lens with a commercially available conventional Cartesian vision chip, in which the photosensitive elements are size-invariant, whereas the LP chip approach combines a specially made chip having logarithmically size-variant photosensitive elements with a conventional lens. The optical approach can realize a more complicated combination of different coordinate systems more easily than the specially made chip. The WAF lens can provide higher resolution in the central FOV because of its optical magnification factor (M.F.).
A camera's view direction control is essential for a fovea vision system, which suggests taking account of overt attention, that is, the type of attention in which the camera is dynamically moved. The other type is covert attention, that is, attention in which the camera is statically fixed. A rotation-, scale-, and translation-invariant property is applicable to pattern recognition. The Fourier-Mellin Transform is known as an algorithm to generate such a property. However, a Cartesian image is generally not rotation- and scale-invariant, and an LP image is not translation-invariant, that is, translation causes geometrical deformation of the projection in the LP coordinates. An overt-vision system with a fovea vision sensor can combine these two types of image for reliable pattern recognition, because precise control of the camera view direction toward a target, using the Cartesian image, reduces the distortion in the LP image. In addition, representing the FOV by a spherical projection is useful for the camera's view direction control.
An LP image acquired by a space-variant fovea sensor is transformed into a system of Cartesian coordinates, generating a Cartesian image, because the Fourier-Mellin Transform needs Cartesian coordinates in order to extract rotation-, scale-, and translation-invariant features. However, this does not mean that the Cartesian image is more suitable as a representation of an original input image, because the Cartesian image remapped conversely from the space-variant input image has a higher resolution in its central FOV than that in the opposite case, i.e., from the Cartesian image to the LP image.
A WAF lens input image is shown in FIG. 1 in comparison with a pinhole camera (PHC) lens image. The PHC image has the same FOV, that is, the same view angle, and the same number of pixels. The WAF lens has an FOV about 120 degrees wide and adequately high resolution in the central FOV.
FIG. 2 illustrates plots related to WAF, LP, FE, and PHC lenses. FIG. 2(a) shows object height h vs. image height r. FIG. 2(b) shows object height h vs. magnification factor (M.F.) dr/dh. FIG. 2(c) shows object height h vs. M.F. r/h.
FIGS. 3(a-e) illustrate an original image, and images from a simulation of a WAF lens, an LP lens, an FE lens, and a PHC lens in that order.
FIG. 4 illustrates various plots for prior-art FE, PHC, and Kuniyoshi lenses. FIG. 4(a) shows object height h vs. image height r. FIG. 4(b) shows object height h vs. M.F. dr/dh. FIG. 4(c) shows object height h vs. M.F. r/h.
FIGS. 5(a-c) show exemplary test images under three scalings of 1.0, 0.75, and 1.5. FIGS. 5(d) and (f) are the LP lens image, and a Kuniyoshi lens (K lens) image simulated by the distribution of M.F. of the actual K lens. Both are extracted from FIG. 5(a) in conditions of θmax=π/3, hmax=1, h0=0.026, and h1=0.21.
FIG. 6 shows LP images from an LP lens (left) and a K lens (right) under scaling of 0.75, 1.0, and 1.5 when θmax=π/3, hmax=1, h0=0.026, and h1=0.21.
FIG. 7 shows plots of object height h vs. length on the image plane for the LP and K lenses, illustrating the accuracy of scale-invariance achieved by prior-art lenses such as the LP lens and the K lens. A broken line and a bold solid line show the LP lens and the K lens, respectively.
Vision sensors such as a CCD camera can acquire more information than other sensors. Further, a wide angle is more convenient for multi-functional use of visual information, making it possible for mobile objects, e.g., automobiles and mobile robots, to perform flexibly under various environments. Typical industrial applications, e.g., an inspection system or a medical application, are limited to single-functional use by a conventional narrow-angle vision sensor. Generally, there is a trade-off between a wide angle and a high resolution. Achieving a wide angle and a high resolution at the same time normally causes an enormous increase in the number of pixels per image, posing a serious problem for data transmission and real-time processing.
It may be helpful to use foveated visual information based on properties of human vision. A human eye has a 120 degree wide-angle visual field. The visual acuity is highest near the fovea in the central area of the retina and becomes lower towards the peripheral area. Methods exist to reconstruct a foveated image based on log-polar mapping by a computer, and to obtain it using a space-variant scan CCD. A camera's view direction is controlled to acquire detailed target information at an attention point. This system is called a Wide Angle Foveated Vision Sensor (WAFVS) system.
The WAFVS system is composed of an image input part, a view direction control (VDC) part, and an image processing part. The image input part has two CCD cameras, each with a special super wide-angle lens, and an image capture device. The special lens, named the WAF lens (FIG. 8), plays a major role in this part. This lens is attached to a commercially available CCD camera and optically realizes a WAF image with a 120 degree wide visual field and, at the same time, locally high resolution in its central area. Stereo vision with such input images provides 3D information with adequate accuracy together with a wider view area. FIG. 9 shows the left camera's image height rL on the CCD image plane versus the incident angle θL to the WAF lens. For comparison, the image height rperL of a PHC lens with the same visual field and the same amount of information is also shown. FIG. 1 shows the images by a WAF lens and a PHC lens. These curves are represented by Equ. (1) (WAF lens) and Equ. (2) (PHC lens). The slope of each curve shows the image resolution along the radial direction of the visual field. A typical WAF lens has a higher resolution in its central area and a lower resolution in the peripheral area, compared to the PHC lens.
$$ r^{i} = f_{0}^{i}\,\theta_{i}^{3} + f_{1}^{i}\,\theta_{i}^{2} + f_{2}^{i}\,\theta_{i} \quad [\mathrm{pixels}] \tag{1} $$

$$ r_{per}^{i} = r_{max}^{i}\,\frac{\tan\theta_{i}}{\sqrt{3}} \quad [\mathrm{pixels}], \tag{2} $$

where each fki (k=0, 1, 2) is a coefficient determined by camera calibration, and rmaxi is the image height at 60 degrees given by Equ. (1). The index i denotes the left camera or the right camera.
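As a minimal sketch of how Equs. (1) and (2) compare, the following Python evaluates both image-height curves. The coefficients F0, F1 and F2 are hypothetical placeholders chosen only so that the curve is monotonic over 0 to 60 degrees; real values come from camera calibration. The pinhole curve is assumed to be normalised so that it reaches rmax at 60 degrees.

```python
import math

def waf_image_height(theta, f0, f1, f2):
    """Equ. (1): image height [pixels] as a cubic in the incident angle theta [rad]."""
    return f0 * theta**3 + f1 * theta**2 + f2 * theta

def phc_image_height(theta, r_max):
    """Equ. (2): perspective image height [pixels], equal to r_max at 60 degrees.

    The division by sqrt(3) = tan(60 deg) is the normalisation assumed here.
    """
    return r_max * math.tan(theta) / math.sqrt(3.0)

# Hypothetical calibration coefficients (assumption, not from the text).
F0, F1, F2 = 50.0, -250.0, 400.0

theta_max = math.radians(60.0)
r_max = waf_image_height(theta_max, F0, F1, F2)

# Image heights at a 10 degree incident angle for both lenses.
theta = math.radians(10.0)
r_waf = waf_image_height(theta, F0, F1, F2)
r_phc = phc_image_height(theta, r_max)
```

With these placeholder coefficients the WAF curve rises faster near the optical axis than the pinhole curve, which is the qualitative behaviour the text attributes to the WAF lens.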
The input image from the WAF lens is suitable for multi-purpose and multi-functional use, because it has two different characteristics, i.e., wide angle and locally high resolution. The white and black circles in each of FIGS. 1(a) and 1(b) mark incident angles of 10 and 30 degrees, respectively. The peripheral area (30 to 60 degrees) of the WAF image is about 40% of the whole visual field, compared to about 90% for the PHC lens image. This area, with fewer pixels, facilitates peripheral vision, e.g., detecting an intruder, localization, and so on. On the other hand, the central area (0 to 10 degrees) of the WAF image is about 10% of the whole visual field, compared to about 1% for the pinhole camera image. This area has adequately high resolution for central vision, e.g., recognizing objects based on color, shape, and pattern, acquiring more accurate 3D information, and so on. The intermediate area (10 to 30 degrees) serves both central and peripheral vision.
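The pinhole-camera percentages quoted above can be checked with a short calculation. The sketch below assumes an ideal perspective projection in which image height is proportional to tan(θ), so the image area inside an incident angle is proportional to tan²(θ); the function name is hypothetical.

```python
import math

def phc_area_fraction(theta_in_deg, theta_out_deg, theta_max_deg=60.0):
    """Fraction of the circular image area lying between two incident angles,
    assuming image height proportional to tan(theta) (ideal pinhole camera)."""
    def t2(deg):
        return math.tan(math.radians(deg)) ** 2
    return (t2(theta_out_deg) - t2(theta_in_deg)) / t2(theta_max_deg)

central = phc_area_fraction(0.0, 10.0)      # roughly 1% of the image
peripheral = phc_area_fraction(30.0, 60.0)  # 8/9, roughly 90% of the image
```

The corresponding WAF figures depend on the calibrated coefficients of Equ. (1) and are therefore not reproduced here.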
Camera view direction control (VDC), which rapidly changes the attention point in the view area using a camera mover, is effective for a WAF image. The VDC part has a VDC mechanism and four stepping motors. These motors realize neck pan rotation and the tilt and vergence of the left and right cameras, using two kinds of eye movement modeled on human saccade (rapid) and pursuit (precise) movements.
The image processing part is composed of multiple computers, each having a multi-task operating system (OS), connected by wireless and wired LAN. This part is characterized by a flexible parallel image processing function based on timely task distribution (FIG. 10). This function has been investigated in order to carry out various image processing tasks in a parallel and cooperative manner. Several kinds of image processing at various levels are executed in parallel, in a distributed manner or in a selective manner, based on each processor's load. The main computer acts as a file server and holds the information shared among the computers and among the multiple tasks. Combination with the camera's VDC extends the applications of the WAFVS system. Besides mobile robot navigation, it appears to be effective for multi-functional applications such as object tracking and simultaneous recognition.
A rational mobile robot navigation based on multi-functional use of the WAF image exists, as shown in FIG. 11(a). The navigation is based on two tasks, central vision and peripheral vision, to utilize the properties of the WAF lens. Central vision plans an obstacle avoidance course from more accurate 3D information. Peripheral vision, on the other hand, revises the locational information obtained by odometry. The planned course and locational information are shared between the tasks and are updated periodically to improve the quality of the navigation cooperatively. For example, the revised locational information improves the planned course: with regard to a target point on the planned course, the objective moving distance and rotating angle are calculated. The calculated values are input to a computer for driving control of the mobile robot. FIG. 11(b) shows a flow chart of this navigation. The period of peripheral vision is much shorter than that of central vision. This set-up is based on a model similar to human visual information processing, because peripheral vision is connected more closely to mobile robot control.
FIG. 12(a) shows the visual point coordinate system Oi-XiYiZi, where the camera's optical axis coincides with the Yi axis and the origin is the visual point of the WAF lens (i=L, R). Coordinates (ui′, vi′) on an input image are corrected to visual field coordinates (ui, vi) by the dot aspect ratio K and correspond to (θi, φi), the incident direction to the visual point, where (Iui, Ivi) is the image center. (uperi, vperi) are perspective coordinates transformed from the visual point coordinates using Equ. (2).
FIG. 12(b) shows the left camera's visual point coordinate system OL-XLYLZL, the robot coordinate system Oc-XcYcZc, and the world coordinate system Ow-XwYwZw. It is assumed that the road plane is ideally flat. In FIG. 12(b), ψ1c and ψ2c are the deflection angles of the binocular camera mover relative to the robot in the pan and tilt directions, respectively. The origin, Oc, of the robot coordinates is the robot center, and its world coordinates are (Xrobo, Yrobo, 0). P is the length from the neck rotation axis to the robot center, H is the height of the camera's visual point from the road plane, and B is the base line length between the left and right cameras' visual points. α is the angle between the Yc axis and the Yw axis.
An obstacle avoidance course is determined using 3D information obtained by a passive parallel stereo method as a task of central vision. Determination of the avoidance course is based on a road map. The road map has the two-dimensional Xw and Yw axes of the world coordinate system, and holds environmental information such as walls, road boundary lines, detected obstacles, and so on. This road map is different from those often used in path planning research, such as a Voronoi graph. FIG. 13(i) shows the way to presume an area where obstacles exist. Here the camera's view direction is assumed to be parallel to the Yw axis. The steps involved are:
(a): Road plane is divided to small square blocks with a side of 5 [cm],
(b), (c): Each 3D measured point (x, y, z) is voted to the corresponding blocks considering measurement errors caused by CCD element digitization. Δy and Δx, errors in the directions of Yw (YL) axis and Xw (XL) axis respectively are calculated using Equs. (3) and (4).
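$$ \Delta y = \left|\frac{\partial y}{\partial \theta_{L}}\right| \cdot \Delta\theta_{L} + \left|\frac{\partial y}{\partial \theta_{R}}\right| \cdot \Delta\theta_{R} + \left|\frac{\partial y}{\partial \phi_{L}}\right| \cdot \Delta\phi_{L} + \left|\frac{\partial y}{\partial \phi_{R}}\right| \cdot \Delta\phi_{R} \tag{3} $$

$$ \Delta x = \left|\frac{\partial x}{\partial \theta_{L}}\right| \cdot \Delta\theta_{L} + \left|\frac{\partial x}{\partial \theta_{R}}\right| \cdot \Delta\theta_{R} + \left|\frac{\partial x}{\partial \phi_{L}}\right| \cdot \Delta\phi_{L} + \left|\frac{\partial x}{\partial \phi_{R}}\right| \cdot \Delta\phi_{R}, \tag{4} $$

where Δθi and Δφi (i=L, R) are errors of the incident angle in the radial and tangential directions of the visual field, respectively, caused by CCD element digitization. FIG. 13(ii) shows the above errors, and
(d): The obstacle is presumed to exist in the highlighted blocks based on a threshold.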
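Steps (a) to (d) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each measured point casts one vote into every block overlapped by its error rectangle [x ± Δx] × [y ± Δy], and a block is flagged when its vote count reaches a threshold. The uniform voting rule, the function names, and the threshold value are hypothetical; the disclosure states only that votes account for the measurement errors and that a threshold is used.

```python
import math
from collections import Counter

BLOCK = 0.05  # block side [m], i.e. 5 cm as in step (a)

def vote(points, block=BLOCK):
    """Accumulate votes on the road-plane grid.

    points: iterable of (x, y, dx, dy) in metres, where dx and dy are the
    errors from Equs. (4) and (3). Returns a Counter keyed by (ix, iy).
    """
    votes = Counter()
    for x, y, dx, dy in points:
        ix0 = math.floor((x - dx) / block)
        ix1 = math.floor((x + dx) / block)
        iy0 = math.floor((y - dy) / block)
        iy1 = math.floor((y + dy) / block)
        # Assumed rule: one vote for every block the error rectangle touches.
        for ix in range(ix0, ix1 + 1):
            for iy in range(iy0, iy1 + 1):
                votes[(ix, iy)] += 1
    return votes

def obstacle_blocks(votes, threshold):
    """Step (d): blocks whose vote count reaches the (hypothetical) threshold."""
    return {cell for cell, n in votes.items() if n >= threshold}

# Two measured points; the first has a depth error spanning three blocks.
points = [(0.12, 1.02, 0.0, 0.06), (0.13, 1.03, 0.0, 0.0)]
votes = vote(points)
flagged = obstacle_blocks(votes, threshold=2)
```

A weighted vote (for example, one that decreases with distance from the measured point) would be an equally plausible reading of the text.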
FIG. 14(i) shows a flow to determine avoidance course on road map. The hatched area in each step shows that there are no obstacles. Obstacle information is given to road map with offset to avoid collision. White circles are data points on the determined avoidance course. Information of road boundary lines is acquired by algorithm of peripheral vision described in the next subsection.
FIG. 14(ii) shows contour graphs of the depth error Δy of each point on the plane including the view lines of the two cameras. Here the base line length B is 300 [mm]. The error is calculated based on Equs. (3) and (4) by computer simulation. Each value is represented as a ratio Δy/y. For comparison, the depth error of the PHC lens image represented by Equ. (2) is also shown. Broken lines are the boundary lines of each camera's view field, and the area inside them has binocular information. As shown in FIG. 14(ii), depth may be measured with higher accuracy at small incident angles to the left camera by a WAF lens than by a PHC lens. The PHC lens can measure depth within 2% error only inside the near range up to about 0.6 m ahead. On the other hand, the WAF lens can measure depth with similar accuracy out to a farther range of about 3.2 m ahead.
A method exists to obtain location and orientation from a single CCD camera using two road boundary lines projected in the peripheral area, namely the Two Parallel Line (TPL) algorithm described in FIGS. 15(a), 15(b), and 16. This method detects locational information with higher accuracy, because the peripheral area has a higher resolution in the tangential direction while having fewer pixels: the circumference of the black circle at 30 degrees in the WAF image (FIG. 1(a)) is longer than that in FIG. 1(b). It is assumed that there are two parallel boundary lines (lA and lB) on a flat road plane and that there is no rotation about the optical axis of the camera. In addition, the road width W and the height H of the visual point from the road plane are assumed to be known.
The left camera's visual point OL (XoL,ZoL) is calculated from two planes including each boundary line from coordinate system OL-XLYLZL and a related coordinate system OL-XL′YL′ZL′ which has the same origin as shown in FIG. 15(b). As to the OL-XL′YL′ZL′, YL′ axis is parallel to two boundary lines and XL′ axis is horizontal to road plane. These planes are represented as Equ. (5).
$$ Z' = -a_{i}\,X' \quad (i = A, B) \tag{5} $$

The visual point OL(XoL, ZoL) is calculated using ai:

$$ \begin{cases} X_{o}^{L} = \dfrac{W a_{A}}{a_{A} - a_{B}} - \dfrac{W}{2} \\[2ex] Z_{o}^{L} = \dfrac{W a_{A} a_{B}}{a_{A} - a_{B}} \end{cases} \tag{6} $$
Road width W is calculated from Equ. (7), because the visual point height H(=ZoL) is known. This means that it is possible to navigate a mobile robot in an unknown corridor environment as well.
W=H/aA−H/aB  (7)
The camera's view direction relative to the boundary lines, represented by pan angle ψ1 and tilt angle ψ2, is calculated from the vanishing point in the perspective coordinate system. Orientation α of the mobile robot in the world coordinate system is represented by Equ. (8), when the camera mover is deflected by pan angle ψ1c from the robot coordinate system.
α=ψ1−ψ1c  (8)
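Equs. (6) to (8) can be evaluated directly once the plane slopes aA and aB have been estimated. The Python sketch below is a minimal illustration; the function names are hypothetical, and the sign convention in the usage example (aA > 0 > aB, with H taken as a positive height) is an assumption of this sketch rather than something the text spells out.

```python
def visual_point(a_A, a_B, W):
    """Equ. (6): visual point (X_o, Z_o) from the plane slopes and road width W."""
    d = a_A - a_B
    if d == 0.0:
        raise ValueError("a_A and a_B must differ (boundary lines coincide)")
    x_o = W * a_A / d - W / 2.0
    z_o = W * a_A * a_B / d
    return x_o, z_o

def road_width(a_A, a_B, H):
    """Equ. (7): road width from the plane slopes and visual point height H."""
    return H / a_A - H / a_B

def robot_orientation(psi1, psi1c):
    """Equ. (8): robot orientation from the camera pan angle relative to the
    boundary lines (psi1) and the camera mover deflection (psi1c)."""
    return psi1 - psi1c

# Symmetric case: slopes of equal magnitude and opposite sign, road width 2.
x_o, z_o = visual_point(1.0, -1.0, 2.0)   # camera on the road centre line
w = road_width(1.0, -1.0, abs(z_o))       # recovers the road width
```

In this symmetric example Z_o comes out negative, and Equ. (7) recovers W when H is taken as its magnitude.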
FIG. 16 shows a flowchart of the TPL algorithm using the Hough transform. This algorithm detects locational information rapidly, even when parts of the boundary lines are invisible because of obstruction or lighting conditions.
If the camera mover is fixed to the mobile robot, the accuracy of the locational and 3D information of the road ahead is reduced as the orientation angle of the mobile robot gets larger. Camera view direction control solves this problem by keeping the view direction parallel to the road boundary lines. Rotating angle Δψ1 is calculated from Equ. (9).
Δψ1=−ψ1−(α̂2−α̂1)  (9),
where α̂1 is an estimated value of the mobile robot orientation just after image input, based on odometry, and α̂2 is an estimated value just after the locational information is calculated from the peripheral area.
Locational information of the mobile robot is revised periodically with α−(α̂2−α̂1) and Xrobo−(X̂robo2−X̂robo1), where X̂roboi is the value of the mobile robot location by odometry just after image input (i=1) and just after the locational information is calculated (i=2). If the road width calculated from Equ. (7) differs greatly from the known width W, the locational information is not revised.
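The view direction update of Equ. (9) and the revision rule above can be sketched in Python as follows. The function names and the relative tolerance used to decide that the measured road width is "much different" are hypothetical; the disclosure does not give a numeric criterion.

```python
def view_rotation(psi1, alpha_hat_1, alpha_hat_2):
    """Equ. (9): rotating angle that keeps the view direction parallel to the
    boundary lines, given odometry estimates at image input (1) and at the
    time the locational information becomes available (2)."""
    return -psi1 - (alpha_hat_2 - alpha_hat_1)

def revise_pose(alpha, x_robo, odo_input, odo_result,
                width_measured, width_known, rel_tol=0.2):
    """Revise orientation and lateral position as stated in the text.

    odo_input, odo_result: (alpha_hat, x_hat) odometry pairs at times 1 and 2.
    rel_tol is a hypothetical threshold (assumption, not from the text).
    Returns None when the measured width is rejected and nothing is revised.
    """
    if abs(width_measured - width_known) > rel_tol * width_known:
        return None
    (a1, x1), (a2, x2) = odo_input, odo_result
    return alpha - (a2 - a1), x_robo - (x2 - x1)

d_psi = view_rotation(0.10, 0.02, 0.05)
accepted = revise_pose(0.2, 0.5, (0.02, 0.10), (0.05, 0.12), 1.40, 1.375)
rejected = revise_pose(0.2, 0.5, (0.02, 0.10), (0.05, 0.12), 2.50, 1.375)
```

The width check acts as a plausibility gate: a measurement whose implied road width is far from the known one is discarded rather than fed back into the odometry.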
When using the TPL algorithm, there exists an optimal value of the height h of the camera's visual point for detecting the camera's horizontal position w accurately. The relation between w and h is examined by computer simulation for the case in which the camera's view direction is parallel to the road boundary lines. FIG. 17 shows two projected road boundary lines (represented by φn (n=A, B)) in the visual field coordinate system. The measurement error, Δw, of the horizontal position caused by the CCD digitization errors, ΔφA and ΔφB, is calculated from Equs. (10) and (12).
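$$ \Delta\phi_{n} = \begin{cases} \dfrac{1}{r\,\left|\cos\phi_{n}\right|} & \left( -\pi \le \phi_{n} \le -\tfrac{3}{4}\pi,\;\; -\tfrac{1}{4}\pi \le \phi_{n} \le \tfrac{1}{4}\pi,\;\; \tfrac{3}{4}\pi \le \phi_{n} \le \pi \right), \\[2ex] \dfrac{1}{r\,\left|\sin\phi_{n}\right|} & \left( -\tfrac{3}{4}\pi \le \phi_{n} \le -\tfrac{1}{4}\pi,\;\; \tfrac{1}{4}\pi \le \phi_{n} \le \tfrac{3}{4}\pi \right), \end{cases} \quad \text{where } n \text{ is } A \text{ or } B. \tag{10} $$

$$ w = \frac{W \tan\phi_{A}}{\tan\phi_{A} - \tan\phi_{B}} \tag{11} $$

$$ \Delta w = \left|\frac{\partial w}{\partial \phi_{A}}\right| \Delta\phi_{A} + \left|\frac{\partial w}{\partial \phi_{B}}\right| \Delta\phi_{B} \tag{12} $$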
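The error propagation of Equs. (10) to (12) can be sketched in Python as follows. The partial derivatives are obtained by differentiating Equ. (11) analytically; taking magnitudes of the cosine, sine, and partial derivatives is an assumption of this sketch, and the function names are hypothetical.

```python
import math

def delta_phi(phi, r):
    """Equ. (10): angular digitization error on a circle of radius r.

    The two branches of Equ. (10) select whichever of |cos| and |sin| is
    larger, so they collapse to a single max().
    """
    return 1.0 / (r * max(abs(math.cos(phi)), abs(math.sin(phi))))

def horizontal_position(phi_a, phi_b, width):
    """Equ. (11): horizontal position w from the two projected line angles."""
    ta, tb = math.tan(phi_a), math.tan(phi_b)
    return width * ta / (ta - tb)

def partials(phi_a, phi_b, width):
    """Analytic partial derivatives of Equ. (11) with respect to phi_A, phi_B."""
    ta, tb = math.tan(phi_a), math.tan(phi_b)
    d2 = (ta - tb) ** 2
    dw_da = -width * tb / (math.cos(phi_a) ** 2 * d2)
    dw_db = width * ta / (math.cos(phi_b) ** 2 * d2)
    return dw_da, dw_db

def delta_w(phi_a, phi_b, width, r):
    """Equ. (12): error of w, summing the magnitudes of both contributions."""
    dw_da, dw_db = partials(phi_a, phi_b, width)
    return abs(dw_da) * delta_phi(phi_a, r) + abs(dw_db) * delta_phi(phi_b, r)
```

Sweeping phi_a and phi_b over the values produced by different camera heights and horizontal positions would give a contour map of Δw, using whatever radius r applies to the lens in question.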
FIG. 18 shows contour graphs of the error Δw at each position (w/W, h/W). FIG. 18(a) is for the WAF lens and FIG. 18(b) is for the PHC lens. Each value of Δw is expressed as a percentage of the road width W. Δw is calculated from ΔφA and ΔφB on a circle with a radius of 0.875 for the WAF lens and on a circle with a radius of 0.405 for the PHC lens. These radii correspond to an incident angle of about 35 degrees. For each horizontal position w, there is a height of the visual point that minimizes Δw (shown as a broken line in FIG. 18). This optimal height becomes smaller as w gets closer to zero or 1. The error becomes more sensitive to a change of w as h gets closer to zero and as w gets closer to zero or 1. It is noted that the WAF lens can measure w with a higher accuracy than the PHC lens, because its resolution is higher in the tangential direction at the same incident angle.
A mobile robot used for an experiment is shown in FIG. 19. It is a Front Wheel System vehicle with an active front wheel, used for both driving and steering, and two rear wheels. The robot carries two computers, for image processing and for driving control of the mobile robot, which run their processes in parallel and share locational information of the mobile robot by communication. The locational information is estimated from rotations measured by rotary encoders for driving and steering, and is revised by values detected from peripheral vision to improve the quality of the planned course and navigation.
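The encoder-based estimate described above can be pictured with the following dead-reckoning sketch. It is an assumption-laden illustration, not the robot's actual software: it assumes a standard tricycle model with the pose referenced to the midpoint of the rear axle, and the function name and the `wheelbase` parameter (distance from the rear axle to the front wheel) are hypothetical.

```python
import math


def odometry_step(x, y, theta, distance, steer, wheelbase):
    """Advance the pose (x, y, theta) by one encoder sample.

    distance:  distance rolled by the front wheel (drive encoder)
    steer:     steering angle of the front wheel in radians (steering encoder)
    wheelbase: assumed distance between the rear axle and the front wheel
    """
    # Travel of the rear-axle midpoint along the current heading.
    ds = distance * math.cos(steer)
    # Heading change caused by the sideways component of the front wheel.
    dtheta = distance * math.sin(steer) / wheelbase
    # Use the heading at the middle of the step for the position update.
    mid = theta + 0.5 * dtheta
    return x + ds * math.cos(mid), y + ds * math.sin(mid), theta + dtheta
```

In such a scheme, the pose accumulated by repeated calls would drift over time, which is consistent with the need to revise it with values detected from peripheral vision.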
As shown in FIGS. 20(i)(a) and (b), two kinds of navigation experiments are carried out, where white boards (width 20 [cm]×height 55 [cm]) are placed as obstacles on a carpet (width 137.5 [cm]). The tilt angle ψ2 of the left camera's view direction is 0 degrees, the height H of the camera's visual point is 64.5 [cm], and the horizontal distance P between the robot center and the neck rotation axis of the camera mover is 53 [cm]. The collision offset between the obstacles and the mobile robot is set to 20 [cm]. The mobile robot is set to move with a velocity of about 10 [cm/s].
FIG. 20(ii) shows results of the experiment. White triangles are target points on the planned courses, and black dots are points given by (Xrobo, Ŷrobo) on the actual courses, where Xrobo is a value measured by the TPL algorithm and Ŷrobo is a value estimated by odometry. Crosses are obstacles. They are each plotted in world coordinates. In these experiments, steering is controlled to follow trajectories based on a cubic function fitted to the target points. It seems that the gap between the two courses is caused by a delay of the steering control. A target point close to 300 mm in FIG. 20(ii)(b) is influenced by errors of the locational information from the TPL algorithm.
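Fitting a cubic function to target points can be sketched as follows. This is a minimal least-squares illustration under stated assumptions: the disclosure does not specify the fitting method or which world coordinate is the independent variable, so the function names, the choice of `xs` as the independent coordinate, and the use of normal equations are all hypothetical. At least four distinct `xs` values are required.

```python
def fit_cubic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x^2 + c3*x^3.

    Returns [c0, c1, c2, c3]. Solves the 4x4 normal equations by
    Gaussian elimination with partial pivoting.
    """
    n = 4
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            f = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= f * A[col][k]
            b[row] -= f * b[col]
    # Back substitution.
    c = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][k] * c[k] for k in range(i + 1, n))
        c[i] = (b[i] - s) / A[i][i]
    return c


def eval_cubic(c, x):
    """Evaluate the fitted cubic at x using Horner's scheme."""
    return c[0] + x * (c[1] + x * (c[2] + x * c[3]))
```

A steering controller could then sample `eval_cubic` ahead of the robot to obtain a reference trajectory; how the samples are turned into steering commands is outside this sketch.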
A fovea sensor gives a foveated image having a resolution that is higher in the central FOV and decreases rapidly from the central area toward the periphery. That is, the resolution of the fovea sensor is space-variant. Thus, the fovea sensor observes a wide-angle FOV, and the central FOV in detail, using a greatly reduced amount of data. Log-polar mapping is often used as a model of the foveated image. The log-polar mapping is inspired by analytic formulation from biological observation of the primate visual system. This mapping has been applied to computer vision computationally and used to produce a log-polar vision chip with CCD or CMOS technologies. The log-polar mapping is not only effective for a drastic reduction in image data, as the human retina does, but is also suitable for easily generating geometrical rotation- and scale-invariant features.
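The rotation- and scale-invariant property mentioned above can be illustrated with a minimal sketch of the log-polar coordinate transform. The function name and the reference radius `r0` are hypothetical and only serve the illustration; an actual log-polar chip or lens model would also define sampling and the treatment of the fovea.

```python
import math


def to_log_polar(x, y, r0=1.0):
    """Map a Cartesian point to (log-radius, angle) about the origin.

    r0 is an assumed reference radius; the mapping is undefined at the origin.
    """
    r = math.hypot(x, y)
    if r == 0.0:
        raise ValueError("log-polar mapping is undefined at the origin")
    return math.log(r / r0), math.atan2(y, x)


# Scaling a point about the origin shifts only the log-radius coordinate,
# and rotating it shifts only the angle coordinate.
rho, ang = to_log_polar(3.0, 4.0)
rho_scaled, ang_scaled = to_log_polar(2.0 * 3.0, 2.0 * 4.0)
alpha = 0.3
xr = 3.0 * math.cos(alpha) - 4.0 * math.sin(alpha)
yr = 3.0 * math.sin(alpha) + 4.0 * math.cos(alpha)
rho_rot, ang_rot = to_log_polar(xr, yr)
```

Because scaling and rotation about the image center become plain translations in these coordinates, a pattern keeps its shape in the log-polar image. A translation of the pattern in the Cartesian image does not have this property.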
Another method, using a wide-angle lens, exists to acquire the foveated image. This approach combines a specially-made Wide Angle Foveated (WAF) lens with a commercially available conventional Cartesian linear-coordinate vision chip, where the photosensitive elements are arranged uniformly. The former approach, on the other hand, combines a conventional lens with the log-polar chip, where the size of the photosensitive element is uniform in the fovea and changes logarithmically in the periphery.
A special wide-angle lens is known as a model that combines planar projection and spherical projection. This lens achieves foveation by distorting a part of the spherical projection using a logarithmic curve in order to bridge the ‘linear’ planar projection and the ‘linear’ spherical projection. This part of the FOV, that is, the spherical logarithmic part, has a rotation- and scale-invariant (RS-invariant) property.