The invention relates generally to systems that use an array of sensors to detect distances in at least two dimensions, and more specifically to enhancing performance of such systems by reducing computational overhead and by overcoming geometric error, elliptical error, and lens distortion.
FIG. 1 depicts a generic three-dimensional sensing system 10 that includes a pulsed light source 20, some of whose emissions 30 strike a target object 40 and are reflected back as optical energy 50. Some of the reflected energy 50 passes through a lens 60 and is collected by at least some three-dimensional sensors 70i,j in a sensor array 80, where i,j represent indices. An electronics system 90 coordinates operation of system 10 and carries out signal processing of sensor-received data. An exemplary such system is described in U.S. patent application Ser. No. 09/401,059 “CMOS-Compatible Three-dimensional Image Sensor IC”, now U.S. Pat. No. 6,323,942 (2001).
Within array 80, each imaging sensor 70i,j (and its associated electronics) calculates total time of flight (TOF) from when a light pulse left source 20 to when energy reflected from target object 40 is detected by sensor 70i,j. Surface regions of target object 40 are typically identified in (x,y,z) coordinates. Different (x,y,z) regions of target object 40 will be imaged by different ones of the imaging sensors 70i,j in array 80. Data representing TOF and/or brightness of the returned optical energy is collected by each sensor element (i,j) in the array, and may be referred to as the data at sensor pixel detector (i,j). Typically, for each pulse of optical energy emitted by source 20, a frame of three-dimensional image data may be collected by system 10.
FIG. 2 depicts one potential application for system 10, in which system 10 attempts to detect the spatial location of interaction between a virtual input device 100 (here shown as a virtual keyboard) and a user-controlled object 110 (here shown as a user's hand). Virtual input device 100 may be simply an image of an input device, such as a keyboard. As the user 110 “types” on the image, system 10 attempts to discern in the (x,y,z) coordinate system which keys on an actual keyboard would have been typed upon by the user.
Typically, an image representing the surface of virtual input device 100 will have been stored in (x,y,z) coordinates in memory within system 10. For example, in FIG. 2 the user's left forefinger is shown contacting (or typing upon) the region of the virtual input device where the “ALT” character would be located on an actual keyboard. In essence, regions of contact or at least near contact between user-controlled object 110 and the virtual input device 100 are determined using TOF information. Pixel detector information from sensor array 80 would then be translated to (x,y,z) coordinates, typically once per acquired frame of data. After then determining what region of device 100 was contacted, the resultant data (e.g., here the key scancode for the ALT key) would be output, if desired, as DATA to an accessory device, perhaps a small computer. An example of such an application as shown in FIG. 2 may be found in co-pending U.S. patent application Ser. No. 09/502,499 entitled “CMOS-Compatible Three-dimensional Image Sensor IC”, assigned to assignee herein.
Unfortunately, several error mechanisms are at work in the simplified system of FIG. 2. For example, geometric error or distortion is present in the raw data acquired by the sensor array. Referring to FIGS. 3A and 3B, geometric or distortion error arises from use of distance measurement D at pixel (i,j) as the z-value at pixel (i,j). It is understood that the z-value is distance along the z-axis from the target object 40 or 110 to the optical plane of the imaging sensor array 80. It is known in the art to try to compensate for geometric error by transforming the raw data into (x,y,z) coordinates using a coordinate transformation that is carried out on a per-pixel basis. Such a coordinate transformation maps the data from one coordinate system into another coordinate system.
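The geometric error just described can be illustrated with a minimal sketch. It assumes an idealized pinhole camera, which the text does not specify, and treats the measured distance D as the radial distance along the ray through pixel (i,j). The names focal_px, cx, and cy (focal length in pixel units and the pixel at the optical center) are hypothetical and are introduced only for illustration.

```python
import math

def pixel_to_xyz(i, j, dist, focal_px, cx, cy):
    """Convert a radial distance measured at pixel (i, j) into (x, y, z).

    Assumption (not from the text): an ideal pinhole model in which the ray
    through pixel (i, j) has direction (i - cx, j - cy, focal_px).
    """
    u, v = i - cx, j - cy
    norm = math.sqrt(u * u + v * v + focal_px * focal_px)
    scale = dist / norm
    return (u * scale, v * scale, focal_px * scale)

# At the optical center, z equals the measured distance D.
on_axis = pixel_to_xyz(50, 50, 1.0, focal_px=100.0, cx=50.0, cy=50.0)

# Away from the center, z is smaller than D, so using D directly as the
# z-value overstates depth. That discrepancy is the geometric error.
off_axis = pixel_to_xyz(90, 50, 1.0, focal_px=100.0, cx=50.0, cy=50.0)

print(on_axis, off_axis)
```

Under these assumptions, one such transformation (a square root and several multiplications) would be needed for every pixel of every frame.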
FIG. 3C depicts another and potentially more serious geometric error, namely so-called elliptical error. Elliptical error results from approximating imaging regions of interest as lying on planes orthogonal to an optical axis of system 10, rather than lying on surfaces of ellipsoids whose focal points are optical emitter 20 and sensor array 80. Elliptical error is depicted in FIG. 3C with reference to points A and B, which have equal light travel distances from optical energy emitter 20 shown in FIG. 1. Referring to FIG. 3C, optical source 20 and sensor array 80 are spaced apart vertically (in the figure) by a distance 2c. Further, points A and B each have the same light traveling distance 2d, i.e., r1+r2=2d, and r′1+r′2=2d. In mapping distance values to planes in a three-dimensional grid, points A and B, which have the same distance value from the optical plane, may in fact map to different planes on the three-dimensional grid. Thus while points A and B both lie on the same elliptical curve Ec, point A lies on plane Pa while point B lies on a parallel plane Pb, a bit farther from the optical plane than is plane Pa. Thus, properly determining (x,y,z) coordinate information for point A and point B requires a further correction.
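The elliptical error can be checked numerically with a short sketch. It assumes a two-dimensional cross-section in which the emitter and the sensor sit at heights +c and -c on the optical plane (z = 0), so that every point with total light path 2d lies on the ellipse y²/d² + z²/(d² - c²) = 1. The numeric values are illustrative only and do not come from the text.

```python
import math

def depth_on_ellipse(y, c, d):
    """Depth z of a point at height y whose total light path is 2*d.

    Emitter and sensor are the foci at (y=+c, z=0) and (y=-c, z=0).
    Valid for |y| <= d and d > c.
    """
    return math.sqrt((d * d - c * c) * (1.0 - (y * y) / (d * d)))

def path_length(y, z, c):
    """Total path emitter -> point -> sensor, i.e. r1 + r2."""
    return math.hypot(y - c, z) + math.hypot(y + c, z)

c, d = 0.05, 0.50  # assumed values, in metres

z_center = depth_on_ellipse(0.0, c, d)   # point midway between the foci
z_offset = depth_on_ellipse(0.20, c, d)  # point displaced along y

# Both points have the same measured path length 2d, yet they lie on
# different planes parallel to the optical plane.
print(path_length(0.0, z_center, c), path_length(0.20, z_offset, c))
print(z_center, z_offset)
```

Treating both points as lying on a single plane would therefore misplace at least one of them, which is why a further correction is needed in (x,y,z) space.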
Unfortunately, computational overhead or cost associated with various coordinate transformations and other corrections may be high. For example, assume that array 80 includes 100 rows and 100 columns of pixel detectors 70i,j (i.e., 1≤i≤100, 1≤j≤100). Thus, a single frame of three-dimensional data acquired for each pulse of energy from emitter 20 includes information from 10,000 pixels. In this example, correcting for geometric or distortion error requires performing 10,000 coordinate transformations for each frame of data acquired. If the frame rate is 30 frames per second, the computational requirement just for the coordinate transformations will be 300,000 coordinate transformations performed within each second.
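The arithmetic in this example can be reproduced with a few lines. The function name is hypothetical, and the sketch assumes exactly one coordinate transformation per pixel per frame, as the example states.

```python
def transforms_per_second(rows, cols, frames_per_second):
    """Coordinate transformations per second, one per pixel per frame."""
    return rows * cols * frames_per_second

pixels_per_frame = 100 * 100
print(pixels_per_frame)                      # 10000
print(transforms_per_second(100, 100, 30))   # 300000
```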
In addition to the sheer number of transformations required to be calculated per second, coordinate transformation typically involves use of floating-point calculation and/or memory to store transformation tables. Thus, the necessity to perform a substantial number of coordinate transformations can be computationally intensive and can require substantial memory resources. However, in applications where system 10 is an embedded system, the available computational power and available memory may be quite low. But even if the additional computational power and memory are provided to carry out coordinate transformation, correction of geometric error does not correct for distortion created by lens 60.
Lens distortion is present in almost every optical lens, and is more evident in less expensive lenses. Indeed, if system 10 is mass produced and lens 60 is not a high quality lens, the problem associated with lens distortion cannot generally be ignored. FIG. 4A depicts a cross-hatch image comprising horizontal and vertical lines. FIG. 4B depicts the image of FIG. 4A as viewed through a lens having substantial barrel distortion, while FIG. 4C depicts the image of FIG. 4A as viewed through a lens having substantial pincushion distortion. Barrel distortion and pincushion distortion are two common types of lens distortion. An additional type of lens distortion is fuzziness, e.g., imaged parallel lines may not necessarily be distorted to bow out (FIG. 4B) or bow in (FIG. 4C), yet the resultant image is not optically sharp but somewhat fuzzy.
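Barrel and pincushion distortion are often approximated with a one-coefficient radial model. The sketch below uses that common textbook model as an assumption; the text does not state which distortion model, if any, applies to lens 60, and the coefficient k1 and the normalized coordinates are hypothetical.

```python
def radial_distort(x, y, k1):
    """Map an undistorted point to its distorted position.

    Assumed model: (x, y) are normalized so the optical center is (0, 0),
    and the point is scaled by (1 + k1 * r^2). With this convention,
    k1 < 0 gives barrel distortion and k1 > 0 gives pincushion distortion.
    """
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2
    return (x * factor, y * factor)

# A point at unit radius is pulled toward the center by barrel distortion
# and pushed away from it by pincushion distortion.
barrel = radial_distort(1.0, 0.0, -0.1)
pincushion = radial_distort(1.0, 0.0, 0.1)

# The optical center itself is not displaced.
center = radial_distort(0.0, 0.0, -0.1)

print(barrel, pincushion, center)
```

Because the displacement grows with the square of the radius, straight lines away from the center image as curves, which is the bowing shown in FIGS. 4B and 4C.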
It is known in the art to correct non-linear lens distortion such as barrel and pincushion lens distortion using non-linear numerical transformation methods that are carried out on a per-pixel basis. While such transformation can indeed compensate for such non-linear lens distortion, the computational overhead cost can be substantial. Further, in an embedded application characterized by low computational power, the ability to correct for these two types of lens distortion may simply not be available. (Correction for fuzziness lens distortion is not addressed by the present invention.)
Thus, for use with a system having an array of detectors, defined in (i,j,k) coordinate space, to acquire at least two-dimensional information representing user-controlled object interaction with a virtual input device, traditionally represented in (x,y,z) coordinate space, there is a need for a new method of analysis. Preferably such method should examine regions of the virtual input device and statically transform sub-regions of potential interest into (i,j,k) detector array coordinates. Determination as to what regions or sub-regions of the virtual input device have been interacted with by a user-controlled object may then advantageously be carried out in (i,j,k) domain space.
Further, there is a need for a method to reduce computational overhead associated with correction of geometric error, non-linear barrel and pincushion type lens distortion, and elliptical error in such a system that acquires at least two-dimensional data. Preferably such method should be straightforward in its implementation and should not substantially contribute to the cost or complexity of the overall system.
The present invention provides such a method.
The present invention provides a methodology to simplify operation and analysis overhead in a system that acquires at least two-dimensional information using a lens and an array of detectors. The information acquired represents interaction of a user-controlled object with a virtual input device that may be represented in conventional (x,y,z) space coordinates. The information is acquired preferably with detectors in an array that may be represented in (i,j,k) array space coordinates.
In one aspect, the invention defines sub-regions, preferably points, within the virtual input device, which reduces computational overhead and memory associated with correcting for geometric error, elliptical error, and non-linear barrel and pincushion type lens distortion in a system that acquires at least two-dimensional information using a lens and an array of detectors. Geometric error correction is addressed by representing data in the sensor array coordinate system (i,j,k) rather than in the conventional (x,y,z) coordinate system. Thus, data is represented by (i,j,k), where (i,j) identifies pixel (i,j) and k represents distance. Distance k is the distance from an active light source (e.g., a light source emitting optical energy) to the imaged portion of the target object, plus the return distance from the imaged portion of the target object to pixel (i,j). In the absence of an active light source, k is the distance between pixel (i,j) and the imaged portion of the target object.
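The (i,j,k) representation can be sketched as follows for the active light source case. The sketch assumes that each pixel reports a total time of flight in seconds and that k is obtained by multiplying that time by the speed of light; the function name and the nested-list frame layout are hypothetical. Indices here start at 0, whereas the text numbers pixels from 1.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def frame_to_ijk(tof_seconds):
    """Represent one frame as a list of (i, j, k) samples.

    tof_seconds[i][j] is the total time of flight measured at pixel (i, j).
    k is the full light path: source -> target -> pixel. No conversion to
    (x, y, z) is performed.
    """
    return [
        (i, j, SPEED_OF_LIGHT_M_PER_S * t)
        for i, row in enumerate(tof_seconds)
        for j, t in enumerate(row)
    ]

# A tiny 1 x 2 frame with round-trip times of 2 ns and 4 ns.
samples = frame_to_ijk([[2e-9, 4e-9]])
print(samples)
```

Each sample needs only one multiplication, in contrast to a per-pixel transformation into (x,y,z) coordinates.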
Advantageously, using the sensor coordinate system avoids having to perform coordinate transformations, thus reducing computational overhead, cost, and memory requirements. Further, since the sensor coordinate system relates to raw data, geometric error does not arise, and no correction for geometric error is needed.
In another aspect, the present invention addresses non-linear barrel and pincushion type lens distortion effects by simply directly using distorted coordinates of an object, e.g., a virtual input device, to compensate for such lens distortion. Thus, rather than employ computationally intensive techniques to correct for such lens distortion on the image data itself, computation cost is reduced by simply eliminating correction of such lens distortion upon the data. In a virtual keyboard input device application, since the coordinates of all the virtual keys are distorted coordinates, if the distorted image of a user-controlled object, e.g., a fingertip, is in close proximity to the distorted coordinate of a virtual key, the fingertip should be very close to the key. This permits using distorted images to identify key presses by using distorted coordinates of the virtual keys. Thus, using distorted key coordinates permits software associated with the virtual input device keyboard application to overcome non-linear pincushion and barrel type lens distortion without directly compensating for the lens distortion.
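Key identification with distorted coordinates can be sketched as a nearest-key lookup carried out entirely in (i,j,k) space. Everything specific in the sketch is an assumption made for illustration: the key table values, the two tolerance parameters, and the choice to compare pixel distance and k separately. The only idea taken from the text is that the stored key coordinates are already distorted by the lens, so the fingertip data is compared against them without any correction.

```python
import math

# Hypothetical table: key name -> (i, j, k) as the key appears through the
# actual lens, i.e. already distorted. Values are made up for illustration.
KEY_COORDS_IJK = {
    "ALT": (12.0, 80.0, 0.62),
    "CTRL": (5.0, 82.0, 0.63),
    "SPACE": (40.0, 85.0, 0.60),
}

def nearest_key(fingertip, key_coords, max_pixel_dist, max_k_diff):
    """Return the key closest to the fingertip in (i, j), or None.

    A key is a candidate only if its k differs from the fingertip's k by at
    most max_k_diff and its pixel distance is at most max_pixel_dist.
    """
    fi, fj, fk = fingertip
    best_key, best_dist = None, max_pixel_dist
    for key, (ki, kj, kk) in key_coords.items():
        if abs(fk - kk) > max_k_diff:
            continue
        pixel_dist = math.hypot(fi - ki, fj - kj)
        if pixel_dist <= best_dist:
            best_key, best_dist = key, pixel_dist
    return best_key

# Distorted fingertip sample close to the distorted ALT coordinate.
print(nearest_key((12.5, 79.5, 0.621), KEY_COORDS_IJK, 3.0, 0.01))

# A sample far from every key yields no match.
print(nearest_key((60.0, 20.0, 0.62), KEY_COORDS_IJK, 3.0, 0.01))
```

In such a scheme the key table would be computed once, ahead of time, so the per-frame work reduces to a few comparisons per fingertip candidate rather than a transformation of every pixel.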