This Application is related to the subject matter described in the following two U.S. Patents: U.S. Pat. No. 5,578,813, filed Mar. 2, 1995, issued Nov. 26, 1996 and entitled FREEHAND IMAGE SCANNING DEVICE WHICH COMPENSATES FOR NON-LINEAR MOVEMENT; and U.S. Pat. No. 5,644,139, filed Aug. 14, 1996, issued Jul. 1, 1997 and entitled NAVIGATION FOR DETECTING MOVEMENT OF NAVIGATION SENSORS RELATIVE TO AN OBJECT. Both of these Patents have the same inventors: Ross R. Allen, David Beard, Mark T. Smith and Barclay J. Tullis, and both Patents are assigned to Hewlett-Packard Co. This application is also related to the subject matter described in U.S. Pat. No. 5,786,804, filed Oct. 6, 1995, entitled METHOD AND SYSTEM FOR TRACKING ATTITUDE, issued Jul. 28, 1998, and also assigned to Hewlett-Packard Co. These three Patents describe techniques of tracking position movement. Those techniques are a component in the preferred embodiment described below. Accordingly, U.S. Pat. Nos. 5,578,813, 5,644,139 and 5,786,804 are hereby incorporated herein by reference.
The use of a hand operated pointing device with a computer and its display has become almost universal. By far the most popular of the various devices is the conventional (mechanical) mouse. A conventional mouse typically has a bottom surface carrying three or more downward projecting pads of a low friction material that raise the bottom surface a short distance above the work surface of a cooperating mouse pad. Centrally located within the bottom surface of the mouse is a hole through which a portion of the underside of a rubber-surfaced steel ball (hereinafter called simply a rubber ball) extends; in operation gravity pulls the ball downward and against the top surface of the mouse pad. The mouse pad is typically a closed cell foam rubber pad covered with a suitable fabric. The low friction pads slide easily over the fabric, but the rubber ball does not skid; instead it rolls as the mouse is moved. Interior to the mouse are rollers, or wheels, that contact the ball at its equator (the great circle parallel to the bottom surface of the mouse) and convert its rotation into electrical signals. The external housing of the mouse is shaped such that when it is covered by the user's hand it appears to have a "front-to-back" axis (along the user's forearm) and an orthogonal "left-to-right" axis. The interior wheels that contact the ball's equator are arranged so that one wheel responds only to rolling of the ball that results from a motion component of the mouse that is along the front-to-back axis, and also so that the other wheel responds only to rolling produced by a motion component along the left-to-right axis. The resulting rotations of the wheels or contact rollers produce electrical signals representing these motion components. (Say, F/B representing Forward and Backward, and L/R representing Left or Right.)
These electrical signals F/B and L/R are coupled to the computer, where software responds to the signals to change by a Δx and a Δy the displayed position of a pointer (cursor) in accordance with movement of the mouse. The user moves the mouse as necessary to get the displayed pointer into a desired location or position. Once the pointer on the screen points at an object or location of interest, one of one or more buttons on the mouse is activated with the fingers of the hand holding the mouse. The activation serves as an instruction to take some action, the nature of which is defined by the software in the computer.
Unfortunately, the usual sort of mouse described above is subject to a number of shortcomings. Among these are deterioration of the mouse ball or damage to its surface, deterioration or damage to the surface of the mouse pad, and degradation of the ease of rotation for the contact rollers (say, (a) owing to the accumulation of dirt or of lint, or (b) because of wear, or (c) both (a) and (b)). All of these things can contribute to erratic or total failure of the mouse to perform as needed. These episodes can be rather frustrating for the user, whose complaint might be that while the cursor on the screen moves in all other directions, he can't get the cursor to, say, move downwards. Accordingly, industry has responded by making the mouse ball removable for easy replacement and for the cleaning of the recessed region into which it fits. Enhanced mouse ball hygiene was also a prime motivation in the introduction of mouse pads. Nevertheless, some users become extremely disgusted with their particular mouse of the moment when these remedies appear to be of no avail. Mouse and mouse pad replacement is a lively business.
The underlying reason for all this trouble is that the conventional mouse is largely mechanical in its construction and operation, and relies to a significant degree on a fairly delicate compromise about how mechanical forces are developed and transferred.
There have been several earlier attempts to use optical methods as replacements for mechanical ones. These have included the use of photo detectors to respond to mouse motion over specially marked mouse pads, and to respond to the motion of a specially striped mouse ball. U.S. Pat. No. 4,799,055 describes an optical mouse that does not require any specially pre-marked surface. (Its disclosed two orthogonal one pixel wide linear arrays of photo sensors in the X and Y directions and its state-machine motion detection mechanism make it a distant early cousin to the technique of the incorporated Patents, although it is our view that the shifted and correlated array [pixel pattern within an area] technique of the incorporated Patents is considerably more sophisticated and robust.) To date, and despite decades of user frustration with the mechanical mouse, none of these earlier optical techniques has been widely accepted as a satisfactory replacement for the conventional mechanical mouse. Thus, it would be desirable if there were a non-mechanical mouse that is viable from a manufacturing perspective, relatively inexpensive, reliable, and that appears to the user as essentially the operational equivalent of the conventional mouse. This need could be met by a new type of optical mouse that has a familiar "feel" and is free of unexpected behaviors. It would be even better if the operation of this new optical mouse did not rely upon cooperation with a mouse pad, whether special or otherwise, but was instead able to navigate upon almost any arbitrary surface.
A solution to the problem of replacing a conventional mechanical mouse with an optical counterpart is to optically detect motion by directly imaging as an array of pixels the various particular spatial features of a work surface below the mouse, much as human vision is believed to do. In general, this work surface may be almost any flat surface; in particular, the work surface need not be a mouse pad, special or otherwise. To this end the work surface below the imaging mechanism is illuminated from the side, say, with an infrared (IR) light emitting diode (LED). A surprisingly wide variety of surfaces create a rich collection of highlights and shadows when illuminated with a suitable angle of incidence. That angle is generally low, say, on the order of five to twenty degrees, and we shall term it a "grazing" angle of incidence. Paper, wood, formica and painted surfaces all work well; about the only surface that does not work is smooth glass (unless it is covered with fingerprints!). The reason these surfaces work is that they possess a micro texture, which in some cases may not be perceived by the unaided human senses.
IR light reflected from the micro textured surface is focused onto a suitable array (say, 16×16 or 24×24) of photo detectors. The LED may be continuously on with either a steady or variable amount of illumination servoed to maximize some aspect of performance (e.g., the dynamic range of the photo detectors in conjunction with the albedo of the work surface). Alternatively, a charge accumulation mechanism coupled to the photo detectors may be "shuttered" (by current shunting switches) and the LED pulsed on and off to control the exposure by servoing the average amount of light. Turning the LED off also saves power, an important consideration in battery operated environments. The responses of the individual photo detectors are digitized to a suitable resolution (say, six or eight bits) and stored as a frame into corresponding locations within an array of memory. Having thus given our mouse an "eye", we are going to further equip it to "see" movement by performing comparisons with successive frames.
Preferably, the size of the image projected onto the photo detectors is a slight magnification of the original features being imaged, say, by two to four times. However, if the photo detectors are small enough it may be possible and desirable to dispense with magnification. The size of the photo detectors and their spacing is such that there is much more likely to be one or several adjacent photo detectors per image feature, rather than the other way around. Thus, the pixel size represented by the individual photo detectors corresponds to a spatial region on the work surface of a size that is generally smaller than the size of a typical spatial feature on that work surface, which might be a strand of fiber in a cloth covering a mouse pad, a fiber in a piece of paper or cardboard, a microscopic variation in a painted surface, or an element of an embossed micro texture on a plastic laminate. The overall size of the array of photo detectors is preferably large enough to receive the images of several features. In this way, images of such spatial features produce translated patterns of pixel information as the mouse moves. The number of photo detectors in the array and the frame rate at which their contents are digitized and captured cooperate to influence how fast the seeing-eye mouse can be moved over the work surface and still be tracked. Tracking is accomplished by comparing a newly captured sample frame with a previously captured reference frame to ascertain the direction and amount of movement. One way that may be done is to shift the entire content of one of the frames by a distance of one pixel (corresponds to a photo detector), successively in each of the eight directions allowed by a one pixel offset trial shift (one over, one over and one down, one down, one up, one up and one over, one over in the other direction, etc.). 
That adds up to eight trials, but we mustn't forget that there might not have been any motion, so a ninth trial "null shift" is also required. After each trial shift those portions of the frames that overlap each other are subtracted on a pixel by pixel basis, and the resulting differences are (preferably squared and then) summed to form a measure of similarity (correlation) within that region of overlap. Larger trial shifts are possible, of course (e.g., two over and one down), but at some point the attendant complexity ruins the advantage, and it is preferable to simply have a sufficiently high frame rate with small trial shifts. The trial shift with the least difference (greatest correlation) can be taken as an indication of the motion between the two frames. That is, it provides a raw F/B and L/R. The raw movement information may be scaled and/or accumulated to provide display pointer movement information (Δx and Δy) of a convenient granularity and at a suitable rate of information exchange.
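By way of illustration only, the nine-trial comparison just described might be sketched in software as follows. This is a minimal sketch under stated assumptions, not the dedicated circuitry of the incorporated Patents: the function names `correlation_surface` and `best_shift`, the list-of-lists frame representation, and the sign convention (a shift of (dx, dy) means image content moved dx columns and dy rows from the reference frame to the sample frame) are all hypothetical choices made for the example.

```python
def correlation_surface(reference, sample, max_shift=1):
    """Return {(dx, dy): sum of squared differences} for every trial shift.

    Frames are equal-sized lists of rows of digitized pixel values.
    Only pixel locations where the shifted frames overlap contribute;
    the unmatched edge rows/columns are skipped. max_shift=1 gives the
    nine nearest-neighbor trials (eight shifts plus the null shift).
    """
    rows, cols = len(reference), len(reference[0])
    surface = {}
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total = 0
            # Restrict r and c so that (r - dy, c - dx) stays inside the frame.
            for r in range(max(0, dy), min(rows, rows + dy)):
                for c in range(max(0, dx), min(cols, cols + dx)):
                    diff = sample[r][c] - reference[r - dy][c - dx]
                    total += diff * diff
            surface[(dx, dy)] = total
    return surface


def best_shift(surface):
    """The trial shift with the least difference (greatest correlation)."""
    return min(surface, key=surface.get)


if __name__ == "__main__":
    # Hypothetical 6x6 frames: one bright feature that moves one column over.
    ref = [[50] * 6 for _ in range(6)]
    sam = [[50] * 6 for _ in range(6)]
    ref[2][2] = 200
    sam[2][3] = 200
    print(best_shift(correlation_surface(ref, sam)))
```

The sums here are not normalized by the size of the overlap region, which differs slightly between the null shift and the other trials; whether and how to normalize is left open in this sketch.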
The actual algorithms described in the incorporated Patents (and used by the seeing eye mouse) are refined and sophisticated versions of those described above. For example, let us say that the photo detectors were a 16×16 array. We could say that we initially take a reference frame by storing the digitized values of the photo detector outputs as they appear at some time t0. At some later time t1 we take a sample frame and store another set of digitized values. We wish to correlate a new collection of nine comparison frames (thought to be null, one over, one over and one up, etc.) against a version of the reference frame representing "where we were last time". The comparison frames are temporarily shifted versions of the sample frame; note that when shifted a comparison frame will no longer overlap the reference frame exactly. One edge, or two adjacent edges will be unmatched, as it were. Pixel locations along the unmatched edges will not contribute to the corresponding correlation (i.e., for that particular shift), but all the others will. And those others are a substantial number of pixels, which gives rise to a very good signal to noise ratio. For "nearest neighbor" operation (i.e., limited to null, one over, one up/down, and the combinations thereof) the correlation produces nine "correlation values", which may be derived from a summing of squared differences for all pixel locations having spatial correspondence (i.e., a pixel location in one frame that is indeed paired with a pixel location in the other frame; unmatched edges won't have such pairing).
A brief note is perhaps in order about how the shifting is done and the correlation values obtained. The shifting is accomplished by addressing offsets to memories that can output an entire row or column of an array at one time. Dedicated arithmetic circuitry is connected to the memory array that contains the reference frame being shifted and to the memory array that contains the sample frame. The formulation of the correlation value for a particular trial shift (member of the nearest or near neighbor collection) is accomplished very quickly. The best mechanical analogy is to imagine a transparent (reference) film of clear and dark patterns arranged as if it were a checker board, except that the arrangement is perhaps random. Now imagine that a second (sample) film having the same general pattern is overlaid upon the first, except that it is the negative image (dark and clear are interchanged). Now the pair is aligned and held up to the light. As the reference film is moved relative to the sample film the amount of light admitted through the combination will vary according to the degree that the images coincide. The positioning that admits the least light is the best correlation. If the negative image pattern of the reference film is a square or two displaced from the image of the sample film, the positioning that admits the least light will be one that matches that displacement. We take note of which displacement admits the least light; for the seeing eye mouse we notice the positioning with the best correlation and say that the mouse moved that much. That, in effect, is what happens within an integrated circuit (IC) having photo detectors, memory and arithmetic circuits arranged to implement the image correlation and tracking technique we are describing.
It would be desirable if a given reference frame could be re-used with successive sample frames. At the same time, each new collection of nine (or twenty-five) correlation values (for collections at t1, ti−1, etc.) that originates from a new image at the photo detectors (a next sample frame) should contain a satisfactory correlation. For a hand held mouse, several successive collections of comparison frames can usually be obtained from the (16×16) reference frame taken at t0. What allows this to be done is maintaining direction and displacement data for the most recent motion (which is equivalent to knowing velocity and time interval since the previous measurement). This allows "prediction" of how to (permanently!) shift the collection of pixels in the reference frame so that for the next sample frame a "nearest neighbor" can be expected to correlate. This shifting to accommodate prediction throws away, or removes, some of the reference frame, reducing the size of the reference frame and degrading the statistical quality of the correlations. When an edge of the shifted and reduced reference frame begins to approach the center of what was the original reference frame it is time to take a new reference frame. This manner of operation is termed "prediction" and could also be used with comparison frames that are 5×5 and an extended "near neighbor" (null, two over/one up, one over/two up, one over/one up, two over, one over, . . . ) algorithm. The benefits of prediction are a speeding up of the tracking process by streamlining the internal correlation procedure (avoiding the comparison of two arbitrarily related 16×16 arrays of data) and a reduction of the percentage of time devoted to acquiring reference frames.
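The bookkeeping behind prediction might be sketched as follows. This is only an illustrative sketch under stated assumptions and is not taken from the incorporated Patents: the names `shift_reference`, `valid_fraction` and `needs_new_reference` are hypothetical, discarded pixels are marked with `None`, and the "edge approaches the center" condition is approximated by the fraction of reference pixels that still survive (an assumed criterion, with an assumed default threshold of one half).

```python
def shift_reference(reference, dx, dy):
    """Permanently shift reference content by dx columns and dy rows.

    Pixels shifted past the frame boundary are thrown away, and the
    vacated locations are marked None so they cannot take part in later
    correlations. Shifting again carries earlier None marks along.
    """
    rows, cols = len(reference), len(reference[0])
    shifted = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dy, c + dx
            if 0 <= nr < rows and 0 <= nc < cols:
                shifted[nr][nc] = reference[r][c]
    return shifted


def valid_fraction(reference):
    """Fraction of reference pixel locations that still hold data."""
    total = sum(len(row) for row in reference)
    valid = sum(1 for row in reference for pixel in row if pixel is not None)
    return valid / total


def needs_new_reference(reference, min_fraction=0.5):
    """Assumed criterion: retake the reference once too little of it is left."""
    return valid_fraction(reference) <= min_fraction


if __name__ == "__main__":
    ref = [[r * 4 + c for c in range(4)] for r in range(4)]
    ref = shift_reference(ref, 1, 0)   # predicted motion: one column over
    print(valid_fraction(ref), needs_new_reference(ref))
    ref = shift_reference(ref, 1, 0)   # the same prediction applied again
    print(valid_fraction(ref), needs_new_reference(ref))
```

A correlation routine used with such a reduced reference frame would skip any pixel pair in which the reference value is `None`, which is how the shrinking reference degrades the statistical quality of the correlations.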
In addition to the usual buttons that a mouse generally has, our seeing eye mouse may have another button that suspends the production of movement signals to the computer, allowing the mouse to be physically relocated on the work surface without disturbing the position on the screen of the pointer. This may be needed if the operator runs out of room to physically move the mouse further, but the screen pointer still needs to go further. This may happen, say, in a UNIX system employing a display system known as "Single Logical Screen" (SLS) where perhaps as many as four monitors are arranged to each display some subportion of the overall "screen". If these monitors were arranged as one high by four across, then the left to right distance needed for a single corresponding maximal mouse movement would be much wider than usually allowed for. The usual maneuver executed by the operator for, say, an extended rightward excursion, is to simply pick the mouse up at the right side of the work surface (a mouse pad, or perhaps simply the edge of a clearing on an otherwise cluttered surface of his desk), set it down on the left and continue to move it to the right. What is needed is a way to keep the motion indicating signals from undergoing spurious behavior during this maneuver, so that the pointer on the screen behaves in an expected and non-obnoxious manner. The function of the "hold" button may be performed automatically by a proximity sensor on the underside of the mouse that determines that the mouse is not in contact with the work surface, or by noticing that all or a majority of the pixels in the image have "gone dark" (it's actually somewhat more complicated than that; we shall say more about this idea below).
Without a hold feature, there may be some slight skewing of the image during the removal and replacement of the mouse, owing either: (a) to a tilting of the field of view as the mouse is lifted; or (b) to some perverse mistake where frames for two disparate and widely separated spatial features imaged at very different times during the removal and replacement are nevertheless taken as representing a small distance between two frames for the same feature. A convenient place for an actual hold button is along the sides of the mouse near the bottom, where the thumb and the opposing ring finger would grip the mouse to lift it up. A natural increase in the gripping force used to lift the mouse would also engage the hold function. A hold feature may incorporate an optional brief delay upon either the release of the hold button, detection of proper proximity or the return of reasonable digitized values. During that delay any illumination control servo loops or internal automatic gain controls would have time to stabilize and a new reference frame would be taken prior to the resumption of motion detection.
And now for this business of the pixels in the image "going dark". What happens, of course, is that the IR light from the illuminating LED no longer reaches the photo detectors in the same quantity that it did, if at all; the reflecting surface is too far away or is simply not in view. However, if the seeing eye mouse were turned over, or its underside exposed to an intensely lit environment as a result of its being lifted, then the outputs of the photo detectors might be at any level. The key is that they will be uniform, or nearly so. The main reason that they become uniform is that there is no longer a focused image; all the image features are indistinct and they are each spread out over the entire collection of photo detectors. So the photo detectors uniformly come to some average level. This is in distinct contrast with the case when there is a focused image. In the focused case the correlations between frames (recall the one over, one over and one down, etc.) exhibit a distinct phenomenon.
Assume that the spatial features being tracked mapped exactly onto the photo detectors, through the lens system, and that mouse movement were jerky by exactly the amount and in the directions needed for a feature to go from detector to detector. Now for simplicity assume also that there is only one feature, and that its image is the size of a photo detector. So, all the photo detectors but one are all at pretty much the same level, and the one detector that is not at that level is at a substantially different level, owing to the feature. Under these highly idealized conditions it is clear that the correlations will be very well behaved; eight "large" differences and one small difference (a sink hole in an otherwise fairly flat surface) in a system using nine trials for a nearest neighbor algorithm (and remembering that there may have been no motion). [Note: The astute reader will notice that the "large" difference in this rather contrived example actually corresponds to, or originates with, only one pixel, and probably does not deserve to be called "large"; recall the earlier shifted film analogy. The only light passed by the films for this example would be for the one pixel of the feature. A more normal image having a considerably more diverse collection of pixels increases the difference to where it truly is a "large" difference.]
Now, such highly idealized conditions are not the usual case. It is more normal for the image of the tracked spatial features to be both larger and smaller than the size of the photo detectors, and for the mouse motion to be continuous, following a path that allows those images to fall onto more than one detector at once. Some of the detectors will receive only a partial image, which is to say, some detectors will perform an analog addition of both light and dark. The result is at least a "broadening" of the sink hole (in terms of the number of photo detectors associated with it) and very possibly a corresponding decrease in the depth of the sink hole. The situation may be suggested by imagining a heavy ball rolling along a taut but very stretchable membrane. The membrane has a discrete integer Cartesian coordinate system associated with it. How much does the membrane distend at any integer coordinate location as the ball rolls? First imagine that the ball is of a very small diameter but very heavy, and then imagine that the ball is of a large diameter, but still weighs the same. The analogy may not be exact, but it serves to illustrate the idea of the "sink hole" mentioned above. The general case is that the generally flat surface with sharply defined sink hole becomes a broad concavity, or bowl.
We shall term the surface produced or described by the various correlation values the "correlation surface" and will, at various times, be most interested in the shape of that surface.
We say all of this to make two points. First, the shifting shape of the concavity in the correlation surface as the seeing eye mouse moves allows interpolation to a granularity finer than the simple size/spacing of the photo detectors. We point this out, with the remark that our seeing eye mouse can do that, and leave it at that. The full details of interpolation are described in the incorporated Patents. No further discussion of interpolation is believed necessary. Second, and this is our real reason for the discussion of the preceding paragraphs, is the observation that what happens when the seeing eye mouse is picked up is that the concavity in the correlation surface goes away, to be replaced by generally equal values for the correlations (i.e., a "flat" correlation surface). It is when this happens that we may say with considerable assurance that the seeing eye mouse is air borne, and can then automatically invoke the hold feature, until after such time that a suitable concavity ("bowl") reappears.
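One way such a flatness test might be expressed is sketched below. This is a hedged illustration only: the function name `is_airborne`, the relative-depth measure (how far the smallest correlation value lies below the mean of the others) and the default threshold of 0.2 are assumptions made for the example and are not specified in the text or in the incorporated Patents.

```python
def is_airborne(correlation_values, min_relative_depth=0.2):
    """Guess whether the correlation surface is "flat" (no bowl).

    correlation_values holds the nine (or twenty-five) sums of squared
    differences for one sample frame. A focused image gives one value
    well below the rest; an unfocused or dark image gives values that
    are nearly equal.
    """
    values = list(correlation_values)
    lowest = min(values)
    # Mean of all values except one copy of the minimum.
    mean_others = (sum(values) - lowest) / (len(values) - 1)
    if mean_others == 0:
        return True  # every trial matched equally well: perfectly flat
    relative_depth = (mean_others - lowest) / mean_others
    return relative_depth < min_relative_depth


if __name__ == "__main__":
    bowl = [45000] * 8 + [0]                         # distinct sink hole
    flat = [1000, 1010, 990, 1005, 995, 1000, 1002, 998, 1001]
    print(is_airborne(bowl), is_airborne(flat))
```

In use, a tracker would presumably assert the hold feature while this test returns true and release it (perhaps after a brief delay and a fresh reference frame) once a suitable bowl reappears.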
Another method for invoking or initiating a hold feature is to simply notice that the seeing eye mouse is moving faster than a certain threshold velocity (and is thus presumably experiencing an abrupt retrace motion in a maneuver intended to translate the screen pointer further than the available physical space within which the mouse is operating). Once the velocity threshold is exceeded the motion indicating signals that would otherwise be associated with that movement are suppressed until such time as the velocity drops below a suitable level.
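A velocity-triggered hold of this kind might be sketched as follows. The sketch rests on assumptions not found in the text: the function name `suppress_fast_motion` is hypothetical, displacements are per-frame (dx, dy) pairs in pixels, and two separate thresholds (`hold_speed` to enter the hold and a lower `resume_speed` to leave it) are used as one plausible reading of "until such time as the velocity drops below a suitable level".

```python
import math


def suppress_fast_motion(displacements, frame_interval, hold_speed, resume_speed):
    """Replace motion reports with (0, 0) while the mouse moves too fast.

    displacements: iterable of per-frame (dx, dy) raw movement values.
    frame_interval: time between frames, in seconds.
    hold_speed: speed above which the hold is invoked.
    resume_speed: speed below which normal reporting resumes.
    """
    holding = False
    reported = []
    for dx, dy in displacements:
        speed = math.hypot(dx, dy) / frame_interval
        if holding:
            if speed < resume_speed:
                holding = False      # slow enough again: leave the hold
        elif speed > hold_speed:
            holding = True           # abrupt retrace motion: enter the hold
        reported.append((0, 0) if holding else (dx, dy))
    return reported


if __name__ == "__main__":
    raw = [(1, 0), (10, 0), (5, 0), (1, 0)]
    print(suppress_fast_motion(raw, frame_interval=1.0,
                               hold_speed=8.0, resume_speed=3.0))
```

Using a lower resume threshold than hold threshold keeps the hold from chattering on and off when the speed hovers near a single limit; a single-threshold variant is obtained by passing the same value for both.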