Recent developments in “over-the-desk” scanning take advantage of combining the functionality of traditional paper scanning devices with that of a digital camera. Over-the-desk scanning generally refers to capturing images of hardcopy documents positioned on a desktop with a camera positioned above the desktop. These captured images are digitized for further processing and then displayed on a computer monitor. An example of such an over-the-desk scanning system is disclosed by Wellner in U.S. Pat. No. 5,511,148 entitled “Interactive Copying System.”
Over-the-desk scanning has many advantages over traditional scanning methods using devices such as flat-bed scanners, sheet-feed scanners and hand-held scanners that use contact scanning to reproduce high resolution images of documents. In general, contact scanning is limited to the scanning of flat objects, such as documents, and is often considered cumbersome to use because the document must be moved from its place of reading or the scanner must be moved relative to the document for scanning.
One advantage of over-the-desk scanning versus traditional contact scanning is that of convenience because documents are not required to be moved from their usual place of reading. This encourages a more casual type of scanning where the user is able to scan small amounts of information from a document as it is encountered while reading, rather than making a note of its position in a document for scanning at a later time.
A second advantage is that the non-contact nature of over-the-desk scanning allows the capture of three-dimensional (3D) objects in addition to capturing two-dimensional (2D) objects. Thus, human gestures, as well as physical media, may be captured by over-the-desk scanning. For example, a pointing finger may be used to annotate a hardcopy document captured by the camera.
Although the use of video cameras to scan objects provides many advantages over traditional scanning methods, the use of cameras for document scanning is often limited by the resolution of the camera. Low-resolution cameras typically do not yield images with sufficient quality to enable successful document decoding using optical character recognition (OCR). For example, an OCR error rate under 1% may be achieved for 10-point Times Roman text scanned with a video camera by applying carefully created binarisation algorithms to camera images acquired at resolutions as low as 100 dots per inch (dpi). Below this resolution, the error rate and the time to recognize a page increase rapidly. Furthermore, high-resolution cameras are often not cost effective for an over-the-desk scanning system.
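By way of illustration only, the resolution limit discussed above may be estimated with simple arithmetic. The following sketch assumes a hypothetical camera having 640 pixels across its field of view and a hypothetical 8.5-inch-wide page; neither figure is taken from the description above. Only the 100 dpi threshold is taken from the text.

```python
# Threshold quoted in the text for 10-point Times Roman text.
OCR_MIN_DPI = 100.0


def effective_dpi(sensor_pixels, field_of_view_inches):
    """Return the sampling density, in dots per inch, of a camera whose
    sensor_pixels span field_of_view_inches of the desktop."""
    if field_of_view_inches <= 0:
        raise ValueError("field of view must be positive")
    return sensor_pixels / field_of_view_inches


def sufficient_for_ocr(dpi, min_dpi=OCR_MIN_DPI):
    """Return True when dpi reaches the assumed OCR threshold."""
    return dpi >= min_dpi


# Assumed values: 640 pixels across an 8.5 inch page, then across half of it.
full_page = effective_dpi(640, 8.5)
half_page = effective_dpi(640, 4.25)

print(round(full_page, 1), sufficient_for_ocr(full_page))
print(round(half_page, 1), sufficient_for_ocr(half_page))
```

Under these assumed figures, imaging the full page width falls short of the threshold, whereas imaging half the width exceeds it, which suggests why capturing a document in several smaller pieces may be attractive.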
Various approaches have been used to improve low-resolution camera images. One technique, referred to as “super-resolution”, combines information from several low resolution images to create a higher resolution image of a source document. Each low-resolution image is shifted a small amount (i.e., of the order of a pixel). Such small scale shifting requires a precise small-scale translation device, or alternatively, a method to infer random movements using only the images themselves, with sub-pixel precision. In addition to requiring a large number of images, super-resolution is considered computationally expensive and difficult to implement. Furthermore, this technique does not fully overcome the problem of camera blur.
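The principle of combining shifted low-resolution images may be sketched in one dimension as follows. This is a minimal, idealized illustration and not the method of any particular system: it assumes that the sub-pixel shifts are known exactly and that each low-resolution pixel is a point sample of the scene, whereas a real sensor integrates light over the pixel area. The function names are hypothetical.

```python
def downsample(signal, factor, offset):
    """Simulate a low-resolution capture by point-sampling every
    factor-th value of the scene, starting at offset."""
    return signal[offset::factor]


def shift_and_add(low_res_images, offsets, factor):
    """Place each low-resolution sample on a finer grid at its known
    offset and average samples that land on the same fine-grid cell.
    Cells that receive no sample are returned as None."""
    length = max(
        off + factor * (len(img) - 1)
        for img, off in zip(low_res_images, offsets)
    ) + 1
    total = [0.0] * length
    count = [0] * length
    for img, off in zip(low_res_images, offsets):
        for i, value in enumerate(img):
            pos = off + i * factor
            total[pos] += value
            count[pos] += 1
    return [t / c if c else None for t, c in zip(total, count)]


# A hypothetical one-dimensional "scene" at the fine resolution.
scene = [3, 1, 4, 1, 5, 9, 2, 6]

# Two captures at half resolution, shifted by half a low-resolution pixel.
frames = [downsample(scene, 2, k) for k in (0, 1)]
restored = shift_and_add(frames, [0, 1], 2)
print(restored)
```

In this idealized case the two shifted captures interleave to recover the scene exactly. In practice the shifts must be estimated with sub-pixel precision and each pixel is blurred by the optics and the sensor, which is consistent with the difficulties noted above.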
A second approach, often referred to as “mosaicing”, “tiling” or “stitching”, patches together several smaller low-resolution images to create a larger image having a higher resolution. In general, mosaicing techniques are easier to implement than super-resolution techniques and also yield an increase in resolution that is roughly proportional to the square root of the number of images in the mosaic.
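The square-root relationship may be illustrated with a short calculation. The sketch below assumes, for simplicity, that the tiles do not overlap and cover the imaging area evenly; real mosaics generally require some overlap between tiles, so the figures are optimistic. The function names and the example resolutions are hypothetical.

```python
import math


def linear_gain(n_tiles):
    """Approximate factor by which linear resolution (dpi) increases
    when n_tiles non-overlapping images cover the same area."""
    if n_tiles < 1:
        raise ValueError("at least one tile is required")
    return math.sqrt(n_tiles)


def tiles_for_target(base_dpi, target_dpi):
    """Smallest number of tiles whose square-root gain raises
    base_dpi to at least target_dpi."""
    if base_dpi <= 0 or target_dpi <= 0:
        raise ValueError("resolutions must be positive")
    if target_dpi <= base_dpi:
        return 1
    return math.ceil((target_dpi / base_dpi) ** 2)


# Assumed example: a camera that resolves a full page at only 50 dpi.
print(linear_gain(4))              # four tiles give about twice the dpi
print(tiles_for_target(50, 100))   # tiles needed to reach 100 dpi
print(tiles_for_target(50, 150))   # tiles needed to reach 150 dpi
```

Because the number of tiles grows with the square of the desired gain, doubling the resolution requires about four images and tripling it requires about nine.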
When mosaicing, the smaller low-resolution images may be obtained in a number of ways. For example, the camera may be moved relative to the large imaging area. The camera may be moved by the user or automatically moved by a translation device. Unfortunately, if the camera is panned and/or tilted, perspective distortions often need to be corrected.
Alternatively, mosaicing may be performed by moving the object to be imaged (e.g., document) with respect to the camera. This type of mosaicing is only feasible when the object can be easily moved. When used for scanning documents, this method requires non-intuitive and inconvenient interaction with the user, who must move his document so that all parts of it may be seen by the camera.
However, these two types of mosaicing often result in images that are transformed by scaling, rotation or non-linear warping relative to each other. Consequently, the transformations must be detected or calibrated, and the images restored to their undistorted coordinates, before mosaics can be obtained. Not only are these operations computationally intensive, but they may also degrade the quality of the images.
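The restoration step may be sketched for the simplest case of a known scaling, rotation and translation between two tiles. The sketch below is illustrative only: it assumes the transformation parameters have already been detected or calibrated, handles coordinates rather than pixel values, and does not address non-linear warping. All names and the example parameters are hypothetical.

```python
import math


def make_similarity(scale, angle_deg, tx, ty):
    """Build a transform (a, b, tx, c, d, ty) such that
    x' = a*x + b*y + tx and y' = c*x + d*y + ty."""
    angle = math.radians(angle_deg)
    c, s = scale * math.cos(angle), scale * math.sin(angle)
    return (c, -s, tx, s, c, ty)


def apply_transform(t, point):
    """Map a point (x, y) through transform t."""
    a, b, tx, c, d, ty = t
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


def invert(t):
    """Return the transform that undoes t."""
    a, b, tx, c, d, ty = t
    det = a * d - b * c
    if det == 0:
        raise ValueError("transform is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return (ia, ib, -(ia * tx + ib * ty), ic, id_, -(ic * tx + id_ * ty))


# Assumed distortion of one tile: 5% larger, rotated 3 degrees, shifted.
distortion = make_similarity(1.05, 3.0, 12.0, -4.0)
observed = apply_transform(distortion, (100.0, 200.0))
restored = apply_transform(invert(distortion), observed)
print(observed, restored)
```

The restored coordinates generally fall between pixel centres, so pixel values must be interpolated when the image is resampled. Such interpolation is one way in which the restoration may degrade image quality, in addition to its computational cost.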
A third type of mosaicing can be achieved by moving the image sensor of the camera in a plane parallel to the image plane. This generally involves extensive modification or retro-fitting of an existing consumer-level camera or a customized camera in order to mount the image sensor on a two-axis translation device. The inability to use commercially available consumer-level video cameras is likely to increase the cost of an over-the-desk scanning system.
Thus, under certain circumstances, it would be desirable to increase the resolution of the camera images recorded by consumer-level video cameras using a mosaicing technique with only minimal modifications to an existing consumer-level video camera. Such an approach is likely to enhance the quality of over-the-desk scanning images while maintaining the cost feasibility of an over-the-desk scanning system.