The present invention relates to the use of an electronic camera in a platenless document imaging system in which the document image is a composite image formed from a mosaic of overlapping images captured by the camera.
In recent years, document scanners have become commonplace. Although these work well and are relatively inexpensive, a flatbed or platen-based document scanner occupies a significant amount of scarce desk space.
The use of a camera to take a photograph of a document consisting of text and/or images offers one way of dealing with the problem of wasted desk space. However, to image an A4-sized document at a resolution comparable to that of a typical document scanner, about 24 dots/mm (600 dpi), an electronic camera would need a detector of about 40 megapixels. Such high-resolution detectors cost much more than an entire desktop scanner.
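The detector size quoted above follows from simple arithmetic. The sketch below assumes the page is sampled on a uniform square grid with no margin. At 24 dots/mm, an A4 page (210 mm x 297 mm) needs 5040 x 7128 samples, roughly 36 megapixels, which is of the same order as the figure given.

```python
# Back-of-envelope pixel count for imaging a whole A4 page in one exposure
# at scanner-like resolution (assumes a uniform grid and no margin).
A4_WIDTH_MM = 210
A4_HEIGHT_MM = 297
DOTS_PER_MM = 24  # roughly 600 dpi, since 600 / 25.4 is about 23.6 dots/mm


def detector_pixels(width_mm: float, height_mm: float, dots_per_mm: float) -> int:
    """Pixels needed to sample a width_mm x height_mm page at dots_per_mm."""
    return round(width_mm * dots_per_mm) * round(height_mm * dots_per_mm)


pixels = detector_pixels(A4_WIDTH_MM, A4_HEIGHT_MM, DOTS_PER_MM)
print(f"{pixels:,} pixels = {pixels / 1e6:.1f} megapixels")
```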
As a result, it has been proposed to use an electronic camera with an actuator to scan the field of view of the camera over a document, and so form a composite image of the document from a number of overlapping image tiles. This allows less expensive, lower-resolution detector arrays to be used to build up an image of a document with a resolution comparable to that of a conventional document scanner. See, for example, U.S. Pat. No. 5,515,181.
A problem with this approach is that the image tiles must overlap, because it is impractical to use an actuator that moves the camera so precisely that the tiles fit together with no overlap. The conventional approach to joining overlapping tiles involves identifying features of one tile in an overlap region and matching them against corresponding features in an adjacent tile's overlap region.
This feature matching approach suffers from various difficulties. First, computational algorithms to identify and match features are relatively slow compared with the process of gathering the images, which limits the throughput of a scanning camera document imaging system. Second, many documents have significant areas of blank space, for which it is not possible to match features. This necessitates the use of larger overlap areas to increase the likelihood that there will be suitable matching features in the overlap areas, with the result that more images must be captured. Third, it is possible that features will be incorrectly matched, particularly for text based documents in which common letters repeat frequently.
Another problem is that an image from an inexpensive camera will have some image distortion, particularly towards the edges of the field of view. The distortion is therefore strongest in the overlap region between tiles, which makes it more difficult to achieve a good overlap simply by matching features. As a result, it may be necessary to match several features over the extent of the overlap area to get a good fit between adjacent tiles.
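The way distortion concentrates at the edges can be illustrated with the common one-coefficient radial distortion model. In this model, a point at normalised radius r from the optical axis is imaged at r(1 + k1 r^2). This model is an assumption, not something the source specifies, and the coefficient K1 below is a hypothetical value chosen only for illustration. The displacement k1 r^3 grows with the cube of the radius, so it is largest exactly where neighbouring tiles overlap.

```python
# One-coefficient radial distortion model (an assumed model, for illustration only).
def radial_displacement(r: float, k1: float) -> float:
    """Displacement of a point at normalised radius r from the optical axis.

    The point is imaged at r * (1 + k1 * r**2), so the displacement is
    k1 * r**3 and grows rapidly towards the edge of the field of view.
    """
    return k1 * r ** 3


K1 = -0.05  # hypothetical barrel-distortion coefficient for an inexpensive lens

for r in (0.0, 0.25, 0.5, 0.75, 1.0):  # 0 = centre of field, 1 = edge
    print(f"r={r:.2f}  displacement={radial_displacement(r, K1):+.5f}")
```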
As a result of problems such as these, scanning camera-based document imaging systems cannot yet compete with flatbed or platen-based document scanning systems.
The present invention addresses these and other problems.
In one embodiment disclosed herein, a method of calibrating an image capture system and subsequently capturing an image of a document comprises providing: an electronic camera for imaging a portion of a document; an actuator for moving the camera over a document support surface and for cooperating with the camera to capture a plurality of overlapping image tiles of the document at different locations over the support surface with a predetermined degree of overlap; and electronic processing means for joining the plurality of image tiles into a composite image of the document. The processing means does this by generating, from the tile data points associated with each tile, a corrected array of tile data points that corrects for the expected distortion and the predetermined overlap of neighbouring image tiles, in accordance with transform data for each image tile that relates to the predetermined overlap and to the expected distortion of that tile.
Calibration then comprises: providing a two-dimensional registration array within the field of view of the camera across an area corresponding with the document to be imaged, the registration array having a plurality of individually identifiable location identification features with a predetermined orientation and spacing amongst the features; using the camera to capture a plurality of overlapping image tiles of the registration array at predetermined locations and with predetermined overlap, the locations and overlap corresponding to those to be used with the document to be imaged, each image tile having an array of tile data points that cover a plurality of location features; identifying in each image tile a plurality of individual location identification features, associating particular tile data points with those features, and determining from the tile data points and the predetermined orientation and spacing of the features whether there is any image distortion in that image tile; and generating the transform data for each image tile from the identity of the location identification features and the determined distortion.
Subsequently capturing the image comprises: placing a document on the document support surface; positioning the camera above the document support surface; capturing a plurality of image tiles of the document at a plurality of different predetermined locations over the support surface with a predetermined degree of overlap between neighbouring image tiles, each image tile having an array of tile data points and each predetermined location having a predetermined camera field of view; and causing the electronic processing means to generate, from the transform data and the tile data points associated with each tile, a corrected array of tile data points that corrects for the expected distortion and predetermined overlap of neighbouring image tiles, and to join the plurality of image tiles into a composite image of the document.
Using the camera to capture the plurality of overlapping image tiles may involve using the actuator to move the camera between image tiles in the same order as for the document to be imaged. The camera may further include a focus mechanism, the transform data may include separate data for different focus settings, and the method may further comprise focusing the camera on the document and selecting the transform data according to the focus setting.
Accordingly, the invention provides an image capture system, comprising: an electronic camera with an electronic detector and a lens with a field of view for imaging on the detector a portion of a document; an actuator for moving the camera field of view over a document support surface, the camera and the actuator co-operating so that a plurality of overlapping image tiles of a document can be captured at different locations over the support surface, each image tile having an array of tile data points and being subject to some expected perspective and/or camera distortion relative to the support surface; and electronic processing means by which the plurality of image tiles may be joined into a composite image of the document; characterised in that:
i) for each image tile the camera field of view relative to the support surface and the degree of overlap between neighbouring image tiles are predetermined;
ii) the electronic processing means includes a memory which stores transform data for each image tile, the transform data relating both to the expected distortion and to the predetermined overlap between image tiles; and
iii) the processing means is adapted to use the transform data to generate from the tile data points a corrected array of tile data points with said distortion corrected and with the corrected image tiles correctly overlapped with respect to neighbouring corrected image tiles to form a composite image of the document.
The image capture system may include a support by which the camera can be positioned to view a document support surface on which the document may be placed in view of the camera.
Because the camera field of view relative to the document support surface and the overlap are predetermined, the relative orientation and distortion of each image tile with respect to its neighbours will be the same every time, to within some residual positioning error of the camera actuator. Therefore, if the system is used to image a document more than once without moving the document with respect to the document support surface, each image tile will be substantially the same as the corresponding image tile from one time to the next.
Therefore, as long as the positioning and movement of the camera are repeatable, the transform data needs to be generated and stored only once. The transform data then relates each image data point of each image tile to a corrected image data point. A corrected image data point of one tile in an overlap area will then correspond closely to the corresponding corrected image data point in the same overlap area of an adjacent or neighbouring tile.
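Stored transform data of this kind can be sketched as one precomputed mapping per tile position. For illustration only, the sketch assumes that each tile's transform is a 3x3 planar homography from tile pixel coordinates to composite-image coordinates. The names TRANSFORMS and correct_tile_points, and the 100-pixel tile width implied by the example, are hypothetical. A homography models perspective and tile placement but not lens distortion, which would need an additional non-linear correction.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Matrix3 = Tuple[Tuple[float, float, float], ...]


def apply_homography(h: Matrix3, x: float, y: float) -> Point:
    """Map tile pixel (x, y) to composite-image coordinates with a 3x3 homography."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w


# Hypothetical transform data: one matrix per predetermined tile position,
# generated once and held in memory. Here both are pure translations;
# tile 1 sits 80 px to the right of tile 0, so 100 px wide tiles overlap by 20 px.
TRANSFORMS: Dict[int, Matrix3] = {
    0: ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)),
    1: ((1.0, 0.0, 80.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)),
}


def correct_tile_points(tile_index: int, points: List[Point]) -> List[Point]:
    """Corrected (composite) position of each tile data point of one tile."""
    h = TRANSFORMS[tile_index]
    return [apply_homography(h, x, y) for x, y in points]


print(correct_tile_points(0, [(90.0, 10.0)]), correct_tile_points(1, [(10.0, 10.0)]))
```

A point 90 px from the left edge of tile 0 and a point 10 px from the left edge of tile 1 both map to composite position (90, 10). The overlap correspondence therefore comes directly from the stored data, with no feature matching. Because the matrices are fixed, the mapping for every pixel could equally be precomputed once into a lookup table.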
The invention also provides a method of capturing an image of a document using an image capture system according to the invention, the method comprising the steps of:
a) positioning the camera above the support surface and placing a document on the support surface within view of the camera;
b) capturing a plurality of overlapping image tiles of the document at different locations over the support surface, each image tile having an array of tile data points; characterised in that in step b) the different locations are predetermined so that for each image tile the camera field of view relative to the support surface and the degree of overlap between neighbouring image tiles are predetermined, and in that the method comprises the steps of:
c) using the electronic processing means to generate from the transform data and the tile data points a corrected array of tile data points in order to correct said distortion and correctly overlap each corrected image tile with respect to neighbouring corrected image tiles; and
d) joining neighbouring corrected image tiles to form a composite image of the document.
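Step d) can be sketched as accumulating each corrected tile into a composite canvas. The source does not specify how pixels in an overlap area are combined. This sketch assumes that each corrected tile has already been resampled onto the composite grid at an integer offset, and simply averages overlapping pixels. The function name join_tiles and the tiny grey-level tiles are hypothetical.

```python
from typing import Dict, List, Tuple


def join_tiles(tiles: Dict[Tuple[int, int], List[List[float]]],
               width: int, height: int) -> List[List[float]]:
    """Join corrected tiles into one composite, averaging pixels in overlap regions.

    tiles maps the (row, col) offset of each corrected tile's top-left corner
    in the composite to that tile's grey-level pixels.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for (row0, col0), pixels in tiles.items():
        for r, line in enumerate(pixels):
            for c, value in enumerate(line):
                total[row0 + r][col0 + c] += value
                count[row0 + r][col0 + c] += 1
    # Pixels covered by no tile stay 0; covered pixels get the mean of all tiles.
    return [[t / n if n else 0.0 for t, n in zip(trow, crow)]
            for trow, crow in zip(total, count)]


# Two hypothetical 2 x 3 tiles overlapping by one column.
left = [[10.0, 10.0, 20.0], [10.0, 10.0, 20.0]]
right = [[30.0, 40.0, 40.0], [30.0, 40.0, 40.0]]
composite = join_tiles({(0, 0): left, (0, 2): right}, width=5, height=2)
print(composite[0])
```

Averaging is the simplest choice. A weighted (feathered) blend that favours pixels nearer each tile's centre would further suppress any residual distortion at tile edges.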
The system may include a mount by which the camera may be mounted over the document support surface, which may be a desk or other such work surface. The mount may position the camera either directly above, or above and to one side of the work surface. If the camera is mounted to one side of the work surface, then the actuator is most conveniently a two-axis tilt and pan actuator.
Most commonly, the document support surface will be a work surface, such as a desktop.
The accuracy of many types of actuator is limited by mechanical play or backlash in the actuator driving mechanism. Such imperfections can be minimised if the actuator always follows the same pattern of movement as the camera field of view is moved from a start position over the document support surface and then back to the start position. In this way, the relative orientation between image tiles and the degree of overlap between neighbouring tiles can be made most accurate. The absolute orientation of the set of image tiles with respect to the document support surface is then a secondary consideration, as long as the perspective distortion does not change significantly from one pass of the camera over a document to the next.
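A fixed pattern of movement can be sketched as a list of predetermined tile positions that is generated once and reused for both calibration and imaging. This sketch assumes a rectangular grid visited in serpentine order. The source does not prescribe the grid or the order. The function name scan_positions and the tile and overlap sizes in millimetres are hypothetical.

```python
import math
from typing import List, Tuple


def scan_positions(page_w: float, page_h: float, tile_w: float, tile_h: float,
                   overlap: float) -> List[Tuple[float, float]]:
    """Tile-centre positions (mm on the support surface) in one fixed order.

    Rows are visited top to bottom, alternating left-to-right and
    right-to-left (serpentine), so the same sequence of actuator moves is
    repeated on every pass. After the last tile the actuator returns to the
    first position before the next pass.
    """
    step_x = tile_w - overlap
    step_y = tile_h - overlap
    # Smallest number of tiles whose span tile + (n - 1) * step covers the page.
    cols = max(1, math.ceil((page_w - overlap) / step_x))
    rows = max(1, math.ceil((page_h - overlap) / step_y))
    path = []
    for r in range(rows):
        order = range(cols) if r % 2 == 0 else reversed(range(cols))
        for c in order:
            path.append((tile_w / 2 + c * step_x, tile_h / 2 + r * step_y))
    return path


# Hypothetical A4 page with 80 mm x 60 mm tiles overlapping by 10 mm.
positions = scan_positions(210, 297, 80, 60, 10)
print(len(positions), positions[:4])
```

Because the list depends only on fixed geometry, the actuator traverses exactly the same sequence during calibration and during every later scan. As a result, any repeatable backlash is captured in the calibration rather than varying from pass to pass.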
In a preferred embodiment of the invention, prior to storing the transform data in the memory, the method comprises the steps of:
e) providing a two-dimensional registration array within the field of view of the camera across an area corresponding with the document to be imaged, the registration array having a plurality of individually identifiable location identification features with a predetermined orientation and spacing amongst the features;
f) using the camera to capture a plurality of overlapping image tiles of the registration array at predetermined locations and predetermined overlap, said locations and overlap corresponding to those to be used with the document to be imaged, each image tile having an array of tile data points that cover a plurality of location features;
g) identifying for each image tile a plurality of individual location identification features, associating with said features particular tile data points and from the predetermined orientation and spacing of the specific features determining from the tile data points if there is any image distortion in that image tile;
h) generating from the identity of the location identification features and the determined distortion the transform data for each image tile.
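Steps g) and h) can be sketched for a single tile. The sketch assumes that the location identification features have already been detected and decoded to integer IDs, and that the transform data for a tile is modelled as a planar homography from tile pixels to registration-card coordinates. That model captures perspective and the tile's placement (and hence its overlap), but not non-linear lens distortion. It is one standard technique, fitted here by least squares on Hartley-normalised coordinates, and not necessarily the method the source has in mind. The names calibrate_tile, fit_homography and the feature layout are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Mat = List[List[float]]


def _matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def _normaliser(pts: Sequence[Point]) -> Tuple[Mat, Mat]:
    """Similarity T (and its inverse) moving pts to zero mean, mean radius sqrt(2)."""
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    s = math.sqrt(2) / (sum(math.hypot(x - cx, y - cy) for x, y in pts) / n)
    t = [[s, 0.0, -s * cx], [0.0, s, -s * cy], [0.0, 0.0, 1.0]]
    t_inv = [[1 / s, 0.0, cx], [0.0, 1 / s, cy], [0.0, 0.0, 1.0]]
    return t, t_inv


def _solve(a: Mat, b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("degenerate feature layout (e.g. three collinear points)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_homography(tile_pts: Sequence[Point], card_pts: Sequence[Point]) -> Mat:
    """Least-squares homography mapping tile pixels to registration-card coordinates."""
    if len(tile_pts) < 4 or len(tile_pts) != len(card_pts):
        raise ValueError("need at least four matched location features")
    t_tile, _ = _normaliser(tile_pts)
    t_card, t_card_inv = _normaliser(card_pts)
    rows, rhs = [], []
    for (px, py), (qx, qy) in zip(tile_pts, card_pts):
        x, y = t_tile[0][0] * px + t_tile[0][2], t_tile[1][1] * py + t_tile[1][2]
        u, v = t_card[0][0] * qx + t_card[0][2], t_card[1][1] * qy + t_card[1][2]
        # Each feature gives two linear equations in the 8 unknowns (h33 fixed at 1).
        rows += [[x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u],
                 [0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v]]
        rhs += [u, v]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(8)]
    h = _solve(ata, atb)
    hn = [h[0:3], h[3:6], [h[6], h[7], 1.0]]
    # Undo the normalisation: H = T_card^-1 . Hn . T_tile, scaled so H[2][2] = 1.
    hm = _matmul(t_card_inv, _matmul(hn, t_tile))
    return [[val / hm[2][2] for val in row] for row in hm]


def calibrate_tile(detected: Dict[int, Point], card_layout: Dict[int, Point]) -> Mat:
    """Transform data for one tile from the location features identified in it.

    detected: feature ID -> pixel position found in this tile.
    card_layout: feature ID -> known position on the registration card (mm).
    """
    ids = sorted(detected)
    return fit_homography([detected[i] for i in ids], [card_layout[i] for i in ids])
```

Running calibrate_tile once for every predetermined tile position, in the same scan order later used for documents, yields the per-tile table of transform data that is stored in memory.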
The transform data can therefore be derived empirically in an initial calibration of the image capture system. Because the calibration data is generated directly by the same equipment that will be used to image the document, the calibration naturally reflects the actual performance of the image capture system. It is therefore unnecessary to generate the transform data from a mathematical model of the camera and camera scanning system. In use, correcting an image tile with the transform data to achieve the correct overlap between neighbouring tiles requires relatively little computation compared with aligning image tiles solely by matching identifiable features in the imaged document.
It is particularly advantageous if, in the generation of the transform data, when the camera is used to capture a plurality of overlapping image tiles of the registration array, the actuator moves the camera between tiles in the same order as for the document to be scanned. Therefore, any repeatable imperfections in the movement of the camera between image tiles will automatically be accounted for in the transform data.
Preferably, there are at least four location identification features that are identified for each image tile.
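If the transform for a tile is modelled as a planar homography, four is the natural minimum. This model is an assumption, as the source does not fix it. The homography has eight unknowns once its scale is fixed by setting h33 = 1. Each identified feature, imaged at (x, y) in the tile and known to lie at (u, v) on the registration array, contributes two equations:

```latex
u = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + 1}, \qquad
v = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + 1}
\qquad\Longrightarrow\qquad 2n \ge 8 \;\Longrightarrow\; n \ge 4
```

Four features suffice provided no three of them are collinear. Additional features over-determine the system, which can then be solved in a least-squares sense that tolerates small errors in locating individual features.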
Preferably, each image tile captured of the registration array has at least one unique location identification feature. Because the spacing and orientation of the location identification features are known, the separation and orientation between any two image tiles of the registration array can then be determined solely from the image tile data points of the registration array.
One way of providing a registration array is to print the location identification features on a card. The card may be used only in a manufacturing environment. However, if the mount has a location feature for correctly orienting the card and the document with respect to the camera field of view, the card may be used either by a user of the image capture system or by a service engineer, should the system need to be recalibrated. The location feature may be a right-angle bracket for aligning a right-angled corner of a document to be scanned.
The electronic camera will have some depth of focus, defined by the lens and by the aperture setting, if any. The portion of the document being imaged must lie within this depth of focus to achieve optimum resolution. If the document is thin, it will effectively lie in the plane of the document support surface. However, if the document is thick, the calibrated transform data may no longer be valid, for example because the perspective distortion is different. One way to overcome this is to arrange the actuator to rotate the camera about the optical centre of the lens as the camera field of view is moved over the support surface. Then only one set of transform data is needed, because it remains valid when the focus is displaced away from the plane of the document support surface.
However, such an arrangement introduces additional cost. Therefore, the actuator may instead not be arranged to rotate the camera about the optical centre of the lens as the camera field of view is moved over the support surface. In this case, the camera includes a focus mechanism, and the transform data includes separate data for different focus settings. The method of imaging a document then comprises the additional steps of focussing the camera on the document and selecting the transform data according to the focus setting. The focus setting may be determined from the lens position, from an optical focus sensor, or by other means, for example an ultrasonic focus detector.
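Selecting the transform data according to the focus setting amounts to a lookup in a table of calibration sets, one per calibrated focus setting. This sketch assumes that calibration was repeated at a few discrete focus settings and that the set recorded at the nearest setting is used. The function name select_transform_data and the example settings are hypothetical. Interpolating between neighbouring sets would be an alternative design.

```python
import bisect
from typing import Dict, TypeVar

T = TypeVar("T")


def select_transform_data(focus_setting: float, by_focus: Dict[float, T]) -> T:
    """Return the transform data calibrated at the nearest focus setting.

    by_focus maps each calibrated focus setting (e.g. a lens position) to the
    complete per-tile transform data recorded at that setting.
    """
    if not by_focus:
        raise ValueError("no calibrated focus settings")
    settings = sorted(by_focus)
    i = bisect.bisect_left(settings, focus_setting)
    # Only the calibrated settings either side of focus_setting can be nearest.
    candidates = settings[max(0, i - 1):i + 1]
    nearest = min(candidates, key=lambda s: abs(s - focus_setting))
    return by_focus[nearest]


# Hypothetical calibration sets, keyed by lens position in mm.
calibrations = {0.0: "flat-document set", 5.0: "5 mm set", 10.0: "10 mm set"}
print(select_transform_data(6.0, calibrations))
```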