This invention relates generally to the processing of images, such as images resulting from reading documents with an optical scanner, and more particularly to a method for separating a fixed part of the image, i.e. the template, from the filled-in information.
The deployment of computers in all aspects of everyday life, along with the dramatic increase of their direct and associated storage capacity and their ability to be interconnected into networks on which information circulates easily, is considerably influencing the way administrative tasks are carried out. Instead of handling tons of paper forms and documents, information is digitized and stored as files in nonvolatile computer memories and, often, in huge dedicated storage units. Thereafter, the information is quickly retrieved when needed and is available at any place that has access to the network and the proper equipment to display or print it, regardless of where the information is actually stored. A typical example is a nationwide insurance company electronically storing all customer contracts so that every branch and agent can access any of them as needed.
Gathering information from individuals wherever they are at a particular moment, for example, the customer of an insurance company or the patient in a hospital, is generally considered to be a cumbersome and error prone type of work. If entered directly from a computer keyboard, the input of requested information is time consuming, generally requires a third party, such as an agent in a branch office or generally speaking an attendant, acting as a typist while the individual provides sometimes private and essential pieces of information without immediate feedback. Moreover, this way of interfacing computers does not permit convenient entry of anything but textlike information and does not permit entry of a signature that could somehow authenticate the input record.
Thus, a preferred mode of interfacing, made possible since the storage and processing capabilities of computers have dramatically improved, consists in having a form filled in directly by the individual providing the information who is assumed to be able to self check its content. The form or the document is then fed into a reading device, such as an optical scanner, which transforms the acquired information into machine readable code permanently stored for later and possibly repeated processing so as to extract and only retain the informative part of the form excluding the template. The template comprises all the fixed forms and text whose only purpose was to guide and instruct the individual filling in the form with the required information. The filled in information or variable part, which generally differs from one document to the other, has only to be associated with a particular form, stored only once, so as to be able to reconstruct the complete filled in form if that ever becomes necessary.
The above described mode of interfacing computers is also frequently used whenever archives are digitized so as to be stored in electronic storage means rather than being kept in conventional storage units, thus saving a considerable amount of space and granting to the archives all the advantages of an electronic document that becomes available simultaneously at many places and is easily retrievable. Whenever archives are filled in forms such as the documents resulting from a census, all of what is discussed here applies.
Because the scanning of forms and documents must be performed at a resolution high enough that no entered information is lost or altered to the point of becoming unrecognizable, reading devices usually produce a high volume of scan data. Although the size of computer memories increases each year, the amount of stored information has to be limited to a reasonable size to permit the permanent storage of documents whose total size may have to be expressed in millions of pages. This limitation follows from economic reasons and from practical limits on the maximum size of storage devices. Another important reason for drastically limiting the amount of stored information is that, as mentioned earlier, the documents are generally made available over a network and must be transferred, upon request, to the end user, sometimes through communication links or virtual connections whose bandwidth would be insufficient to transfer excessively large documents within an acceptable response time. As a typical example, a page of A4 size (297×210 mm), scanned at 100 pixels/cm, requires about 700 Kbytes of storage space. Transferring it over a standard 64 kbits/sec communication channel would take about 90 seconds, a time that is about two orders of magnitude higher than what is tolerable. Thus, data compression algorithms and methods known in the art are generally applied, which typically reduce the amount of raw data coming out of the scanner by at least one order of magnitude. The A4 document is thereby reduced to below 70 Kbytes. However, even before applying compression techniques, a very significant step towards the reduction of stored data is accomplished by removing the fixed part of the form and retaining only the unique entered information.
The entered variable information accounts, typically, for only 10% of the scanned data, thus providing another order of magnitude reduction.
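The figures quoted above can be checked with a short calculation. The sketch below assumes a bilevel scan at 1 bit per pixel and 1 Kbyte = 1000 bytes, neither of which is stated in the text; the function names are illustrative only. Under these assumptions the raw page comes out at somewhat under 800 Kbytes and somewhat under 100 seconds of transfer time, the same order as the approximate figures given above.

```python
# Assumptions (not stated in the text): a bilevel scan at 1 bit per pixel,
# and 1 Kbyte = 1000 bytes.
A4_CM = (29.7, 21.0)        # page height and width in cm
PX_PER_CM = 100             # scanning resolution quoted in the text
LINK_BITS_PER_S = 64_000    # 64 kbits/sec channel quoted in the text


def raw_scan_bits(size_cm=A4_CM, px_per_cm=PX_PER_CM, bits_per_pixel=1):
    """Uncompressed size of one scanned page, in bits."""
    height_px = round(size_cm[0] * px_per_cm)
    width_px = round(size_cm[1] * px_per_cm)
    return height_px * width_px * bits_per_pixel


def transfer_seconds(bits, link_bits_per_s=LINK_BITS_PER_S):
    """Time to push `bits` through the link, ignoring protocol overhead."""
    return bits / link_bits_per_s


raw = raw_scan_bits()
compressed = raw / 10            # one order of magnitude from compression
variable_only = compressed / 10  # variable part taken as 10% of the data

for label, bits in [("raw scan", raw),
                    ("after compression", compressed),
                    ("variable part only", variable_only)]:
    print(f"{label:>20}: {bits / 8 / 1000:8.1f} Kbytes, "
          f"{transfer_seconds(bits):6.1f} s at 64 kbits/sec")
```

The two reductions are multiplicative, so the order in which they are applied does not change the estimate.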
On top of being a significant contributor to the reduction of the amount of data to be stored, and to be transferred to the user, the removal of the template has a second very important objective. It is key to permitting subsequent flawless running of optical character recognition (OCR) software, aimed at interpreting the variable part of the form, so that the variable entered information contents can effectively be processed according to the purpose for which the forms have been designed.
Preventing the fixed part of a form from being stored can be achieved during the scanning process itself. One method for the elimination of the fixed template has been reported by D. E. Nielsen et al. in "Evaluation of Scanner Spectral Response for Insurance Industry Documents", 16/A44 NCI Program, Working Paper No. 2, May 1973. This method, also known as the "dropout ink" technique, is based on printing the form with a special colored ink that is transparent to conventional scanners. When a completed form of this type is scanned, the fixed pattern is invisible to the scanner and only the variable part is captured. Besides being more expensive, an obvious disadvantage of relying on special ink is that the approach cannot be applied to existing archives.
Thus, another approach for the separation of the form template background from the filled-in information has been disclosed in U.S. Pat. No. 5,182,656 entitled "Method for Compressing and Decompressing Forms by Means of very large Symbol Matching". According to this approach, empty forms, the fixed parts, are prescanned and the data obtained is digitized and stored in a computer memory to create a library of forms. The original filled-in form is then scanned, the data obtained is digitized and the retrieved representation of the empty form is eventually subtracted, the difference being the digital representation of the variable part, i.e. the filled-in information. In order to perform such form elimination, it is necessary to precisely align the input form image with an image of the empty template. Even when the input form image is globally aligned with the template and there are no offset, skew or scale differences, there usually are local distortions that must be straightened out before the template can be dropped out. Such local distortions are frequently the result of inconsistent scanner performance or of distortions from photocopying.
A method to compute fine registration in order to align the fixed part with respect to the variable part of an image is described in European patent application EP-A-0 411 231 (U.S. Pat. No. 5,182,656), which has already been mentioned above, and in EP-A-0 411 232 (U.S. Pat. No. 5,204,756) entitled "Method for High-Quality Compression of Binary Text Images". Assuming that the local distortions are small and piecewise linear, both the input and the template image are broken into small blocks and histogram correlation is used to find the relative offsets of corresponding blocks.
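The block-matching idea can be sketched as follows, under stated assumptions: projection histograms (black-pixel counts per row and per column) stand in for the histograms mentioned above, and a plain unnormalized correlation over a small search window selects the offset. The function names and the `max_shift` parameter are hypothetical, and the cited patents may differ in all of these details.

```python
def projection(block):
    """Number of black pixels (value 1) in each row of a binary block."""
    return [sum(row) for row in block]


def best_shift(ref_hist, in_hist, max_shift=3):
    """Shift s that maximizes sum(ref[i] * inp[i + s]) within +/- max_shift.

    Candidates are tried in order of increasing |s|, so ties favor the
    smallest displacement.
    """
    best, best_score = 0, None
    for s in sorted(range(-max_shift, max_shift + 1), key=abs):
        score = sum(ref_hist[i] * in_hist[i + s]
                    for i in range(len(ref_hist))
                    if 0 <= i + s < len(in_hist))
        if best_score is None or score > best_score:
            best, best_score = s, score
    return best


def block_offset(ref_block, in_block, max_shift=3):
    """(rows, columns) by which in_block appears displaced from ref_block."""
    dy = best_shift(projection(ref_block), projection(in_block), max_shift)
    # Transposing a block turns its column counts into row counts.
    ref_t = [list(col) for col in zip(*ref_block)]
    in_t = [list(col) for col in zip(*in_block)]
    dx = best_shift(projection(ref_t), projection(in_t), max_shift)
    return dy, dx


def cross(n, r, c):
    """Toy n x n block with one horizontal line at row r, one vertical at column c."""
    return [[1 if (y == r or x == c) else 0 for x in range(n)] for y in range(n)]


# The template block has its lines at (1, 1); the input block at (3, 2).
offset = block_offset(cross(6, 1, 1), cross(6, 3, 2))
print(offset)
```

Applying the function to each pair of corresponding blocks yields one offset per block, which is how a piecewise linear distortion could be approximated.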
Although the above described technique usually works quite well, improved reliability is often required. This is the case, for example, when distortions accumulated through several iterations of copying a form go beyond the recovery capability of the technique. Thus, an improved method for template elimination has been described in U.S. Pat. No. 5,793,887, entitled "Method and Apparatus for Alignment of Images for Template Elimination". The chief object of that invention is to improve the handling of images with nonlinear distortions, so as to make known template elimination techniques more effective in overcoming the problem of local distortions and to achieve a fine alignment of the input image over the empty prestored reference template image. This is achieved through the use of a more robust optimal correspondence subsequence (OCS) algorithm, which comprises the steps of: correlating lines of a reference template image to lines in a variable template image by finding corresponding pairs of projections in one direction of the lines in the reference template image and the variable template image; determining the displacement of the two projections of lines of each pair in a direction perpendicular to said direction of projection, and evaluating the number of rows or columns by which the picture elements of each line of the variable template image have to be shifted to achieve a match between the pairs of projections; and generating a new input image by shifting the picture elements of lines of the variable template image perpendicular to the direction of projection as determined in the preceding step.
Thus, the technique for removing templates from the image of filled-in forms works well after the major improvements briefly described above. These additions to the initial method take care of linear and nonlinear local distortions, permitting complete removal of the template and thus avoiding the use of forms that must be printed with special ink invisible to scanners.
However, a very disturbing problem is still unresolved after the template has been removed, irrespective of the precision and quality of the template removal process. Human beings tend to be careless when filling in forms. Very often the handwritten information covers part of the template. Similarly, when forms were filled out using typewriters, which is often the case with archives, typing was frequently shifted with respect to the template because the form was imprecisely positioned in the typewriter. In this case, complete lines of typed information are struck through by lines of the template. Removing the template lines then creates gaps in the characters of the filled-in information, because the black dots, or picture elements, that are common to the filled-in information and to the template are all deleted. Thus, even if the first objective of significantly reducing the amount of data to be stored and transmitted by removing the template is met, the second goal of permitting flawless running of OCR software is not. Indeed, if the template has been removed from overlying characters, gaps have been created in the characters that may make them unreadable or, even worse, cause them to be misinterpreted. An example of this situation is a 2 whose horizontal bottom stroke lies on a template line and which, once that line is removed, is wrongly read as a 7. This may have serious consequences if it is the most significant digit of a monetary transaction.
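The gap problem can be reproduced on a toy bilevel image. The 5x5 bitmaps below are illustrative assumptions, not data from any real form: a ruled template line coincides with the bottom stroke of a handwritten 2, and a pixel-wise subtraction of the template removes both.

```python
# Toy 5x5 bilevel images: '#' is a black pixel, '.' is white.
# Both images are illustrative assumptions, not data from the text.
TEMPLATE = [
    ".....",
    ".....",
    ".....",
    ".....",
    "#####",  # a ruled line of the form
]
FILLED = [
    "####.",
    "...#.",
    "####.",
    "#....",
    "#####",  # bottom stroke of a handwritten 2 lying on the ruled line
]


def subtract_template(filled, template):
    """Delete every black pixel of `filled` that is also black in `template`."""
    return [
        "".join("#" if f == "#" and t != "#" else "." for f, t in zip(frow, trow))
        for frow, trow in zip(filled, template)
    ]


result = subtract_template(FILLED, TEMPLATE)
print("\n".join(result))
```

In a bilevel image a black pixel carries no indication of whether it came from the template, the ink, or both, so the subtraction cannot preserve the shared stroke.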
Although scanners able to analyze images in gray levels have been available for years, their use is only now spreading, as their cost has fallen enough to become competitive with simple black and white scanners. Therefore, it is an object of this invention to take advantage of the capability of those machines to have each image pixel represented by a gradation of gray levels from white to black, usually in 256 steps, so as to overcome the problem of the gaps created when the template is removed.
It is a further object of the present invention to retain the advantage of the previous methods where the storing of the variable part is eventually a binary image, made of black and white dots, requiring a minimum amount of memory and thus being transferable to an end user within an acceptable response time in spite of limitations of bandwidth on certain wide area networks.
In a system for processing images of filled in forms, a method is disclosed for dropping the fixed part or template of a form without altering the variable part of filled-in information. The method comprises the steps of:
scanning the filled in form for generating an image of the form consisting of picture elements, each carrying a level of gray depending on the brightness of the pixel;
storing the image of the filled in form in gray levels;
retrieving an image of the fixed part previously stored;
registering the image of the filled in form over the image of the fixed part in order to distinguish between variable part pixels and fixed part pixels from their position;
collecting the pixels positioned within the fixed part, excluding those pixels near the variable part and morphologically thinning the fixed part by excluding those of the pixels positioned at the periphery in order to establish a first statistic on the levels of gray of the fixed part;
collecting the remaining pixels, all belonging to the variable part and establishing a second statistic on the levels of gray of the variable part pixels;
comparing the first and second statistics;
moving those fixed part pixels which are statistically significantly different, to the variable part of the image of the document.
In a preferred embodiment of the invention the comparison of the first and second statistics established over the fixed part and variable part populations of pixels permits establishing a first gray level threshold statistically separating the variable part pixels from the fixed part pixels, in preparation to move all pixels belonging to the fixed part population, darker than the threshold, to the variable part population.
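The first pass described above can be sketched as follows. Several choices here are assumptions rather than part of the disclosure: gray levels run from 0 (black) to 255 (white), the median serves as the statistic, the threshold is the midpoint between the two medians, one step of 8-neighbor erosion serves as the thinning, and a hypothetical `min_gap` parameter stands in for the significance test. The names `first_pass`, `paper_level` and `min_gap` are illustrative.

```python
from statistics import median


def neighbours(y, x, h, w):
    """In-image 8-neighbours of pixel (y, x)."""
    return [(y + dy, x + dx)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w]


def first_pass(gray, template_mask, paper_level=200, min_gap=30):
    """gray: rows of 0 (black)..255 (white); template_mask: rows of booleans."""
    h, w = len(gray), len(gray[0])
    pixels = [(y, x) for y in range(h) for x in range(w)]
    # After registration, pixels are classified by position only.
    fixed = {p for p in pixels if template_mask[p[0]][p[1]]}
    variable = {p for p in pixels
                if p not in fixed and gray[p[0]][p[1]] < paper_level}
    # Thinning: keep fixed pixels whose whole 8-neighbourhood is fixed. This
    # drops the periphery and, with it, pixels touching the variable part.
    core = [p for p in fixed if all(q in fixed for q in neighbours(*p, h, w))]
    first_stat = median(gray[y][x] for y, x in core)
    second_stat = median(gray[y][x] for y, x in variable)
    threshold = (first_stat + second_stat) / 2   # assumed rule: midpoint
    if first_stat - second_stat < min_gap:
        # Populations too similar to separate: move nothing.
        return threshold, variable, fixed
    moved = {p for p in fixed if gray[p[0]][p[1]] < threshold}
    return threshold, variable | moved, fixed - moved


# Toy 7x9 image: white paper, a light gray template band on rows 2..4,
# and a dark vertical ink stroke in column 4 that crosses the band.
gray = [[255] * 9 for _ in range(7)]
mask = [[2 <= y <= 4 for _ in range(9)] for y in range(7)]
for y in range(2, 5):
    for x in range(9):
        gray[y][x] = 150
for y in range(7):
    gray[y][4] = 20

threshold, variable, fixed = first_pass(gray, mask)
print(threshold, sorted(variable))
```

In this toy case the three stroke pixels lying inside the band are reassigned to the variable part, so the stroke stays continuous once the band is dropped. The sketch requires the template to be printed lighter than the ink.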
The preferred embodiment of the method further includes the steps of:
preprocessing the gray level image, selecting groups of neighbor pixels and weighting their gray level values to a common value;
collecting the pixels positioned near the variable part, within the fixed part and establishing a third statistic on their levels of gray;
collecting all the pixels within the updated population of the variable part for establishing a fourth statistic on the levels of gray of the updated population;
finding a second gray level threshold, computed from the above described third and fourth statistics, which statistically separates the variable part pixels from the fixed part pixels;
moving all pixels belonging to the fixed part population, darker than the second threshold and adjacent to variable part pixels, to the variable part population.
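The second pass listed above can be sketched in the same spirit. The preprocessing step is not shown, and the same assumptions apply as labeled in the comments: gray levels from 0 (black) to 255 (white), medians as the statistics, and the midpoint of the two medians as the threshold. The function name `second_pass` and its arguments are hypothetical.

```python
from statistics import median


def neighbours(p, h, w):
    """In-image 8-neighbours of pixel p = (y, x)."""
    y, x = p
    return [(y + dy, x + dx)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w]


def second_pass(gray, fixed, variable):
    """Refine the split using only fixed pixels adjacent to the variable part."""
    h, w = len(gray), len(gray[0])
    # Fixed part pixels positioned near (here: touching) the variable part.
    border = {p for p in fixed
              if any(q in variable for q in neighbours(p, h, w))}
    if not border or not variable:
        return None, variable, fixed
    third_stat = median(gray[y][x] for y, x in border)
    fourth_stat = median(gray[y][x] for y, x in variable)
    threshold = (third_stat + fourth_stat) / 2   # assumed rule: midpoint
    # Only border pixels are candidates, so a dark template pixel far from
    # any ink stays in the fixed part.
    moved = {p for p in border if gray[p[0]][p[1]] < threshold}
    return threshold, variable | moved, fixed - moved


# Toy 7x9 image: template band on rows 2..4 (gray 150), ink stroke in
# column 4 (gray 40), one stroke-edge pixel at (3, 3) and one dark
# template speck at (3, 0), far from the stroke.
gray = [[255] * 9 for _ in range(7)]
for y in range(2, 5):
    for x in range(9):
        gray[y][x] = 150
for y in range(7):
    gray[y][4] = 40
gray[3][3] = 90
gray[3][0] = 30

variable = {(y, 4) for y in range(7)}
fixed = {(y, x) for y in range(2, 5) for x in range(9)} - variable

threshold, variable2, fixed2 = second_pass(gray, fixed, variable)
print(threshold, sorted(variable2 - variable))
```

Restricting the candidates to pixels adjacent to the variable part is what lets a local threshold pick up stroke edges without pulling in unrelated dark spots of the template.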
The method of the invention permits differentiating the variable part of a form from its template, even when they overlap, in order to overcome the problem of the gaps created when the template is removed. This objective is achieved without the extra cost of scanner-invisible ink. Wherever the fixed part is printed with a different level of gray, or with a different color that translates into a different level of gray after scanning, as compared to the color or level of gray of the variable part, the populations of pixels are statistically different, whereby the gaps that would otherwise result from a brute force removal of the template can be filled by the method and system of the invention.
These and other advantages of the invention will become apparent to those skilled in the art of image processing from the following specification with reference to the drawings.