The present invention relates to a document scanner for scanning, locating and deciphering data in data fields on one or more documents imprinted with data, a computer system therefor and a method therefor.
Although document scanners (those scanners obtaining an image of a document) are relatively widely available, and there are computer software programs (such as OCR, or optical character reader, programs) which read the scanned image of the document and decipher and extract information from it, there is a need for a stand-alone, computer-based unit which scans a document imprinted with data, locates the data on the document, deciphers the data and outputs the data in a common format to other computer systems.
U.S. Pat. No. 5,499,108 to Cotte et al. discloses a Document-Driven Scanning Input Device which communicates with a computer via an RS232 interface. Col. 4, line 54. In FIG. 12, and beginning at col. 10, line 29, Cotte '108 discloses the use of an input device 214 which sends information codes via an RS232 cable to host computer 210. In one embodiment, the host computer 210 recognizes the codes generated by input device 214 and processes the information in accordance with the code. This processing may include faxing the scanned image, sending the image in an e-mail message or performing word processing on the image. Col. 10, lines 47, 48 and 54. In an alternate embodiment (col. 10, line 54), input device 214 analyzes the data stream from the scanned document to locate the code on the document. In one embodiment, the code on the document is a special symbol which input device 214 recognizes. Col. 11, lines 24-30. A graphic symbol may be used. Col. 11, lines 34-35. The input device 214 may be trained with symbol recognition software to recognize a particular symbol. Col. 11, lines 44-46. Predefined or predetermined graphic signals can be printed on the document to be scanned. Col. 11, lines 53-55. FIGS. 26-28 show graphic symbols identified in a "hot zone" or data field. These graphic symbols are recognized by input device 214. Particularly, reference point 406 provides an index point for symbol recognition, and the recognition software captures data commands in hot zone 404 (FIG. 26), hot zone 404 (FIG. 27) or hot zone 410 (FIG. 28). See col. 12, lines 5-20.
Cotte '108 also discloses the use of a preprinted document form for each type of operation. Col. 13, lines 26-28. Hand-drawn symbols are also recognized by input device 214. Col. 13, lines 38-42. "FIG. 16 represents the classes of embodiments where the command symbols 243 and 245 are placed either in hot zone which are not on the top of the page or randomly placed on the page. Recognition of symbols where random placement on the page is practiced is more easily performed when formalized symbols or stickers are used which the recognition software has already been trained to recognize." Col. 13, lines 54-61. After the input device 214 recognizes the symbol, it compares the captured command to a list of previously stored symbols in order to decode the command. Col. 14, lines 7-12. Hence, Cotte '108 decodes the scanned data field. "The input device software then generates the appropriate commands to send to the host software package invoked by the command symbol to cause the host software so invoked to appropriately process the data received from the input device." Col. 14, lines 30-34.
Cotte '108 describes FIG. 13A as a routine that obtains scanned image data, which is compressed by the input device. Col. 14, line 52. FIG. 13B is described at col. 15, lines 8-55 as a routine wherein the input device scans the document and generates a command to the host computer, and the user on the host computer then selects a menu option for that scanned image. Cotte '108 appears to output the scanned item as an image file rather than as a data file.
U.S. Pat. No. 5,243,149 to Comerford et al. discloses a Method and Apparatus for Improving the Paper Interface to Computing Systems. The two major components utilized in Comerford '149 are a digitizing tablet 10 and a hand-held scanner 15. See FIGS. 1-2; Col. 5, lines 9-13. The document is scanned and stored in an ordinary manner. Col. 7, lines 7-12. A control document is also prepared such that the control document links the scanned document file with other files such as an annotation file. The scanned image files and the files containing electronic representations of handwritten notes (an annotation file) are processed at a work station after being downloaded from the notepad to the work station. The analysis is conducted with character recognition software and with handwriting recognition software. Col. 9, lines 7-20.
U.S. Pat. No. 5,392,447 to Schlack et al. discloses an Image-Based Electronic Pocket Organizer with Integral Scanning. In this computerized pocket organizer or personal data assistant, the operator interacts with the computer system through a touch-sensitive electronic display panel 14 based upon various overlay screens or windows. The operator interacts with the windows by touching the screen to perform various functions such as data entry, hand-printed text entry, virtual alphanumeric keyboard operations and organizer navigational operations, among others. Col. 3, lines 60-69. After a document is scanned, the electronic file is processed by an optical character recognition system or a photo or image processing system. Col. 6, lines 48-57. To process business cards, the business card is scanned by the scanning unit and character recognition software is applied to the electronic scanned image data to identify the text information therein. Col. 8, lines 37-44. A relational database is utilized to link various information and certain data fields. However, with respect to the input of business card information, the operator is prompted at the initiation of the scanning operation to identify and attach a file tag containing one or more of the linking fields to the image being scanned. In this manner, each scanned image can be easily identified and cross-referenced with tagged data files. Col. 9, lines 51-62. With respect to medical insurance information scanned directly from the patient's medical card, the system performs a text identification routine (OCR) on the image bit map of the scanned medical card to identify areas of the bit map that contain text information. "A box is drawn around each of the areas that are determined to contain text information." The system then decodes the image and extracts the text information in each box.
"The operator can then transfer the identified text data within selected boxes into the text information file by touching a selected box to fill in a template field that is overlayed on the display. The template field continues to prompt the user to select a box for each of the fields in the text information file." Col. 10, lines 26-43.
U.S. Pat. No. 6,134,338 to Solberg et al. discloses a Computer System for Converting Documents Bearing Symbols and Alphanumeric Text. Solberg '338 states that it is particularly well suited to converting a raster image of a scanned hard copy source document, bearing a drawing view of a three-dimensional object together with symbols and alphanumeric text relating to the height, width, length, depth and angle of edges of the object, into mathematically accurate vector computer drawing files based on the symbols and alphanumeric text scanned from the source document. The automated system includes the use of commercially available text recognition software for automatic conversion of raster text, including handwritten, upper and lower case, and rotated text, into AutoCAD text strings. Col. 15, lines 47-52. The disclosure states that after alphanumeric text has been recognized, the user can edit the resulting ASCII text as necessary. The preset optical character recognition parameters can automatically subject questionable text recognitions to user review. Col. 27, lines 55-59. In step 4.3 (FIG. 1A), alphanumeric text 180 and floating viewport 242 are recognized. Col. 28, lines 6-8. If necessary, the user can edit the text to correct OCR recognition errors. Col. 28, lines 12-14. In steps 4.3 and 4.4, the OCR program creates a text file obtained from the scanned document. Col. 29, lines 57-60. See also FIG. 2 and step 4.
U.S. Pat. No. 5,970,170 to Kadashevich et al. discloses a Character Recognition System for Scanned and Real Time Handwritten Characters. The disclosure in Kadashevich '170 relates to detecting and decoding handwritten characters using an image processor connected to a document scanner; the image processor receives the scanned image of a previously created document and generates one or more ordered cluster arrays. These ordered cluster arrays contain spatially ordered coordinate arrays of skeletal image arcs representing handwritten characters. The handwritten characters are processed using several techniques. Kadashevich '170 does not appear to show applying one of a plurality of forms to a scanned image, extracting data based on the form and the scanned image, validating the data, and outputting the data as a delimited file.
The prior references do not show a single, computerized, unitary device which scans a number of documents imprinted with data, compares each document to previously stored form documents, selects one of the form documents, lifts or extracts the data from the input document having imprinted data thereon, and decodes that data based upon previously stored data characteristics. Further, the prior art devices do not show, in a single, unitary, computer system, extracting the data from the document having imprinted data thereon without regard to orientation of the input document and outputting the decoded data as a delimited string of characters or a delimited data field.
It is an object of the present invention to provide a document scanner and a system and a method which enables the user to scan a document imprinted with data, extract that data based upon a match between the scanned document and a plurality of form documents, decode the data, disregard misalignment or smudges on the input document image, and output decoded data as a delimited string of decoded characters to a further computerized device via a common computer communications link, all encompassed within a single computerized machine or system.
It is another object of the present invention to provide a document scanner, a system and a method wherein the operator can easily scan a form document, locate the important data fields on that scanned form document, size the data image field, identify data type for that data field, identify the presence or absence of any data validation parameters, identify the presence or absence of any data error reporting and data correction routines or parameters, and identify a data output destination for that decoded data in a delimited string of decoded characters output to a further computer device.
It is another object of the present invention to provide an operator interface for the identification of data field descriptors, those descriptors including positional information for the data field, data field size, data type, data validation, error reporting and correction, and data output destination information.
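To make the list of data field descriptors concrete, the following Python sketch models one descriptor as a record. The class and attribute names (`DataFieldDescriptor`, `DataType`, `validation_pattern`, `report_errors`, `output_position`) are hypothetical, as is the use of a regular expression to stand in for "data validation parameters"; the text does not specify how descriptors are stored or how validation is expressed.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataType(Enum):
    """Hypothetical data types an operator might assign to a data image field."""
    NUMERIC = "numeric"
    ALPHA = "alpha"
    ALPHANUMERIC = "alphanumeric"


@dataclass(frozen=True)
class DataFieldDescriptor:
    """One data image field on a stored form (all names are illustrative)."""
    name: str
    x: int                 # positional information: left edge, in pixels
    y: int                 # positional information: top edge, in pixels
    width: int             # data field size
    height: int
    data_type: DataType
    validation_pattern: Optional[str] = None  # None means no validation parameters
    report_errors: bool = False               # error reporting/correction enabled
    output_position: int = 0                  # slot in the delimited output string

    def type_matches(self, text: str) -> bool:
        """Check decoded text against the declared data type."""
        if self.data_type is DataType.NUMERIC:
            return text.isdigit()
        if self.data_type is DataType.ALPHA:
            return text.isalpha()
        return text.isalnum()

    def validate(self, text: str) -> bool:
        """Apply the type check, then the optional validation pattern."""
        if not self.type_matches(text):
            return False
        if self.validation_pattern is None:
            return True
        return re.fullmatch(self.validation_pattern, text) is not None
```

For example, a five-digit ZIP code field could be described as `DataFieldDescriptor("zip", 120, 340, 200, 40, DataType.NUMERIC, r"\d{5}", True, 3)`; its `validate("02139")` succeeds while `validate("2139")` fails the pattern. Keeping type checking separate from the optional pattern mirrors the text's distinction between data type information and the presence or absence of validation parameters.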
It is an additional object of the present invention to incorporate the scanning, operator input, identification and storing of data field descriptors, comparison of the scanned document image with a plurality of forms, selection of one of the forms, extraction of data from a data field, decoding and validating that data, reporting and correcting the data and outputting the decoded data all in a single, stand-alone, unitary, computerized system.
The document scanner, system and method operate in conjunction with a document imprinted with data and a plurality of form documents adapted to have data imprinted thereon. Each form document has at least one data image field, and typically many. Ultimately, the document scanner, system and method output a delimited string of decoded characters to another computer system via a common computer communications link such as a serial, Ethernet, USB, SCSI or parallel interface. The method includes either scanning a form document to obtain positional information for the data field on the form document or inputting a topological description of the data image field for the form document. The topological description includes at least positional information for the data field. Typically the operator, using an operator interface, identifies data field descriptors for each data field. The data field descriptors include data field size information, data type information, the presence or absence of data validation parameters, the presence or absence of data error reporting and data correction routines or parameters, and data output destination information. The data output destination information places the decoded data at a certain sequence location in the delimited string of decoded characters output to the further computer system. The document scanner, system and method scan the document imprinted with data and capture an image thereof. The scanned input document image is compared with the stored forms, and particularly with the stored data field descriptors, utilizing positional, data field size and data type information. The system selects one of the stored forms corresponding to the scanned image.
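The role of the data output destination information can be illustrated with a minimal sketch. It assumes, for illustration only, that each field's destination is a zero-based slot number in a single output record and that the delimited string uses CSV-style quoting so a value containing the delimiter cannot shift later fields; neither assumption comes from the text, and `compile_delimited` is a hypothetical name.

```python
import csv
import io


def compile_delimited(decoded: dict, positions: dict, delimiter: str = ",") -> str:
    """Place each decoded field value at its output slot and join the slots.

    decoded   -- field name -> decoded text
    positions -- field name -> zero-based slot in the output string
    Slots with no decoded value are emitted as empty fields.
    """
    if not positions:
        return ""
    slots = [""] * (max(positions.values()) + 1)
    for name, slot in positions.items():
        slots[slot] = decoded.get(name, "")
    buf = io.StringIO()
    # Minimal quoting wraps any value that itself contains the delimiter.
    csv.writer(buf, delimiter=delimiter, lineterminator="").writerow(slots)
    return buf.getvalue()
```

With `decoded = {"name": "Smith", "zip": "02139", "city": "Cambridge, MA"}` and `positions = {"name": 0, "city": 1, "zip": 3}`, the call returns `Smith,"Cambridge, MA",,02139`: slot 2 has no assigned field and stays empty, so the receiving system still finds the ZIP code in the fourth position.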
The document scanner, system and method then extract the data from the scanned image based upon the positional information for each data field, decode the data based upon the data field descriptors, validate the data (in the presence of data validation parameters) and store the decoded data. A data error reporting and data correction system, activated in the presence of the data error reporting and correction descriptor, enables the operator to correct any errors found in the data extracted and decoded from the data image fields. The document scanner, system and method output, via a compiler, the decoded data as part of the delimited string of decoded characters based upon the data output destination information. The scanner, system and method also include a deskewing routine which accommodates and corrects for any misalignment of the scanned input document with respect to the positional information for each data image field. Further, the system includes a noise reduction system which attenuates smudges or imperfections in the data fields.
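One way a deskewing routine could relate stored positional information to a misaligned scan is sketched below. It assumes, purely for illustration, that the form carries two reference marks whose locations are known on the stored form and have already been found in the scanned image; from them it estimates a rotation angle and a translation, then maps any stored field position into scanned-image coordinates. The function names are hypothetical, and the text does not say how the described routine measures or corrects skew.

```python
import math


def rigid_transform_from_marks(form_a, form_b, scan_a, scan_b):
    """Estimate the rotation and translation mapping form coordinates to scan coordinates.

    form_a, form_b -- (x, y) of two reference marks on the stored form
    scan_a, scan_b -- (x, y) where those marks were found in the scanned image
    Returns (angle_radians, tx, ty) such that scan = R(angle) * form + (tx, ty).
    """
    form_angle = math.atan2(form_b[1] - form_a[1], form_b[0] - form_a[0])
    scan_angle = math.atan2(scan_b[1] - scan_a[1], scan_b[0] - scan_a[0])
    angle = scan_angle - form_angle
    c, s = math.cos(angle), math.sin(angle)
    # Choose the translation so that mark A lands exactly where it was found.
    tx = scan_a[0] - (c * form_a[0] - s * form_a[1])
    ty = scan_a[1] - (s * form_a[0] + c * form_a[1])
    return angle, tx, ty


def map_point(point, transform):
    """Map a stored field position (x, y) into the skewed scanned image."""
    angle, tx, ty = transform
    c, s = math.cos(angle), math.sin(angle)
    x, y = point
    return (c * x - s * y + tx, s * x + c * y + ty)
```

Applying `map_point` to each field's stored corners gives the region to extract from the scan, so the stored form itself never needs to be altered. This two-mark model assumes the scan differs from the form only by rotation and shift; it does not model scaling or other distortions.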