A. Field of the Invention
The present invention relates to an improved method for processing data from scanned documents and, more particularly, to the use of a simple hardware filter to extract only user-designated information from a document. The extracted information can be transmitted over a data communications network for processing by remote stations on the network, or it can be processed by a local station connected to the scanner.
B. Related Applications
This patent application is related to U.S. Pat. No. 5,375,070, which issued on Dec. 20, 1994 from application Ser. No. 08/024,572, which was filed Mar. 1, 1993, entitled "Information Collection Architecture and Method for a Data Communications Network," by J. G. Waclawsky, Paul C. Hershey, Kenneth J. Barker and Charles S. Lingafelt, Sr., assigned to the IBM Corporation and incorporated herein by reference.
This patent application is related to U.S. Pat. No. 5,365,514, which issued on Nov. 15, 1994 from application Ser. No. 08/024,563, which was filed Mar. 1, 1993, entitled "Event Driven Interface for a System for Monitoring and Controlling a Data Communications Network," by Paul C. Hershey, J. G. Waclawsky, Kenneth J. Barker and Charles S. Lingafelt, Sr., assigned to the IBM Corporation and incorporated herein by reference.
This patent application is related to U.S. Pat. No. 5,493,689, which issued on Feb. 20, 1996 from application Ser. No. 08/024,542, which was filed Mar. 1, 1993, entitled "System and Method for Configuring an Event Driven Interface and Analyzing Its Output for Monitoring and Controlling a Data Communications Network," by J. G. Waclawsky and Paul C. Hershey.
C. Background Art
A typical optical scanner uses photosensors to scan the text of a document and complex character recognition software to transform the scanned document from pixel form into a computer-compatible digital code. This code is commonly stored as an unstructured file, which can then be manipulated with word processing software. However, subsequent manipulation with a word processor requires human intervention, which is slow and error-prone.
A scanner reads all of the printed matter that appears on a document and places all of it in the unstructured file. For example, if a completed questionnaire or other form is scanned, not only the information entered into the form but also the form's questions, prompts and other extraneous matter are stored in memory.
In addition to manipulating a scanned document at the scanning site with word processing software, one can transmit scanned document data over a data communications network. There are, however, shortcomings associated with transmitting a scanned document over a network. Because all of the printed matter from the scanned document (both user-designated and extraneous) is stored, transmitting that information over a data communications network may consume more bandwidth than necessary. Furthermore, because the scanned data file is unstructured, extracting the desired information from it when it arrives at a destination station on the network requires more complex software and many processor cycles. Consequently, in the prior art, extraction of desired information from a scanned document transmitted over a network cannot be accomplished in real time at high network speeds.
It is an object of the present invention to identify and extract, in an online process, user-designated information scanned from a document.
It is another object of the invention to transmit this designated information over a data communications network for remote retrieval and analysis in real time.