It is common for a document that has been sent to a printer and/or placed in a copier for copying to contain a mixture of contents, i.e., machine-printed text and add-on information overlaid over the original contents. Examples of add-on information include handwritten notes, signatures, stamps, etc. (collectively “handwritten annotations”). In many cases, a user may intentionally want to print only the original contents, e.g., a letter for review without handwritten notes. However, until now it has been difficult to print such documents without including the annotations.
In a conventional computer system, as shown in FIG. 1, most printing jobs from a user application 100 are performed through a graphics device interface (GDI) 110, a printer driver 120, and a print spooler 130. In this method, an application 100 creates a document and outputs a print request. GDI 110 is an operating system component that converts graphics calls to a format that the printer drivers understand. GDI 110 receives requests from the various user applications and sends these requests to the corresponding driver of the printer chosen by the application. From an application's point of view, there are no differences among the various printers, which simply appear as output devices. GDI 110 supports text, bitmap and graphics rendering (i.e., drawing lines, curves, arcs, rectangles and color fill). This printing methodology allows applications 100 to send device-independent printing commands to render text and graphics. These printing commands are then sent to printer driver 120 via GDI 110. Printer driver 120 converts the standard graphics requests to commands that a printer understands and sends them to spooler 130. Finally, a printer receives the printing commands through spooler 130 and port monitor 140. Spooler 130 manages the print jobs and allocates resources for printing from the computer's CPU without interrupting any current operation. Likewise, conventional digital copying machines include similar subsystems, albeit in a more closed environment where the subsystems are fixed and not generally accessible to the user.
The prior art is limited in its ability to remove handwritten annotations from a document having machine printed text and annotations. In a first method, a marker or template is placed on the document being processed to assist in locating the handwritten annotation, see e.g., U.S. Pat. No. 5,631,9084 which places a magnetic ink character recognition line on a bank check for use in locating the handwritten signature. The first method requires special apparatus to locate the marker or template. In a second method, handwritten annotations are identified and separated from machine printed text for separate processing. However, the prior art is limited in its ability to handle complex cases where a document contains a mixture of machine printed text and handwritten annotations, and where the handwritten annotations are mixed with the machine printed text, i.e., where the handwritten annotations do not appear only in regions of white space in the original document such as margins.
Hidden Markov Models (HMMs) are used in many applications. One of the most successful is speech recognition, but HMMs have also been applied to optical character recognition and keyword identification. As an overview of HMM theory, it is important to note that natural language has an embodied Markovian structure. For example, the probability of seeing the letter “u” after the letter “q” is usually greater than the probability of seeing “u” after any other letter. A process in which the conditional probability of a current event, given all past events, depends only on the most recent event is a Markov process. In a discrete Markov process, each state corresponds to an observable deterministic event. In a Hidden Markov Model, by contrast, the output of each state is governed by an output probability distribution. The method of the present invention is based on the theory of Hidden Markov Models. The unknown OCR knowledge is treated as hidden states, and a decision is made based upon the observation sequences that come from these states.
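By way of illustration only, the idea of deciding among hidden states from an observation sequence can be sketched with the standard Viterbi algorithm, which finds the most probable hidden-state sequence of an HMM. The sketch below is not the method of the present invention. The state names ("machine", "handwritten"), the observation symbols ("regular", "irregular") and every probability value are hypothetical and chosen only to make the example concrete.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical two-state model; all names and numbers are illustrative assumptions.
STATES = ("machine", "handwritten")
START_P = {"machine": 0.8, "handwritten": 0.2}
TRANS_P = {
    "machine": {"machine": 0.9, "handwritten": 0.1},
    "handwritten": {"machine": 0.2, "handwritten": 0.8},
}
# Each hidden state emits an observation symbol according to a probability
# distribution, rather than a single deterministic event.
EMIT_P = {
    "machine": {"regular": 0.9, "irregular": 0.1},
    "handwritten": {"regular": 0.2, "irregular": 0.8},
}


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the most probable hidden-state path and its joint probability."""
    # best[t][s] = (probability of the best path ending in state s at step t,
    #               the predecessor state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Markov property: only the most recent state matters.
            prob, back = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (prob * emit_p[s][obs], back)
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


if __name__ == "__main__":
    obs = ["regular", "regular", "irregular", "irregular", "regular"]
    path, prob = viterbi(obs, STATES, START_P, TRANS_P, EMIT_P)
    print(path, prob)
```

In this sketch the observation sequence stands in for features measured on successive image components, and the decoded path labels each component as machine printed or handwritten. A practical system would estimate the transition and output probabilities from training data rather than fixing them by hand.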
It is therefore an object of the present invention to provide a system and method for the automatic separation of handwritten annotations from machine printed text.
It is an additional object of this invention to provide a system and method for the automatic separation of handwritten annotations from machine printed text that is based on a Hidden Markov Model.
It is a further object of this invention to provide a system and method for the automatic separation of handwritten annotations from machine printed text embodied within a conventional digital copy machine.
It is another object of this invention to provide a method for the automatic separation of handwritten annotations from machine printed text within documents sent for printing on a conventional printer.
Various other objects, advantages and features of the present invention will become readily apparent from the ensuing detailed description and the novel features will be particularly pointed out in the appended claims.