The present invention relates generally to optical character recognition, and, more particularly, to a system and method for accurately recognizing text font in a document processing system.
Optical recognition technology has been employed in scanning devices in order to scan a printed document and convert it into an electronically editable format. Typically, a user will scan a document with a scanner that is attached to a computing device. The computing device includes software that interprets the scanned image, converts the image to a text file and then presents the text to the user in an editable electronic form.
Some OCR engines include a limited degree of font recognition capability. Those that do typically select among only a few fonts, based upon whether the text is fixed pitch or proportional and whether the text has serifs or is sans-serif. One drawback of such a system is that it prevents the reproduction of the original document in a manner that fully preserves the original layout of the document. This is because fonts differ in characteristics other than serif vs. sans-serif and proportional vs. fixed pitch.
Some of the differences that can occur between fonts are: average character width, which determines the amount of space consumed by a piece of text; x-height/cap ratio, which determines whether the text has a crowded or spaced out appearance; size and shape of serifs, which determine the style or the age of the font; stroke weight, which determines the overall color of the text; and stroke weight variation, which determines the style of the text.
In the past, some OCR engines have employed pattern recognition to recognize characters in an image, but have had limited success in recognizing and identifying a particular text font. Furthermore, the reliance of a scanning system on the OCR character recognition capability severely hinders the operation of the system as a whole.
In copending and commonly assigned U.S. patent application Ser. No. 09/258,416, entitled "SYSTEM AND METHOD FOR DETERMINING TEXT FONT IN AN OCR SYSTEM", filed on even date herewith, a system is described which uses font metrics to find a font that takes up approximately the same horizontal space as the original. Such a system achieves very high speed recognition rates, but may yield limited accuracy.
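By way of illustration only, a metrics-based selection of the kind summarized above might be sketched as follows. The sketch is not taken from the copending application; the function names, the per-character advance-width tables, and the sample fonts `MonoSample` and `PropSample` are all hypothetical and are assumed here purely to show the idea of matching horizontal space.

```python
from typing import Dict

# Hypothetical advance-width tables (in pixels at the scanned size).
# The "*" entry is an assumed fallback width for characters not listed.
SAMPLE_ADVANCES: Dict[str, Dict[str, float]] = {
    "MonoSample": {"*": 6.0},
    "PropSample": {"i": 3.0, "m": 9.0, "*": 6.0},
}


def predicted_width(text: str, advances: Dict[str, float]) -> float:
    """Sum the advance widths a font would need to set the given text."""
    fallback = advances.get("*", 0.0)
    return sum(advances.get(ch, fallback) for ch in text)


def closest_by_width(text: str, measured_width: float,
                     fonts: Dict[str, Dict[str, float]]) -> str:
    """Pick the font whose predicted width is nearest the measured width."""
    return min(
        fonts,
        key=lambda name: abs(predicted_width(text, fonts[name]) - measured_width),
    )


# The text "iii" measured at 10 pixels wide is closer to the proportional
# sample (9 pixels predicted) than to the fixed-pitch sample (18 pixels).
best = closest_by_width("iii", 10.0, SAMPLE_ADVANCES)
```

Only a few additions per character are needed for each candidate font, which is consistent with a fast search. Two fonts with similar advance widths would score alike under such a measure.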
Therefore, it would be desirable to provide a way of accurately recognizing the font of text in a scanned document so that the document may be accurately reproduced.
The invention provides a system and method for quickly determining the text font in a text image.
In architecture, the present invention may be conceptualized as a system for recognizing the font of text in a document processing system, comprising a computer system having a memory in which a plurality of fonts are contained, the computer system also including a document processing system, means for receiving an image in the document processing system, the image including a plurality of text characters representing a font in the image, means for capturing each text character, each text character defined by a bitmap, means for comparing the bitmap of the captured text character with a bitmap of each of the plurality of fonts contained in the computer system, and means for selecting from the plurality of fonts from the memory the font that most closely matches the font in the image.
The present invention may also be conceptualized as providing a method for recognizing the font of text in an image, comprising the steps of: storing a plurality of fonts in a computer system having a memory, the computer system also including a document processing system, receiving an image in the document processing system, the image including a plurality of text characters representing a font in the image, capturing each text character, obtaining a bitmap for each text character, comparing the bitmap of the captured text character with a bitmap of each of the plurality of fonts contained in the computer system, and selecting from the plurality of fonts from the memory the font that most closely matches the font in the image.
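The comparing and selecting steps recited above can be sketched, under stated assumptions, as follows. The sketch assumes that each captured character has already been labeled with its character code and scaled to the same dimensions as the stored font bitmaps, and it uses a simple count of differing pixels as the measure of closeness. The summary does not specify the measure, so that choice, along with every name and sample bitmap below, is hypothetical.

```python
from typing import Dict, List, Tuple

# A bitmap is a list of equal-length strings: '#' is ink, '.' is background.
Bitmap = List[str]


def pixel_mismatch(a: Bitmap, b: Bitmap) -> int:
    """Count the pixels that differ between two bitmaps of identical size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have identical dimensions")
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def font_score(captured: List[Tuple[str, Bitmap]],
               glyphs: Dict[str, Bitmap]) -> float:
    """Total mismatch of all captured characters against one font's glyphs."""
    if any(ch not in glyphs for ch, _ in captured):
        return float("inf")  # this font lacks one of the captured characters
    return float(sum(pixel_mismatch(bmp, glyphs[ch]) for ch, bmp in captured))


def closest_font(captured: List[Tuple[str, Bitmap]],
                 fonts: Dict[str, Dict[str, Bitmap]]) -> str:
    """Select the stored font with the lowest total mismatch."""
    return min(fonts, key=lambda name: font_score(captured, fonts[name]))


# Hypothetical 3x3 glyphs for the letter "I" in two stored fonts.
SAMPLE_FONTS: Dict[str, Dict[str, Bitmap]] = {
    "SerifSample": {"I": ["###", ".#.", "###"]},
    "SansSample": {"I": [".#.", ".#.", ".#."]},
}

# A captured "I" with one pixel lost from the lower-right serif.
SAMPLE_CAPTURED: List[Tuple[str, Bitmap]] = [("I", ["###", ".#.", "##."])]

best_font = closest_font(SAMPLE_CAPTURED, SAMPLE_FONTS)
```

In this sample the captured character differs from the serif glyph by one pixel and from the sans-serif glyph by three, so the serif font is selected. A practical comparison would likely also need to tolerate small shifts and scanning noise, which this sketch does not attempt.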
The invention has numerous advantages, a few of which are delineated hereafter merely as examples.
An advantage of the invention is that the fonts chosen are typically those that most closely match the font in the original image.
Another advantage of the invention is that it uses the fonts available on a host computer system in order to select a font that most closely matches that in the original image.
Another advantage of the invention is that it allows for the accurate recognition of text font in a text image.
Another advantage of the invention is that it allows for the recognition of text font using minimal information from an optical character recognition engine.
Another advantage of the invention is that it may be implemented independently from an optical character recognition engine.
Another advantage of the invention is that it is simple in design and easily implemented on a mass scale for commercial production.
Other features and advantages of the invention will become apparent to one with skill in the art upon examination of the following drawings and detailed description. These additional features and advantages are intended to be included herein within the scope of the present invention.