Every day, people become more dependent on computers for both work and leisure activities. Computers are becoming especially vital as a means of communication, particularly written communication. Humans tend to communicate in an analog manner, such as by writing letters. Computers, however, operate in a digital domain that requires discrete states to be identified in order for information to be processed. This presents technological issues that must be overcome when interfacing with analog-based entities such as human beings. Thus, information is generally converted into “ones” and “zeroes,” or “digitized,” so that computing systems can recognize analog-based items and process them accordingly.
To facilitate this conversion, people have been trained on devices that readily convert analog thoughts into digital information, such as typewriters, keyboards, and other discrete-based devices. These devices typically produce a fairly consistently formatted product, which enhances readability. To bring such products into the digital realm, they are typically scanned (converted to digital quantities) into a computing system so that the information can be stored. If the information is to be recognized by the system, it is typically processed further so that the “image” of the information is broken down into discrete, recognizable parts. For example, a typewritten page can be scanned into a computer to form an image of the page. The image can then be processed further so that it is broken down into individual symbols or “glyphs,” which are then identified or “labeled” such that the computing system “recognizes” each symbol.
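As an illustration of breaking a page image into discrete parts, the following minimal sketch splits a small binary image into candidate glyph regions using connected-component analysis. This is one common segmentation technique chosen for illustration, not a method stated in this document; the function name `segment_glyphs`, the sample `PAGE` image, and the choice of 4-connectivity are all assumptions.

```python
from collections import deque

def segment_glyphs(image):
    """Return bounding boxes (top, left, bottom, right) of connected ink regions.

    `image` is a list of rows; 1 marks ink and 0 marks background.
    Pixels are grouped by 4-connectivity (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    # Order regions left to right, as they would appear in a line of text.
    return sorted(boxes, key=lambda box: box[1])

# Hypothetical 3x6 "scanned" image: a vertical stroke followed by a ring.
PAGE = [
    [1, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 1, 1],
]

for box in segment_glyphs(PAGE):
    print(box)
```

A sketch like this only isolates regions of ink; it does not label them, and it would split a symbol made of disconnected strokes (such as a dotted letter) into separate regions.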
As background, technology first focused on inputting existing printed or typeset information into computers. Scanners or optical imagers were utilized, at first, to digitize pictures (e.g., input images into a computing system). Once images could be digitized into a computing system, it followed that printed or typeset material could be digitized as well. However, an image of a scanned page cannot be manipulated as text or symbols after it is brought into a computing system because it is not “recognized” by the system, i.e., the system does not understand the page. The characters and words are “pictures” and not actually editable text or symbols.
To overcome this limitation for text, optical character recognition (OCR) technology was developed to utilize scanning technology to digitize text as an editable page. This technology worked reasonably well if the text used a particular font that allowed the OCR software to translate a scanned image into editable text. One problem with this approach is that existing OCR technology is tuned to recognize a limited, finite set of fonts arranged in a linear sequence (i.e., a line of text); it “recognizes” a character by comparing it to a database of pre-existing fonts. Character recognition is not limited to scanned or faxed documents. Computing systems often utilize font recognition techniques internally to facilitate other functions such as, for example, printing and/or converting documents from one format to another. Increasing the performance of a character recognizer thus has an impact not only on traditional types of character recognition, such as scanning, but on other system functions as well.
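The database comparison described above can be sketched as nearest-template matching: a glyph bitmap is compared against every stored template and given the label of the closest one. This is a simplified illustration under stated assumptions, not the implementation of any particular OCR product; the 3x3 bitmaps in `FONT_DATABASE`, the function names, and the use of Hamming distance as the similarity measure are all hypothetical.

```python
def hamming_distance(a, b):
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))

# Hypothetical font database: one 3x3 template per known character.
FONT_DATABASE = {
    "I": ((0, 1, 0), (0, 1, 0), (0, 1, 0)),
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
    "O": ((1, 1, 1), (1, 0, 1), (1, 1, 1)),
}

def recognize(glyph, database=FONT_DATABASE):
    """Return (label, distance) for the template closest to `glyph`."""
    best_label, best_distance = None, None
    # Every template is examined, so the work grows with database size.
    for label, template in database.items():
        distance = hamming_distance(glyph, template)
        if best_distance is None or distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance

# An "O" with one corner pixel missing, as scanning noise might produce.
noisy_glyph = ((1, 1, 1), (1, 0, 1), (1, 1, 0))
print(recognize(noisy_glyph))
```

Because each lookup scans the whole database, adding more fonts and font variants increases both the storage required and the time needed to label a single symbol.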
With today's plethora of information, it is impractical to maintain a database that contains every existing font. Storing variants of these fonts makes the database larger still. Even if a database could contain all of these fonts, it would be so vast that identifying a symbol in it would take an extreme amount of processing power and time. A typical user can neither afford such computing power nor wishes to spend hours waiting for character recognition to complete. Thus, although OCR technology has made great strides in accuracy, it has not made comparable progress in reducing processing time. It is also limited in that it requires known, pre-existing font sets to operate efficiently.