Computer software can be used to recognize digital representations of objects. For example, optical character recognition software can be used to recognize digital representations of character objects. Such representations are typically obtained by scanning a printed page and segmenting the page into characters, after which characteristics of each character are identified. Rules are used to narrow the choice of characters to a smaller range of characters, and a confidence level is assigned to each character in the smaller range. The character with the highest confidence level may be selected as the recognized character.
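The selection step described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `recognize_character`, the rule predicates, and the confidence values are hypothetical and are not taken from any particular recognition product.

```python
from typing import Callable, Dict, Iterable, List


def recognize_character(
    candidates: Iterable[str],
    rules: Iterable[Callable[[str], bool]],
    confidence: Callable[[str], float],
) -> str:
    """Narrow the candidates with rules, then pick the most confident one."""
    rule_list: List[Callable[[str], bool]] = list(rules)
    # Keep only the candidates that satisfy every rule (the "smaller range").
    narrowed = [c for c in candidates if all(rule(c) for rule in rule_list)]
    if not narrowed:
        raise ValueError("no candidate character satisfies the rules")
    # Assign a confidence level to each remaining candidate.
    levels: Dict[str, float] = {c: confidence(c) for c in narrowed}
    # Select the candidate with the highest confidence level.
    return max(levels, key=levels.get)


# Hypothetical confidence levels for one segmented character.
made_up_levels = {"O": 0.55, "0": 0.60, "Q": 0.20}

# Hypothetical rule: the surrounding word is alphabetic, so digits are excluded.
result = recognize_character(
    "O0Q",
    [lambda c: c.isalpha()],
    lambda c: made_up_levels[c],
)
```

In this sketch the rule removes the digit "0" before confidence levels are compared, so the letter "O" is selected even though "0" has the higher made-up confidence level.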
Some computer software for object recognition uses parameters to allow the software to be adjusted. The use of parameters allows the software to be tuned in a laboratory to particular conditions simulating the environment of anticipated operation of the software. Before the software is shipped as part of a product, the parameters are fixed at a constant level that yielded the optimum recognition in the laboratory simulation for that product.
For example, if a scanned image is represented using pixels, each having a greyscale value of 0–255, one parameter of the optical character recognition software may be a cutoff that identifies which greyscale values correspond to the part of the image to be recognized, in order to distinguish that part of the document from the background. For example, a document received via a fax that is photocopied onto off-white paper may have text that has a greyscale reading of 200, while the remainder of the page may have a greyscale reading of 100. A printed black and white document may have a greyscale reading of 240 for text and 30 for the remainder of the page. Text on a printed color document may have a greyscale reading as low as 90, with a greyscale reading of 70 for portions of the background. These various values may be used to determine that a cutoff greyscale reading of 150 should be used for the software. While this value provides a good compromise for high-contrast documents such as most black and white documents, certain documents having color text on a color background simply will not be recognized with this parameter value, because both the text and the background fall below the cutoff. If the parameter were lowered to 80 to accommodate recognition of color documents, some black and white documents would not be recognized, such as the fax photocopied onto off-white paper, whose background reading of 100 would then fall above the cutoff.
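The arithmetic of this example can be checked with a short sketch. It assumes, consistently with the readings above, that a pixel is treated as text when its greyscale reading is strictly above the cutoff; the function name `separates` and the dictionary layout are illustrative only.

```python
from typing import Dict, Tuple


def separates(text: int, background: int, cutoff: int) -> bool:
    """True when the cutoff falls between the background and text readings."""
    # Assumption: readings above the cutoff count as text, the rest as background.
    return background <= cutoff < text


# (text reading, background reading) taken from the example above.
documents: Dict[str, Tuple[int, int]] = {
    "fax photocopied onto off-white paper": (200, 100),
    "printed black and white document": (240, 30),
    "printed color document": (90, 70),
}

# For each fixed cutoff, record which documents remain recognizable.
results = {
    cutoff: {
        name: separates(text, background, cutoff)
        for name, (text, background) in documents.items()
    }
    for cutoff in (150, 80)
}

for cutoff, outcome in results.items():
    print(cutoff, outcome)
```

Under these assumptions a cutoff of 150 separates text from background for the fax and the black and white document but not for the color document, while a cutoff of 80 handles the color document but fails on the fax.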
It would be desirable to have the parameter selection process vary for each set of objects, such as characters on the page, rather than selecting a single value for each parameter and using that same value for all objects. This would allow the parameter values to change for every page or part of a page, causing the parameters to be optimized for every circumstance. In the example above, it would be desirable to use a greyscale threshold of 150 for the faxed document and a threshold of 80 for the color document, instead of using a value of 150 every time.
It is possible to make several attempts at recognizing the objects, such as the characters in a file, using different parameters for each attempt, and then to select the attempt that yields the highest recognition confidence. However, such a process would add too much time to the recognition process to be practical. Although computing power increases every year, users prefer to use the additional computing power to process images of higher resolution rather than to improve the accuracy of the recognition, so making several attempts at recognizing an image could still take too long to be useful.
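The multiple-attempt approach, and the reason its cost grows, can be sketched as follows. The recognizer here is a toy stand-in: `toy_recognize`, its confidence values of 0.9 and 0.1, and the representation of an image as a pair of text and background readings are all assumptions made for illustration, not a real recognition engine.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

Image = Tuple[int, int]  # hypothetical: (text reading, background reading)


@dataclass
class Attempt:
    cutoff: int
    text: str
    confidence: float
    attempts: int  # number of full recognition passes performed


def best_of_attempts(
    image: Image,
    cutoffs: Iterable[int],
    recognize: Callable[[Image, int], Tuple[str, float]],
) -> Attempt:
    """Run one full recognition pass per cutoff and keep the most confident."""
    best: Optional[Attempt] = None
    passes = 0
    for cutoff in cutoffs:
        text, confidence = recognize(image, cutoff)  # one full pass each time
        passes += 1
        if best is None or confidence > best.confidence:
            best = Attempt(cutoff, text, confidence, passes)
    if best is None:
        raise ValueError("at least one cutoff is required")
    best.attempts = passes
    return best


def toy_recognize(image: Image, cutoff: int) -> Tuple[str, float]:
    """Toy stand-in: confident only when the cutoff separates text from background."""
    text_level, background_level = image
    ok = background_level <= cutoff < text_level
    return ("text" if ok else "", 0.9 if ok else 0.1)


color_page = best_of_attempts((90, 70), [150, 80], toy_recognize)
fax_page = best_of_attempts((200, 100), [150, 80], toy_recognize)
```

In this sketch every candidate parameter value costs one complete recognition pass, so trying two cutoffs roughly doubles the recognition time, and trying more values increases it proportionally.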
What is needed is a method and apparatus that can optimally set the parameters of an optical recognition process without significantly adding time to the recognition.