1. Field of the Invention
The present invention relates to a pattern recognition system that, in one embodiment, may be used in a document analysis system.
2. Description of the Prior Art
Pattern recognition techniques are often used to classify an unknown item into one of many predefined sets. Examples of technologies that provide these capabilities are:
neural networks
binary trees
K-Means methods

These techniques, called "classifiers," have been employed in several recent technologies, including Optical Character Recognition, Vehicle Identification, and Scene Analysis.

accurately locating the important fields on the image of a financial document, such as a business check, personal check, deposit slip, or giro;
determining the likelihood that connected regions correspond to people in images of a retail or banking scene; and
differentiating between a hand, objects in a hand, and background objects in the image of a retail self-checkout scene.

Both feature parameters and confidence parameters are trained automatically from ground truth image data;
Large numbers of different attributes can be analyzed in the model, which yields more accurate modeling;
Relationships between different features can also be trained from ground truth data and represented in parametric form;
Insensitivity to feature noise;
Parameters are based exclusively on statistical information. The need for heuristic information is minimized;
Retraining is straightforward and fast;
On-line updating of model parameters is possible; and
The models created are accurate and reliable at predicting the match of an unknown item to the model.
Pattern recognition techniques can also be used to determine the likelihood that an unknown item belongs to a predefined class of items. These techniques extract features of the unknown item and compare these features to an existing model of the predefined class. The closer the features are to the model, the higher the likelihood that the unknown item belongs to the predefined class.
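The comparison of extracted features against a class model can be illustrated with a minimal sketch. It assumes, purely for illustration, that the model stores a mean and a standard deviation for each feature and that the score decays exponentially with the scaled distance from those means. The function name `match_score`, the feature values, and the scoring formula are hypothetical and are not taken from the system described here.

```python
import math


def match_score(features, means, stds):
    """Return a score in (0, 1]; 1.0 means the features equal the model means.

    Assumption: each feature is compared independently against a per-feature
    mean and a strictly positive standard deviation (illustrative only).
    """
    if not features or not (len(features) == len(means) == len(stds)):
        raise ValueError("features, means, and stds must be non-empty and equal in length")
    # Squared distance from the model, with each feature scaled by its spread.
    squared = sum(((x - m) / s) ** 2 for x, m, s in zip(features, means, stds))
    # Average over the features so the score does not depend on their count.
    return math.exp(-0.5 * squared / len(features))


# Hypothetical model of a predefined class with two features.
model_means = [100.0, 40.0]
model_stds = [10.0, 5.0]

near = match_score([102.0, 41.0], model_means, model_stds)  # close to the model
far = match_score([160.0, 10.0], model_means, model_stds)   # far from the model
print(near > far)
```

An item whose features lie close to the model means scores near 1.0, and the score falls toward 0 as the features move away, which mirrors the relationship between feature closeness and likelihood stated above.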
Pattern recognition techniques may also be used to analyze document images. For example, automating the extraction of data from digital images of paper documents can greatly increase the productivity and capacity of a business. By automating the data entry process, operating costs can be reduced through lower manpower requirements, and capacity can be increased by maximizing the throughput of data entry operations.
One such document analysis system is described in co-pending U.S. patent application Ser. No. 08/652,283, filed May 22, 1996, and entitled "Knowledge-Based Document Analysis System". The pattern recognition approach taken in the system described in that patent application is to analyze document format and content by exploiting global document attributes, and to use local document information only when necessary. While the pattern recognition technique described in that patent application provides a robust solution that is largely independent of country and language, this prior pattern recognition approach suffers from a number of disadvantages.