Embodiments herein generally relate to an approach to score normalization for Hidden Markov Model (HMM) and other similar handwritten word-spotting systems.
Score normalization techniques are known to significantly improve the performance of HMM-based systems on problems such as handwriting recognition and speech recognition. Embodiments herein produce a significant increase in performance for continuous HMMs and a dramatic improvement for semi-continuous HMMs (SC-HMMs).
Word spotting is important to many organizations that offer services, such as handling customer correspondence and corresponding processes for clients. For example, an organization may receive customers' letters. In their letters, customers can ask for different services and actions, e.g. resign a contract, take over another customer's contract, etc. Spotting characteristic words within a customer letter can allow automatic dispatching of the letter to an appropriate department or the production of an automated response, and can be combined with other techniques (e.g. natural language processing (NLP) techniques) to completely automate letter processing. This allows immediate processing of correspondence deemed urgent. For example, in the case of Customer Relationship Management, the earlier an unhappy customer is contacted, the higher the chances of getting this customer back.
Other requests for services related to word spotting can include a government official wanting to spot specific topics in incoming correspondence (e.g., politically important topics, such as global warming). In another case, a government agency may want to detect letters that require a very quick response in order to meet various Service Level Agreement requirements. Also, banks may want to detect special terms or acceptances on contracts.
In the domain of Business Process Outsourcing, and more particularly Imaging, one of the most common, if not ubiquitous, tasks is the categorization of documents. Many of these documents are received in paper form, either because of their “legal” significance, as a backlog of old documents to be archived, or as general-purpose correspondence, and they need to be classified. Various techniques exist for classifying documents, whether based on the aspect of documents, on the textual content, or on templates. All these techniques have their specific advantages and drawbacks.
In practice, however, the categorization task is very often performed by human operators quickly scanning the document for a few specific words (e.g. “Housing Benefits”, “Deed of Trust”, etc.) rather than reading the whole document. More complex classification techniques could be applied (e.g. based on optical character recognition (OCR) for typed documents), but they constitute an inefficient use of resources when compared to word spotting.
In view of the foregoing, the present embodiments provide a method that begins by receiving an image of a handwritten item. The method performs a word segmentation process on the image to produce a set of sub-images and extracts a set of feature vectors from each sub-image. Then, the method applies an asymmetric approach that computes a first log-likelihood score of the feature vectors using a word model having a first structure (such as one comprising a Hidden Markov Model (HMM)) and also computes a second log-likelihood score of the feature vectors using a background model having a second structure (such as one comprising a Gaussian Mixture Model (GMM)). The method computes a final score for each sub-image by subtracting the second log-likelihood score from the first log-likelihood score. The final score is then compared against a predetermined standard to produce a word identification result, and the word identification result is output.
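By way of a non-limiting illustration, the scoring step described above can be sketched in Python as follows. The sketch makes several assumptions that are not taken from the embodiments: each HMM state emits from a single diagonal-covariance Gaussian, the predetermined standard is a simple threshold that the final score must meet or exceed, and all function names, parameter layouts, and toy values are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))

def gmm_loglik(frames, weights, means, variances):
    """Background model (second structure): each frame is scored by a GMM."""
    total = 0.0
    for x in frames:
        total += logsumexp([math.log(w) + log_gauss(x, m, v)
                            for w, m, v in zip(weights, means, variances)])
    return total

def hmm_loglik(frames, log_init, log_trans, means, variances):
    """Word model (first structure): forward algorithm in the log domain.
    Assumption: one Gaussian per state, purely to keep the sketch short."""
    n = len(log_init)
    alpha = [log_init[s] + log_gauss(frames[0], means[s], variances[s])
             for s in range(n)]
    for x in frames[1:]:
        alpha = [logsumexp([alpha[r] + log_trans[r][s] for r in range(n)])
                 + log_gauss(x, means[s], variances[s])
                 for s in range(n)]
    return logsumexp(alpha)

def spot_word(frames, word_model, background_model, threshold):
    """Final score = word log-likelihood minus background log-likelihood.
    Assumption: the word is reported when the score reaches the threshold."""
    score = hmm_loglik(frames, *word_model) - gmm_loglik(frames, *background_model)
    return score, score >= threshold

# Toy usage with 1-D features and a two-state left-to-right word model.
NEG_INF = -math.inf
word_model = ([0.0, NEG_INF],                        # log initial probabilities
              [[math.log(0.5), math.log(0.5)],       # log transition matrix
               [NEG_INF, 0.0]],
              [[0.0], [5.0]],                        # state means
              [[1.0], [1.0]])                        # state variances
background_model = ([1.0], [[2.5]], [[25.0]])        # weights, means, variances
score, is_match = spot_word([[0.0], [0.0], [5.0], [5.0]],
                            word_model, background_model, threshold=0.0)
```

Because the background model here has no state sequence, its score is a plain sum over frames, whereas the word model requires a forward pass over its states.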
As mentioned above, the first and second structures are different. For example, the first structure can comprise a different number of states than the second structure or can be based on a different model. Further, the computing of the first log-likelihood score and the second log-likelihood score has a relatively lower computational cost when the second structure is simpler than the first structure, and a relatively higher computational cost when the second structure is the same as the first structure. Therefore, thanks to the potentially simpler structure of the background model in the asymmetric approach, the embodiments herein have a significantly lower computational cost at both training and test time.
The computing of the first log-likelihood score and the computing of the second log-likelihood score can employ either a continuous HMM or a semi-continuous HMM. The continuous HMM uses different sets of Gaussian functions for the word model and the background model, whereas the semi-continuous HMM uses the same set of Gaussian functions for the word model and the background model.
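The semi-continuous case can be illustrated with the following Python sketch, in which a single shared pool of Gaussian functions is evaluated once per feature vector and then reused by both the word-model states and the background model, each of which keeps only its own mixture weights. The function names, the diagonal-covariance assumption, and the data layout are hypothetical and are given only as one possible arrangement.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))

def pool_log_densities(frames, pool_means, pool_vars):
    """Evaluate every Gaussian of the shared pool once per frame."""
    return [[log_gauss(x, m, v) for m, v in zip(pool_means, pool_vars)]
            for x in frames]

def mix_log_emission(log_dens, weights):
    """log sum_k w_k N_k(x), reusing precomputed pool densities.
    Assumption: at least one weight is strictly positive."""
    return logsumexp([math.log(w) + d
                      for w, d in zip(weights, log_dens) if w > 0])

def sc_emissions(frames, pool_means, pool_vars, state_weights, bg_weights):
    """Return per-frame, per-state log emissions for the word model and the
    total background log-likelihood, both drawn from the same Gaussian pool."""
    dens = pool_log_densities(frames, pool_means, pool_vars)
    state_log_emis = [[mix_log_emission(d, w) for w in state_weights]
                      for d in dens]
    bg_loglik = sum(mix_log_emission(d, bg_weights) for d in dens)
    return state_log_emis, bg_loglik
```

In a continuous HMM, by contrast, each model would carry its own means and variances, so the Gaussian evaluations could not be shared between the word model and the background model in this way.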
A system embodiment herein comprises an input/output device adapted to receive the image of the handwritten item. A word segmenter is operatively connected to the input/output device. The word segmenter is adapted to segment the image so as to produce the sub-image. An extractor is operatively connected to the word segmenter, and the extractor is adapted to extract the set of feature vectors from the sub-image.
A processor is operatively connected to the other elements. The processor is adapted to compute the first log-likelihood score of the feature vectors using the word model having the first structure (e.g., comprising a Hidden Markov Model (HMM)) and compute the second log-likelihood score of the feature vectors using the background model having the second structure (e.g., comprising a Gaussian Mixture Model (GMM)). The processor is further adapted to compute the final score for the sub-image by subtracting the second log-likelihood score from the first log-likelihood score.
A comparator is operatively connected to the processor. The comparator is adapted to compare the final score against the predetermined standard to produce the word identification result. The input/output device is further operatively connected to the comparator and is adapted to output the word identification result.
During the computing of the first log-likelihood score and the computing of the second log-likelihood score, the processor is further adapted to employ either continuous HMM or semi-continuous HMM, as described above. These and other features are described in, or are apparent from, the following detailed description.