This invention relates to an automated search system for searching handwritten cursive records. Its purpose is to reduce the number of documents containing these records that must be searched manually. In particular, the invention relates to an automated search for a name, word or phrase written in cursive in handwritten documents. The search helps a user locate candidates for manual review, where each candidate may match a name, word or phrase defined by a query.
With the advent of high speed information processing systems, it is now possible to process large databases built from information originally collected on paper documents. For printed documents, automated character recognition systems have been developed that read printed data with a high probability of accuracy and convert it into ASCII codes usable by the computing system. A problem arises, however, when the characters on the documents are handwritten in cursive.
Character recognition systems designed to recognize handwritten cursive characters are well known and have been under development now for at least three decades. At this point, one can expect a handwriting recognition system to read approximately 50% of the cursive words whose images are scanned into the computing system. The unrecognizable words must be manually examined and keyed into the computing system by operators. For low volume systems handling a few hundred documents a day, this is not a problem. However, for large database systems dealing with hundreds of millions of documents, the manual examination of the documents followed by key entry of the information on those documents is not an acceptable alternative.
For example, in a database system maintaining genealogical records, it would be desirable to be able to scan images of census records and read the individual names on these records. Most of these census documents contain handwritten cursive records. Billions of documents have been collected over many centuries of keeping such records around the world. If, for example, there are documents containing two billion handwritten cursive census records, and if manually reading and keying in records can be done at the rate of two million records a year, it would take one thousand years to manually enter all of the handwritten cursive record information on these documents. Even applying the best cursive character recognition technology available at this time, which is 50% successful, the number of records to be manually entered is only cut in half. To complete the task of manually entering these records into the computing system, the number of years in this example is reduced only from one thousand years to five hundred years.
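The workload estimate above follows from dividing the number of records still to be entered by the manual keying rate. The recognition rate of 50% applies only to the second case:

$$
\frac{2\times10^{9}\ \text{records}}{2\times10^{6}\ \text{records/year}} = 1000\ \text{years},
\qquad
\frac{(1-0.5)\times 2\times10^{9}\ \text{records}}{2\times10^{6}\ \text{records/year}} = 500\ \text{years}.
$$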
In accordance with this invention, the above and other problems have been solved as follows. Images of the cursive records are extracted. An automated search is then performed on these images based on an ASCII query for a record, by matching a cursive equivalent of the ASCII query to the images of the cursive records. A similarity value is generated that indicates the extent of match between features of the cursive equivalent of the ASCII query and features of each cursive record. Finally, the records are sorted by the similarity values produced in the matching process. The result is a candidate list of cursive record images, which a user examines manually to decide whether any of the cursive records on the list satisfy the query. For the sake of simplicity, in describing the invention, each cursive record, or a record that is the subject of a query in the search of cursive records, will be referred to herein as a "snippet." A snippet shall mean an individual's name (partial or full), or a word or a series of words making up a phrase.
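The final scoring and sorting step can be illustrated with a minimal sketch. The specification does not state which features are extracted or how the similarity value is computed. The sketch therefore makes several assumptions:

- Feature extraction and the rendering of the ASCII query into a cursive equivalent are assumed to have already produced fixed-length numeric feature vectors.
- Cosine similarity is used as a stand-in for the unspecified similarity value.
- The names `Snippet`, `cosine_similarity` and `candidate_list` are hypothetical.

The sketch scores each snippet against the query and returns the best-scoring snippets as a candidate list for manual review.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Snippet:
    """Hypothetical representation of one scanned cursive record."""
    record_id: str
    features: Sequence[float]  # assumed feature vector extracted from the image


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, used here only as a stand-in for the similarity value."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no shape information
    return dot / (norm_a * norm_b)


def candidate_list(
    query_features: Sequence[float],
    snippets: Sequence[Snippet],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Score every snippet against the features of the query's cursive
    equivalent and return the top_k (record_id, score) pairs, best first."""
    scored = [(cosine_similarity(query_features, s.features), s) for s in snippets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(s.record_id, score) for score, s in scored[:top_k]]


if __name__ == "__main__":
    records = [
        Snippet("rec-1", [1.0, 0.0, 0.0]),
        Snippet("rec-2", [0.9, 0.1, 0.0]),
        Snippet("rec-3", [0.0, 1.0, 0.0]),
    ]
    query = [1.0, 0.0, 0.0]  # hypothetical features of the rendered ASCII query
    for record_id, score in candidate_list(query, records, top_k=2):
        print(record_id, round(score, 3))
```

In this sketch the automated step only ranks the snippets and keeps the `top_k` best. It never declares a match, which mirrors the division of labor described above: the final determination remains with the user reviewing the candidate list.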
The foregoing and other features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention as illustrated in the accompanying drawings.