The present invention relates to a block analysis engine for processing blocks of text data, and particularly to a modular analysis engine for a mail sorting system.
Every day, millions of items of mail pass through national and international mail systems. If the mail items are to successfully pass from their point of posting to their intended destination, each mail item must be individually sorted and directed in dependence on the address indicated on the mail item. Conventional mail sorting systems rely on a combination of machine sorting and sorting by hand.
Machine sorting is much faster than sorting by hand and allows a sorting office to handle large volumes of mail more efficiently. Machine sorting is therefore generally preferable to sorting by hand. However, in certain situations, hand sorting by sorting office employees is required with conventional systems. One such situation occurs when address information on a mail item cannot be understood by a sorting machine.
In conventional sorting systems, errors and omissions in the address information of a mail item can prevent a sorting machine from correlating the mail item with the correct delivery point. Problems with address information can include: misspelled words, use of a non-standard address form, changes of company name, changes of surname, and new building names or building divisions. Because machine sorting is so much faster than hand sorting, it is advantageous if the sorting system can adapt to these problems so that a greater proportion of mail items can be correctly correlated with their intended delivery point.
U.S. Pat. No. 6,954,729 (hereinafter, the '729 patent) describes an address learning system embodying a computerized method for correlating unmatched and/or unused text strings from a mail item according to a set of predetermined rules so as to allow the correct delivery of future mail items carrying those unmatched and/or unused text strings on the basis of those text strings. In other words, the system “learns” the delivery point which is indicated by the previously unrecognized text strings.
The '729 patent describes a conventional learning system in which address data is captured as a text string from a mail item and compared with data in an address database in order to determine a match with the captured address data and correctly route the item. Unmatched or unused address data of a mail item which is correctly routed to its destination is stored as a learning candidate for later correlation with that destination, should the data prove to be suitable for promotion into the address database. If a match is not found in the address database for a subsequent mail item, this system allows unmatched or unused address data which has been promoted into the address database to be used to correctly route the item.
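The flow described above can be illustrated with a minimal sketch. It is not the implementation of the '729 patent: the class name `AddressLearner`, its methods, the use of exact string matching, and the sample addresses are all assumptions made purely for illustration.

```python
class AddressLearner:
    """Hypothetical sketch of a conventional address learning flow."""

    def __init__(self, address_db):
        # address_db maps a known address text string to a delivery point
        # identifier (both formats are assumptions for this sketch).
        self.address_db = dict(address_db)
        # Each learning candidate pairs an unmatched text string with the
        # delivery point to which its mail item was routed.
        self.candidates = []

    def route(self, strings):
        """Return the delivery point for the captured strings, or None."""
        matched = [s for s in strings if s in self.address_db]
        if not matched:
            return None  # no match: the item cannot be machine routed
        delivery_point = self.address_db[matched[0]]
        # Store every unmatched string as a learning candidate.
        for s in strings:
            if s not in self.address_db:
                self.candidates.append((s, delivery_point))
        return delivery_point

    def promote(self, text, delivery_point):
        """Promote a learning candidate into the address database."""
        self.address_db[text] = delivery_point
```

In this sketch, a string such as a company name that accompanies a matched street address is stored as a candidate; once promoted, it is sufficient on its own to route a later item. How a candidate is judged suitable for promotion is deliberately left out here.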
A problem with the system of the '729 patent is that the number of stored learning candidates grows rapidly as mail passes through the system, and continues to grow as further candidates are added. Before a learning candidate can be promoted into the address database, the system searches the stored learning candidates for candidates which reinforce one another and so indicate that the unmatched/unused address data is a reliable indicator of the destination address. Since the stored learning candidates are a set of text strings growing continuously in number, the search process is highly intensive and requires that a particular learning candidate be compared against all other learning candidates of its type. The processing time required to perform a search for a match for a particular learning candidate grows factorially with the number of stored learning candidates.
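The exhaustive comparison described above can be sketched as follows. This is an illustrative assumption rather than the search used in the '729 patent: the function name `reinforcement_count`, the tuple layout, and the rule that a candidate is reinforced by an identical string routed to the same delivery point are all hypothetical, and all candidates are assumed to be of one type. The sketch shows only that a single search visits every other stored candidate; it makes no claim about the overall growth rate.

```python
def reinforcement_count(index, candidates):
    """Count the stored candidates that reinforce candidates[index].

    candidates is a list of (text, delivery_point) tuples.
    Returns (support, comparisons).
    """
    text, delivery_point = candidates[index]
    support = 0
    comparisons = 0
    for i, (other_text, other_point) in enumerate(candidates):
        if i == index:
            continue  # a candidate does not reinforce itself
        comparisons += 1  # one string comparison per other candidate
        if other_text == text and other_point == delivery_point:
            support += 1
    return support, comparisons
```

Because no index is kept over the stored strings, the number of comparisons for one search equals the number of other candidates, and that work is repeated for each candidate considered for promotion.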
Conventional mail sorting systems either cannot route mail items whose addresses do not match the addresses stored in the routing database, or have some ability to learn the delivery point associated with unmatched addresses, but that ability does not scale well with the number of mail items passing through the system. Furthermore, the matching operations performed by conventional mail sorting systems are limited to the comparison of address text strings.
Accordingly, there is a need for a scalable mail sorting system operable to analyze the address information captured from mail items and provide data in a form suitable for further processing, including routing mail items in dependence on the data.