A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present invention relates in general to the field of data processing and, more particularly, to the processing and validation of postal address information.
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as “records” having “fields” of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system, or DBMS, is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing, or even caring, about underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added to or removed from data files, retrieved from or updated in such files, and so forth, all without user knowledge of the underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of a database management system is known in the art. See, e.g., Date, C., An Introduction to Database Systems, Volumes I and II, Addison Wesley, 1990; the disclosure of which is hereby incorporated by reference.
One particular need in the area of database processing is address validation: the process of ensuring or verifying that a particular postal address (e.g., one provided to a database system) actually exists, that is, that it is valid. In general, the notion of validating postal addresses is not new. For example, a U.S.-specific system (e.g., the U.S. Postal Service database) provides a validation database for checking U.S. postal addresses. However, that system is focused on one country, the United States.
Such a U.S.-specific (or other country-specific) validation system is of no use for validating postal addresses for other countries. This stems in part from the fact that the U.S. system does not maintain a database of foreign addresses. For example, the U.S. system does not store appropriate information for validating Canadian postal addresses. However, there are other inherent limitations. For instance, a postal address may employ a different character set than would be valid for U.S. addresses. For a Chinese address, for example, a Chinese character set may be employed; for a Japanese address, a Japanese character set may be employed; for a Korean address, a Korean character set may be employed; and so forth and so on.
Other problems exist with a country-specific system. Although one ordinarily expects addresses in a U.S.-based system to follow a particular format (e.g., name, followed by building number, street name, and building sub-unit, followed by city, state, and ZIP code), postal addresses for other countries do not necessarily follow the U.S. arrangement. For example, the sequence of fields in a Russian street address is essentially the reverse of the U.S. sequence: it begins with the postal code and city, then gives the street name, building number, and building sub-unit. Other countries have even more peculiar formats. In Japan, for instance, there is no nationwide address structure based on street number and street name. Apart from its handling of data, such a system is not localized and thus cannot provide meaningful error messages for a given locale (e.g., which information is missing from a particular country-specific postal address).
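To make the ordering problem concrete, the following Python sketch stores a display order per country and renders the same kind of field values accordingly. The `FIELD_ORDER` table, the field names, and the sample values are hypothetical; only the U.S. and Russian orderings described above are encoded, and the addressee name is omitted.

```python
# Hypothetical per-country display order for address fields (addressee omitted).
FIELD_ORDER: dict[str, list[str]] = {
    "US": ["building_number", "street", "sub_unit", "city", "state", "postal_code"],
    "RU": ["postal_code", "city", "street", "building_number", "sub_unit"],
}


def format_address(country: str, fields: dict[str, str]) -> str:
    """Join the supplied fields in the order the given country expects.

    Fields that are absent or empty are skipped.
    """
    order = FIELD_ORDER[country]
    return ", ".join(fields[name] for name in order if fields.get(name))


# Placeholder sample data, not real addresses.
us_fields = {"building_number": "100", "street": "Main St", "city": "Springfield",
             "state": "IL", "postal_code": "62701"}
ru_fields = {"building_number": "12", "street": "Lenina St", "sub_unit": "Apt 5",
             "city": "Moscow", "postal_code": "123456"}

print(format_address("US", us_fields))  # 100, Main St, Springfield, IL, 62701
print(format_address("RU", ru_fields))  # 123456, Moscow, Lenina St, 12, Apt 5
```

Keeping the order in data rather than in code means supporting another country is a table change; a real system would also need per-country line breaks and separators, which this sketch ignores.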
Another problem with existing validation systems is that they are not as fully automated as users expect. Such systems do not explain why an address is invalid, such as which required fields are missing, or which fields contain invalid data (e.g., non-existent cities) or inconsistent data (e.g., a postal code for Boston, Mass. used with San Francisco, Calif.). Additionally, these systems may return a list of actual addresses that are “close” in some way to the input address and leave it to users to determine the correct form. If a user misspells a city name or enters an unrecognized abbreviation, such a system returns a list of addresses with different cities or abbreviations for the user to select. This invites fraud since, for example, a user could select a real address for billing that is not his or her own, leading to disputes between creditors and occupants when payments are due. Automatic correction, or facilities that simplify manual correction, are useful but must not enable fraud.
A particular problem is that a number of existing systems provide no free-form searching of the database for a particular part of an address. Instead, a user must first create an address to test. On occasion, however, a user might need to search or browse through a database looking for valid addresses without having to create a specific “test” address first. Consider, for instance, a situation where the user has misspelled a city name. In such a case, an existing system may simply flag the city name as ambiguous and, unfortunately, not offer any further help that would allow the user to resolve the correct spelling of the city name. In particular, that system does not provide any browsing capability that would allow the user to select the correct city name. In another example, suppose a user enters an abbreviation that the system does not recognize. In an existing system, the abbreviation may simply be flagged as ambiguous with little or no effort made to facilitate automatic correction of the abbreviation. What the user would prefer instead is a system that provides automatic correction, or at least automated facilities, that would greatly simplify the user's task of correcting invalid address information.
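One way to offer the browsing and correction help described above is approximate string matching against the set of known values for a field. The sketch below uses Python's standard `difflib.get_close_matches` as a stand-in technique (it is not drawn from this document); `KNOWN_CITIES` and `suggest_cities` are hypothetical names, and the 0.6 similarity cutoff is simply difflib's default.

```python
import difflib

# Hypothetical sample of city names known to the database for one country.
KNOWN_CITIES = ["San Francisco", "San Diego", "Sacramento", "Boston"]


def suggest_cities(entered: str, known: list[str], limit: int = 3) -> list[str]:
    """Return up to `limit` known city names that closely resemble `entered`.

    Matching is case-insensitive; results keep the canonical spelling.
    """
    by_key = {city.casefold(): city for city in known}
    matches = difflib.get_close_matches(entered.casefold(), list(by_key), n=limit, cutoff=0.6)
    return [by_key[m] for m in matches]


print(suggest_cities("san fransisco", KNOWN_CITIES))  # ['San Francisco']
print(suggest_cities("Bostn", KNOWN_CITIES))          # ['Boston']
```

The suggestions are meant to be confirmed by the user rather than substituted silently, and they cover a single field, not whole addresses, which avoids handing the user a list of real addresses to choose from.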
Yet another problem with existing validation systems is the inefficiency of performing actual address comparisons. When dealing with large numbers of data records (e.g., in the billions), field-by-field comparison operations can be computationally expensive, degrading system performance. Although standard database comparison operations for looking up address information may yield acceptable performance when validating U.S.-only addresses, that approach becomes painfully slow when validating worldwide address information for 190-plus countries (typically, several gigabytes of database data).
What is needed is a system that supports not only validation of a given country's postal addresses (e.g., U.S. addresses), but also validation of the postal addresses of other countries. Such a global validation system would not be tied to any particular character set, but would instead be flexible enough to handle any country's address formats. The system would be able to correctly discern address information irrespective of country-specific features and be able to compare that information with known addresses in its database in an efficient manner. Additionally, the system should be able to ensure that incoming addresses meet the country-specific requirements (as to acceptable address format) of the many different countries. The present invention fulfills this and other needs.
The present invention provides a system with a grammar for encoding all required address fields (within an address) for a given country, so that the system can determine whether all the address fields required by that country are present in a given international postal address. This approach permits international validation even though certain addresses are subject to country-specific requirements. In particular, the system stores:
(1) Required address fields (i.e., metadata describing address fields and data that must be present in all addresses for a country), for error checking;
(2) Address languages, for normalization; and
(3) Address data, for validation.
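The three kinds of stored data can be pictured as one record per country. The following Python sketch is an assumption about layout, not the format used by the invention: `CountryProfile`, its field names, and the sample U.S. rules are hypothetical, and known addresses are kept as normalized tuples of the required fields.

```python
from dataclasses import dataclass, field


@dataclass
class CountryProfile:
    """Hypothetical per-country record holding the three kinds of stored data."""

    required_fields: frozenset[str]  # (1) required address fields, for error checking
    languages: tuple[str, ...]       # (2) address languages, for normalization
    address_data: set[tuple[str, ...]] = field(default_factory=set)  # (3) for validation

    def _key(self, address: dict[str, str]) -> tuple[str, ...]:
        # Normalize the required fields in a fixed (sorted) order so that
        # equivalent inputs produce the same key.
        return tuple(address.get(f, "").strip().casefold()
                     for f in sorted(self.required_fields))

    def add(self, address: dict[str, str]) -> None:
        self.address_data.add(self._key(address))

    def missing_fields(self, address: dict[str, str]) -> list[str]:
        """Error checking: name every required field that is absent or blank."""
        return sorted(f for f in self.required_fields if not address.get(f, "").strip())

    def is_known(self, address: dict[str, str]) -> bool:
        """Validation: True if the normalized address is in the stored data."""
        return self._key(address) in self.address_data


# Hypothetical rules and a placeholder address.
us = CountryProfile(frozenset({"street", "city", "state", "postal_code"}), ("en",))
us.add({"street": "100 Main St", "city": "Springfield", "state": "IL", "postal_code": "62701"})

print(us.missing_fields({"street": "100 Main St", "city": "Springfield"}))
# ['postal_code', 'state']
print(us.is_known({"street": "100 main st", "city": "SPRINGFIELD",
                   "state": "IL", "postal_code": "62701"}))
# True
```

Reporting the missing fields by name, rather than a bare pass/fail, is what lets a system tell the user what is wrong with an address.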
The system provides an encoding process that takes into account the variety of different languages and character encodings that an address may appear in, as well as the country-specific requirements for different address fields.
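Because the same address may arrive in different character encodings, a natural first step is to decode every input to Unicode and apply a single normalization form before any comparison. The Python sketch below assumes the caller already knows the source encoding (for example, Shift_JIS for Japanese or EUC-KR for Korean data) and uses Unicode NFC normalization; both choices are assumptions for illustration, and `to_canonical` is a hypothetical name.

```python
import unicodedata


def to_canonical(raw: bytes, encoding: str) -> str:
    """Decode `raw` from its source encoding and return NFC-normalized text.

    NFC makes, e.g., a decomposed 'e' + combining acute accent compare equal
    to the precomposed 'é'.
    """
    text = raw.decode(encoding)
    return unicodedata.normalize("NFC", text).strip()


tokyo_sjis = "東京都".encode("shift_jis")
tokyo_utf8 = "東京都".encode("utf-8")
seoul_euckr = "서울".encode("euc_kr")

print(to_canonical(tokyo_sjis, "shift_jis") == to_canonical(tokyo_utf8, "utf-8"))  # True
print(to_canonical(seoul_euckr, "euc_kr"))  # 서울
print(to_canonical("Que\u0301bec".encode("utf-8"), "utf-8") == "Qu\u00e9bec")  # True
```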
The system achieves this by employing a context-free grammar that allows address information to be expressed in a very compact format. Advantages of this approach include:
(1) Minimum storage space required;
(2) Efficient decoding and usage; and
(3) Handling of variations between different countries and languages is facilitated.
Thus, all necessary address information may be packaged in a compact data format that is easily stored and processed.
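To illustrate how a small context-free grammar can describe required fields compactly, the sketch below defines a hypothetical notation in which one letter stands for one field, a comma means "all of", and parentheses with `|` mean "one of". Under this notation, `S,N,C,(P|R)` reads "street, building number, and city, plus either a postal code or a region". The letter codes, the notation, and the function names are assumptions for illustration, not the encoding the invention actually uses.

```python
# Hypothetical one-letter field codes.
FIELD_CODES = {"N": "building_number", "S": "street", "C": "city",
               "R": "region", "P": "postal_code"}


def parse(spec: str):
    """Parse a required-fields spec with a recursive-descent parser.

    Grammar (hypothetical):
        spec := term ("," term)*
        term := FIELD | "(" spec ("|" spec)* ")"
    Returns nested ("all", [...]), ("any", [...]) and ("field", code) tuples.
    """
    pos = 0

    def peek() -> str:
        return spec[pos] if pos < len(spec) else ""

    def parse_spec():
        nonlocal pos
        terms = [parse_term()]
        while peek() == ",":
            pos += 1
            terms.append(parse_term())
        return ("all", terms)

    def parse_term():
        nonlocal pos
        ch = peek()
        if ch == "(":
            pos += 1
            options = [parse_spec()]
            while peek() == "|":
                pos += 1
                options.append(parse_spec())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
            return ("any", options)
        if ch in FIELD_CODES:
            pos += 1
            return ("field", ch)
        raise ValueError(f"unexpected {ch!r} at position {pos}")

    tree = parse_spec()
    if pos != len(spec):
        raise ValueError(f"unexpected trailing input at position {pos}")
    return tree


def satisfied(node, present: set[str]) -> bool:
    """True if the set of present field names satisfies the parsed spec."""
    kind, body = node
    if kind == "field":
        return FIELD_CODES[body] in present
    if kind == "all":
        return all(satisfied(child, present) for child in body)
    return any(satisfied(child, present) for child in body)  # kind == "any"


rule = parse("S,N,C,(P|R)")
print(satisfied(rule, {"street", "building_number", "city", "region"}))  # True
print(satisfied(rule, {"street", "building_number", "city"}))            # False
```

In this notation a whole country rule fits in a handful of characters (`S,N,C,(P|R)` is 11), which illustrates the kind of compactness described above; a complete encoding would need codes for every field type and, likely, per-language variants.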
During processing of address information (an address stream), a signature or fingerprint is generated for each postal address. The signature is a checksum, a hash value, a message digest, or the like, selected to have a very high probability of being unique (i.e., minimal collisions), thereby allowing the system to uniquely identify each address stream (i.e., postal address) with a terse identifier. Correspondingly, addresses stored in the system's database are associated with respective signatures. Rather than comparing address field information, the system looks up an address by computing its signature and comparing that signature against the signatures of existing addresses in its database, which is much faster.
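A minimal sketch of signature-based lookup, assuming the fields have already been decoded to Unicode: each address is normalized, serialized in a fixed field order, and hashed with BLAKE2b from Python's `hashlib`. The choice of BLAKE2b, the 16-byte digest size, the field list, and the `AddressIndex` class are assumptions for illustration; any digest with a suitably low collision probability would serve the same role.

```python
import hashlib
import unicodedata

# Hypothetical canonical field order used for every signature.
SIGNATURE_FIELDS = ("country", "postal_code", "city", "street", "building_number", "sub_unit")


def signature(address: dict[str, str]) -> str:
    """Return a 32-hex-digit signature for an address.

    Each field is NFC-normalized, trimmed and case-folded, then the fields are
    joined with the ASCII unit separator so that ("ab", "c") and ("a", "bc")
    serialize differently.
    """
    parts = [unicodedata.normalize("NFC", address.get(f, "")).strip().casefold()
             for f in SIGNATURE_FIELDS]
    payload = "\x1f".join(parts).encode("utf-8")
    return hashlib.blake2b(payload, digest_size=16).hexdigest()


class AddressIndex:
    """Set of signatures for known addresses; lookup is one hash plus a set probe."""

    def __init__(self, addresses):
        self._signatures = {signature(a) for a in addresses}

    def contains(self, address: dict[str, str]) -> bool:
        return signature(address) in self._signatures


known = [{"country": "US", "postal_code": "62701", "city": "Springfield",
          "street": "Main St", "building_number": "100"}]
index = AddressIndex(known)
print(index.contains({"country": "us", "postal_code": "62701", "city": "SPRINGFIELD",
                      "street": "Main St", "building_number": "100"}))  # True
print(index.contains({"country": "US", "postal_code": "62701", "city": "Springfield",
                      "street": "Main St", "building_number": "101"}))  # False
```

Because a digest match implies equality only with very high probability, a system might confirm a hit by comparing the stored fields once, which costs a single record comparison rather than a scan.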
A method for assisting with validation of international postal address information includes the following steps. The method establishes an address languages encoding on a per country basis, for specifying which written languages may be employed for a given postal address. Additionally, the method establishes a required address fields encoding on a per country basis, for specifying address fields that are required to be present for a given postal address to be valid. Now, the method may proceed to process address input. Upon receiving input comprising a particular postal address to be validated, the method (a) determines which written language the particular postal address employs, based on the address languages encoding, and (b) determines whether the particular postal address includes all required address fields, based on the required address fields encoding and said determined written language.
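The two determinations in this method can be sketched as follows. Here the written language is inferred from the Unicode script of the characters in the address, which is only one possible heuristic. The `ADDRESS_LANGUAGES` and `REQUIRED_FIELDS` tables, their contents for Japan, and the function names are hypothetical placeholders rather than real postal rules.

```python
import unicodedata
from typing import Optional

# Hypothetical address-languages encoding: country -> language -> script-name prefixes.
ADDRESS_LANGUAGES = {
    "JP": {
        "ja": ("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA"),
        "en": ("LATIN",),
    },
}

# Hypothetical required-fields encoding keyed by (country, language).
# Placeholder rules, chosen only to show that the set may depend on language.
REQUIRED_FIELDS = {
    ("JP", "ja"): {"prefecture", "municipality", "block"},
    ("JP", "en"): {"prefecture", "municipality", "block", "postal_code"},
}


def detect_language(country: str, address: dict[str, str]) -> Optional[str]:
    """(a) Pick the country's language whose script covers the most characters."""
    languages = ADDRESS_LANGUAGES[country]
    counts = dict.fromkeys(languages, 0)
    for value in address.values():
        for ch in value:
            name = unicodedata.name(ch, "")
            for lang, prefixes in languages.items():
                if name.startswith(prefixes):
                    counts[lang] += 1
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def validate(country: str, address: dict[str, str]) -> tuple[Optional[str], list[str]]:
    """Return the detected language and (b) the required fields that are missing.

    A language of None means no recognized script was found; the caller
    should treat the address as unclassifiable.
    """
    language = detect_language(country, address)
    if language is None:
        return None, []
    required = REQUIRED_FIELDS[(country, language)]
    missing = sorted(f for f in required if not address.get(f, "").strip())
    return language, missing


print(validate("JP", {"prefecture": "東京都", "municipality": "千代田区", "block": "1-1"}))
# ('ja', [])
print(validate("JP", {"prefecture": "Tokyo", "municipality": "Chiyoda", "block": "1-1"}))
# ('en', ['postal_code'])
```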