1. Technical Field.
The following disclosure relates generally to the field of data mining and relational database management systems and, more particularly, to a system for creating and maintaining a database of address and related information about a plurality of discrete locations.
2. Description of Related Art.
The database has been a staple of computing since the beginning of the digital era. A database refers generally to one or more large, structured sets of persistent data, usually associated with a software system to create, update, and query the data.
The relational database model was described in the early 1970s. In a relational database, the data is stored in a table. A table organizes the data into rows and columns, providing a specific location (such as row x, column y) for each field. Each row contains a single record. The columns are arranged in order, by attribute, so all the fields in each column contain the same type of data. The table format for a database file makes searching and accessing data faster and more efficient. The records (rows) can also be sorted into a new order, based on any one or more of the columns (fields). Sorting is often used to order the records such that the most desired data appears earlier in the file, thereby making searching faster. As computing speed and capacity increased, database tables were able to store larger amounts of data.
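By way of illustration only, the row-and-column layout and multi-column sorting described above may be sketched in Python as follows. The column names and sample records are hypothetical and are not drawn from any actual address file.

```python
from operator import itemgetter

# Each dict is one record (row); each key is one attribute (column).
# All values under a given key hold the same type of data.
rows = [
    {"street": "Oak Ave", "city": "Springfield", "zip": "00002"},
    {"street": "Main St", "city": "Springfield", "zip": "00002"},
    {"street": "Elm Rd", "city": "Riverton", "zip": "00001"},
]

# Sort the records into a new order based on two columns:
# first by ZIP code, then by street name within each ZIP code.
by_zip_then_street = sorted(rows, key=itemgetter("zip", "street"))

# A specific field is located by row index and column name.
first_street = by_zip_then_street[0]["street"]
```

The sort leaves the original list unchanged and returns a new ordering, which mirrors sorting records for faster lookup without altering the stored data.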
A database management system refers generally to an interface and one or more computer software programs specifically designed to manage and manipulate the information in a database. The database management system may include a complex suite of software programs that control the organization, storage, and retrieval of data, as well as the security and integrity of the database. The database management system may also include an interface for accepting requests for data from external applications. In a relational database including multiple tables, the database management system is generally responsible for maintaining all the links between and among key fields in the various tables. This is referred to as maintaining the “referential integrity” of the database.
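As a minimal sketch of referential integrity, the following Python example uses the standard sqlite3 module with an in-memory database. The table names, column names, and sample values are assumptions chosen for illustration. The example shows the database management system rejecting a row whose key field refers to a record that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign-key links only when this setting is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical two-table schema: each address row links to a location row.
conn.execute("CREATE TABLE location (id INTEGER PRIMARY KEY, city TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE address ("
    " id INTEGER PRIMARY KEY,"
    " line TEXT NOT NULL,"
    " location_id INTEGER NOT NULL REFERENCES location(id))"
)
conn.execute("INSERT INTO location (id, city) VALUES (1, 'Springfield')")
conn.execute("INSERT INTO address (line, location_id) VALUES ('100 Main St', 1)")


def try_insert_orphan(connection):
    """Return True if the DBMS rejects an address pointing at a missing location."""
    try:
        connection.execute(
            "INSERT INTO address (line, location_id) VALUES ('5 Elm St', 99)"
        )
    except sqlite3.IntegrityError:
        return True
    return False


orphan_rejected = try_insert_orphan(conn)
address_count = conn.execute("SELECT COUNT(*) FROM address").fetchone()[0]
```

Other database management systems enforce such links by default; the explicit setting shown here is specific to SQLite.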
Address Databases: The United States includes more than 145 million deliverable addresses. Address databases are available from private commercial sources or from government sources, such as the U.S. Postal Service (USPS). The USPS offers a variety of address databases to the public, including a City-State file, a Five-Digit ZIP file, and a ZIP+4 file. Because of growth and changes in population, address databases generally require frequent updating. As with any other large database, updating the data in a very large address database is often technically challenging and time-consuming. Many private companies build and maintain their own address databases, which can be updated using any of a variety of data sources.
Address standardization transforms a given address into the best format for meeting governmental guidelines, such as those established by the USPS. Standardization affects all components of the delivery address, including the format, font, spacing, typeface, punctuation, and ZIP code or delivery point bar code (DPBC). For example, a non-standard address may look quite different after standardization.
A parcel or letter can usually be delivered whether it bears the standardized address or not. Although USPS regulations encourage address standardization and educate mail senders about it, no agency or company can expect to manage or enforce address formats. The capacity to handle and deliver a parcel or letter bearing a non-standard address format is an advantage to senders and receivers, but often represents a serious disadvantage to those attempting to maintain an accurate address database.
The existence of multiple representations for the same address represents one of the primary challenges in developing and maintaining an accurate and current database of deliverable addresses. The example above shows two non-standard addresses that refer to a single address. In a system like the U.S. Postal Service or a major parcel delivery company, there may be dozens of non-standard addresses accumulated over time—all of which refer to a single address at a discrete location.
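The problem of multiple representations may be illustrated with a simplified Python sketch. The abbreviation table and the sample addresses below are invented for illustration and do not reproduce the USPS standard or any address from this disclosure. The sketch reduces two differently written addresses to one comparison key.

```python
import re

# Hypothetical abbreviation table; a real standard is far more extensive.
ABBREVIATIONS = {
    "STREET": "ST",
    "AVENUE": "AVE",
    "ROAD": "RD",
    "NORTH": "N",
    "SUITE": "STE",
}


def normalize(address: str) -> str:
    """Reduce an address string to a simplified comparison key."""
    # Upper-case the text and replace punctuation with spaces.
    cleaned = re.sub(r"[^\w\s]", " ", address.upper())
    # Abbreviate known words and collapse repeated whitespace.
    tokens = [ABBREVIATIONS.get(token, token) for token in cleaned.split()]
    return " ".join(tokens)


variant_a = normalize("123 North Main Street, Suite 4")
variant_b = normalize("123 n. main st ste 4")
same_location = variant_a == variant_b
```

A rule table of this kind handles only spelling and punctuation variants. It would not reconcile misspellings, missing components, or alternate street names, which is part of why the problem is difficult at scale.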
Thus, there is a need in the art for a system that can uniquely identify a discrete address location based upon any kind of non-standard address indicia. There is also a need in the art for an improved database management system capable of creating and maintaining a database of address and related information about a group of discrete locations.
There is a related need to identify and store a single preferred address for each discrete physical location, while also identifying and storing any non-standard address that refers or relates to that discrete location, and providing a link to the preferred address.
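One possible data structure for meeting this need is sketched below in Python. The class names, method names, and sample addresses are hypothetical and serve only to illustrate the idea of a single preferred address per location with linked non-standard variants. The sketch is not the system of this disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Location:
    """One discrete physical location and its address representations."""
    location_id: int
    preferred: str
    aliases: set = field(default_factory=set)


class AddressIndex:
    """Maps any known address text to the preferred address of its location."""

    def __init__(self):
        self._locations = {}  # location_id -> Location
        self._by_text = {}    # address text -> location_id

    def add_location(self, location_id, preferred):
        location = Location(location_id, preferred)
        self._locations[location_id] = location
        self._by_text[preferred] = location_id
        return location

    def add_alias(self, location_id, alias):
        # Store the non-standard address and link it to its location.
        self._locations[location_id].aliases.add(alias)
        self._by_text[alias] = location_id

    def preferred_for(self, text):
        # Follow the link from any stored address to the preferred one.
        location_id = self._by_text.get(text)
        if location_id is None:
            return None
        return self._locations[location_id].preferred


index = AddressIndex()
index.add_location(1, "123 N MAIN ST STE 4")
index.add_alias(1, "123 North Main Street, Suite 4")
found = index.preferred_for("123 North Main Street, Suite 4")
missing = index.preferred_for("9 Oak Avenue")
```

Keeping the link on the location identifier rather than on the preferred text means the preferred address can later be changed without rewriting every stored variant.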
There is also a need in the art for a database management system that is capable of continually monitoring the accuracy of an address database as new non-standard addresses enter the system.