Many large-scale software systems currently deployed incorporate a database subsystem that stores information such as customer lists, including names, addresses, and phone numbers; an inventory of equipment; supply houses; lists of parts; or scheduling and routing information. One telephony database system, which illustrates an application of the present invention, is a mainframe system that provides on-line support for the development, assignment, and maintenance of building location information. Specifically, the building locations are identified by universal codes that uniquely identify buildings containing telecommunications equipment. These codes are accessible to local operating telephone companies, long distance telephone companies, and telecommunications equipment vendors. The building location database presently contains more than one million records for telecommunications building locations, covering 175,000 buildings in 73,000 cities. More than 44,000 new buildings are entered into the database each year.
The codes can be entered by any company that has installed or will install telephone equipment in a given building, which creates the potential for duplicate codes to be entered into the database. Typically, a database user can check whether the address of a building is already in the database. However, if the address is not entered correctly, or exactly as it appears in the database, it may not be found, and a duplicate code is then created. Each duplicate location code can in turn give rise to as many as tens of thousands of incorrect subsidiary records. Correcting these errors requires a time-consuming and costly data purification effort. Often the database is never fully corrected, resulting in what is commonly referred to as a "noisy" database.
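The failure mode described above can be sketched as follows. This is a minimal, hypothetical illustration, not the database system's actual lookup logic: it assumes an exact string match on the address before a new code is assigned, and the addresses, the `enter_building` helper, and the `LOC...` code format are all invented for the example. A trivially different spelling of the same building ("ST" versus "STREET") misses the existing record, so a second code is created.

```python
from typing import Dict, Optional


def lookup_exact(db: Dict[str, str], address: str) -> Optional[str]:
    """Return the stored code only if the address matches a stored key exactly."""
    return db.get(address)


def enter_building(db: Dict[str, str], address: str, new_code: str) -> str:
    """Hypothetical entry step: reuse an existing code if found, else assign new_code."""
    existing = lookup_exact(db, address)
    if existing is not None:
        return existing
    # No exact match, so a new code is created even if the building is already present.
    db[address] = new_code
    return new_code


# Invented sample data: one building already on file.
db = {"123 MAIN STREET, SPRINGFIELD": "LOC0001"}

# The same building entered with an abbreviated street suffix.
code = enter_building(db, "123 MAIN ST, SPRINGFIELD", "LOC0002")
print(code)     # LOC0002 (a duplicate code for the same building)
print(len(db))  # 2
```

Any variation in abbreviation, spacing, or spelling defeats an exact-match check in the same way, which is why duplicates accumulate even when users are diligent about checking first.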
Conventionally, various algorithmic techniques have been used in an attempt to check the correctness of an entered building address. However, these techniques have been static, in the sense that they have little ability to learn about or adapt to the underlying semantic structure or expressiveness of the database. In addition, the algorithms are usually deterministic: their formulae are developed, often heuristically, on the assumption that the database conforms to certain underlying parameters and characteristics.
Recently, neural networks have been applied to problems involving noise-corrupted data, information retrieval, and data classification. For instance, neural networks have been used to construct many different types of classifiers in such diverse fields as speech recognition and image processing. However, neither simple neural networks (networks of a single type) nor more complex neural networks (combinations of simple neural networks) have generally proven effective for retrieving information from large databases using textual retrieval keys when either the retrieval key or the data in the database is "noisy." For instance, when these neural network structures were applied to identifying and correcting inadvertent errors in building addresses, they never achieved the required building address retrieval accuracy of at least eighty percent (80%).