This invention relates generally to merchant prediction systems, and more specifically, to methods and systems for implementing approximate string matching within a database in relation to joining database records contained within a bankcard network.
Historically, the use of “charge” cards for consumer transaction payments was at most regional and based on relationships between local credit issuing banks and various local merchants. The payment card industry has since evolved with the issuing banks forming associations (e.g., MasterCard) and involving third party transaction processing companies (e.g., “Merchant Acquirers”) to enable cardholders to widely use charge cards at any merchant's establishment, regardless of the merchant's banking relationship with the card issuer.
For example, FIG. 1 of the present application shows an exemplary multi-party payment card industry system for enabling payment-by-card transactions. As illustrated, the merchants and issuer do not necessarily have to have a one-to-one relationship. Yet, various scenarios exist in the payment-by-card industry today, where the card issuer has a special or customized relationship with a specific merchant, or group of merchants.
Over 25 million merchants accept a form of payment card. One of the associations houses name and address information for thousands of merchants and merchant locations in what is referred to herein as a data warehouse. At the merchant location level, there are millions of entries in this data warehouse. Many of the location entries are known to be duplicates due to variations in name and/or address information in the transaction data. For example, the same street address can be written in a variety of ways, all of which are valid (e.g., 400 South Fourth Street, 400 S. Fourth St., 400 South 4th Street, etc.). Names can likewise be represented in a number of ways, all of which are valid. Current database technology is very limited in its ability to identify entries with similar, but not identical, field values such as name and address. Thus, many near-duplicate merchant names and merchant locations are entered into the data warehouse.
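As an illustration only, the following minimal Python sketch shows why exact comparison treats the three address variants above as distinct entries, and how a small rule table can collapse them. The `ABBREVIATIONS` table and the `normalize_address` helper are hypothetical names introduced for this example and are not part of the data warehouse described herein. A hand-maintained table of this kind is brittle (for instance, "St." may also abbreviate "Saint"), which is one reason approximate matching is of interest.

```python
import re

# Hypothetical abbreviation table, for illustration only. A real table would
# be far larger and would still miss ambiguous cases such as "St." = "Saint".
ABBREVIATIONS = {
    "s": "south",
    "st": "street",
    "4th": "fourth",
}


def normalize_address(address: str) -> str:
    """Lowercase the address, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


variants = [
    "400 South Fourth Street",
    "400 S. Fourth St.",
    "400 South 4th Street",
]

# Exact comparison sees three different strings.
exact_distinct = set(variants)

# After normalization, all three variants collapse to one value.
normalized_distinct = {normalize_address(v) for v in variants}
```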
In a typical processing day for the association, there are about 15,000 candidate locations (e.g., new merchant locations) that need to be checked for matches against approximately five million location entries already within the data warehouse. Checking for matches serves at least two purposes. First, locations with similar names and/or addresses can be identified as one entity, rather than several. Second, if the names or addresses are too different, the association can determine that an entity has moved, or that one entity has ceased operations and has been replaced by another entity.
This name and location matching problem is also encountered in several other contexts where third parties provide the association maintaining the data warehouse with transaction files, and therefore with lists of merchant names and addresses (locations), which are used to enhance and/or validate the data warehouse. In another third party example, a list of all locations for a large national retailer might be received, or lists of chain store names and addresses might be received. The team responsible for maintaining the data warehouse is then charged with the task of matching the received list against known locations for the retailer or chain.
One way to check for matches between the existing locations and new locations is through a string matching algorithm. Given the number of entries involved, any solution that might be utilized for string matching should be scalable within the framework of a database (the data warehouse) system. Third party solutions do exist for approximate string matching. However, these solutions typically have one or more drawbacks: the solution is cost prohibitive, is domain or tool specific, or is external to the database (the data warehouse) system.
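For context, approximate string matching is commonly built on a distance measure between two strings. The following minimal Python sketch uses Levenshtein edit distance, chosen here only as a familiar example; it is not asserted to be the technique of the present invention, and it runs outside any database system. The function names, the length-based normalization, and the 0.6 threshold are assumptions made for illustration.

```python
from typing import Iterable, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance into [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def best_match(candidate: str, entries: Iterable[str],
               threshold: float = 0.6) -> Optional[str]:
    """Return the most similar entry, or None if nothing reaches the threshold.

    The threshold value is an arbitrary assumption for this sketch.
    """
    best_entry, best_score = None, threshold
    for entry in entries:
        score = similarity(candidate.lower(), entry.lower())
        if score >= best_score:
            best_entry, best_score = entry, score
    return best_entry
```

A call such as `best_match("400 S. Fourth St.", known_locations)` compares the candidate against every entry in the hypothetical list `known_locations`. Repeating that scan for each of roughly 15,000 daily candidates against millions of entries illustrates the scalability concern raised above.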
Therefore, there exists a heretofore unmet need for a technique that would allow a data warehouse team to perform approximate name and address matching in order to match merchant records in a scalable manner within a database system. The desired result would be a compact and accurate data warehouse capable of supporting other downstream applications, for example, utilizing historical transaction data to predict future financial card transactions and to determine whether correlations can be drawn from the data.