Field of the Invention
Embodiments of the present invention generally relate to data analysis and, more specifically, to resolving similar entities from a transaction database.
Description of the Related Art
Financial institutions store transactional data for analysis. A financial institution generates transactional data from credit and debit card purchases at companies that have a merchant account with the financial institution. The merchant account may be used to process individual credit or debit card purchases, and each such purchase is stored as a transaction record in a transaction database. A transaction record associated with a particular merchant account oftentimes includes a merchant ID attribute that links the transaction record to that merchant account. A merchant ID may be of any data type, including a number, a string, or some combination thereof. The financial institution may then analyze the transaction records from one or more merchant accounts. For example, an analysis may involve aggregating the transaction records of one or more particular merchant accounts and then comparing the performance of a merchant account to that of competing merchant accounts in the same geographic area.
Although the financial institution stores the transaction records in a database of transactions, certain analyses may require the data to be organized in ways that are not reflected in the transaction records themselves. Such databases contain sets of transaction records that an analysis should group together, even though no single attribute value relates those transaction records. For example, if a financial institution configures a database of transactions with a merchant ID attribute that links each transaction record to a merchant account, then an analysis could easily aggregate transaction records that share the same merchant ID. However, a single company may have multiple merchant accounts with a financial institution. If the financial institution assigns a distinct merchant ID to every merchant account, even when multiple merchant accounts belong to a single company, then it is difficult to aggregate the transaction records from the multiple merchant accounts of that company. For instance, a franchise company may have a distinct merchant account, with a distinct merchant ID, for each franchisee location. In such a case, an analysis could not aggregate the transaction records of the franchise company based on identical merchant IDs alone. Instead, an analysis can use similarities between the merchant ID attribute values to aggregate the transaction records of the franchise company.
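The limitation of aggregating on identical merchant IDs can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular system: the record layout (a merchant ID paired with an amount), the merchant IDs, and the amounts are all hypothetical. The two franchisee locations of the same hypothetical franchise company end up in separate groups because their merchant IDs differ.

```python
from collections import defaultdict

# Hypothetical transaction records: (merchant_id, amount).
# The IDs and amounts are invented for illustration only;
# FRANCHISE-0001 and FRANCHISE-0002 are assumed to be two
# franchisee locations of one company.
transactions = [
    ("FRANCHISE-0001", 12.50),
    ("FRANCHISE-0002", 8.75),
    ("FRANCHISE-0001", 4.00),
    ("OTHERCO-77", 20.00),
]


def aggregate_by_merchant_id(records):
    """Sum transaction amounts for records that share an identical merchant ID."""
    totals = defaultdict(float)
    for merchant_id, amount in records:
        totals[merchant_id] += amount
    return dict(totals)


print(aggregate_by_merchant_id(transactions))
```

Exact-key grouping yields one total per merchant account, so the franchise company's sales remain split across `FRANCHISE-0001` and `FRANCHISE-0002` unless some additional notion of similarity is applied.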
Existing techniques rely upon simple tests, such as string comparisons of an attribute across the transaction records in a database, to detect similarities between groups of transaction records. Transaction records whose attribute strings meet a measure of similarity are then aggregated together for analysis. These techniques may work as long as the attribute contains strings that are identical or similar for groups of transaction records that should be aggregated together, and strings that are distinct for groups of transaction records that should not be aggregated together.
However, such identifiers are not always (or even usually) available. For example, dissimilar merchant IDs for the merchant accounts of a single company may prevent an analysis system from aggregating the transaction records of that company together. Conversely, transaction records may contain similar identifiers on which an analysis system bases an aggregation, even though those transaction records should not be aggregated together. For example, two different companies may have merchant accounts with similar merchant IDs, which an analysis system could mistakenly attribute to a single company. The analysis system may then mistakenly aggregate the transaction records of the two companies together.
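Both failure modes can be reproduced with a simple string-similarity test. The related art above does not name a specific comparison, so this sketch assumes one: Levenshtein edit distance with a hypothetical threshold of at most two edits. All merchant IDs and company names are invented for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similar(id_a: str, id_b: str, max_edits: int = 2) -> bool:
    """Hypothetical similarity test: IDs match if within max_edits edits."""
    return levenshtein(id_a, id_b) <= max_edits


# Hypothetical merchant ID pairs illustrating the cases discussed above.
pairs = [
    ("ACME-STORE-001", "ACME-STORE-002"),  # same company, similar IDs: grouped (correct)
    ("ACME-STORE-001", "4471920"),         # same company, unrelated IDs: missed
    ("BOBS-DINER-01", "ROBS-DINER-01"),    # different companies: wrongly grouped
]

for a, b in pairs:
    print(a, b, similar(a, b))
```

Tightening or loosening the threshold only trades one error for the other: no single cutoff on the merchant ID string alone can separate the second pair from the third.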
As the foregoing illustrates, there remains a need for more effective techniques for evaluating financial transaction records.