This invention relates generally to financial transaction tracking software. More particularly, the invention provides techniques for automatically categorizing a financial transaction by examining the alphanumeric characters describing the transaction and mapping useful characters from the description to a financial category for the transaction.
Electronic representations of financial transactions often contain a string of alphanumeric characters that describe the transaction. For instance, FIG. 3 depicts sample transactions as they typically appear on a person's monthly credit card account statement. The data contained in FIG. 3 was taken from actual credit card account statements.
A useful feature of financial transaction tracking software, such as Microsoft Money 2000, is that reports may be generated, spending habits may be analyzed, and compliance with budgets may be reviewed once a person's, family's, or business's expenditures have been categorized. Conventionally, it has been necessary to manually enter categories for each transaction in order to take advantage of these useful features of financial transaction tracking software. Even for an individual or family with relatively few such transactions to categorize, this is an extremely time-intensive process.
U.S. Pat. No. 5,842,185 issued to Chancey et al. purports to use data such as that shown in the column labeled "Reference" in FIG. 3 to automatically categorize financial transactions. Chancey et al. discloses translation of a numeric code, such as a Standard Industry Code (SIC), contained within a financial statement into a financial category for the transaction. The SIC code for restaurants, for instance, is 5812. As can be determined by a review of the three actual financial transaction descriptions listed in FIG. 3 for transactions in restaurants, namely, PANCAKE CAFÉ, PIZZERIA UNO #766, and CALIFORINIA CAFÉ #17, none of these descriptions contain the numeric string "5812", the SIC code for restaurants. Further, none of these descriptions contain any discernible numeric pattern in common with each other that is specific to only these restaurant-related entries in FIG. 3.
Accordingly, there is a need for improved techniques of automatically assigning a financial category based upon an electronic representation of a financial transaction. Such a technique should execute as quickly as possible for several reasons. For instance, if the technique causes a perceptible delay while a user of financial transaction tracking software is entering transactions, users will tend to be annoyed and dissatisfied. In addition, run-time efficiency is extremely important for a financial institution, which may have a very large number of transactions to automatically categorize for any given time period. Finally, for financial transaction tracking software shipped on a CD-ROM or delivered in some other manner in which the size of any data files used for automatically categorizing financial transactions is a concern, techniques that minimize the size of any necessary data files are desirable.
Conventionally, data has not been available that would allow automatic categorization of substantially all financial transactions without producing unduly long run-time delays and without requiring unduly large files of data.
The present invention provides techniques for automatically categorizing a financial transaction by examining the alphanumeric characters describing the transaction and assigning a financial category to the transaction based upon a mapping of the useful characters from the transaction description to a financial category. Automatically assigning categories to transactions eliminates the need for the extremely time-intensive process of manually categorizing such transactions so that financial transaction tracking software can perform analysis and generate reports that take advantage of having a user's transactions categorized.
According to one variation of the invention, the description of the financial transaction is parsed to identify one or more useful strings of characters. A data file of business names is then searched for a match with the parsed string or strings from the transaction description. The data file is preferably optimized to minimize data redundancy, which minimizes both lookup times and the size of the data file. In this regard, a serialized trie may be used to represent business-name-to-financial-category mappings. Further optimization may be achieved by accessing the data file via a memory-mapped file, by compressing strings of nodes having children but no siblings into dangling nodes, and by including a table of shared suffixes.
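By way of illustration only, the parse-then-lookup step described above might be sketched as follows. This is a minimal in-memory sketch, not the serialized, memory-mapped data file the invention contemplates, and it omits the dangling-node and shared-suffix optimizations. The names parse_description, TrieNode, and BusinessNameTrie, the rule for discarding store numbers, and the category label "Dining Out" are all assumptions made for the example.

```python
import re


def parse_description(description):
    """Reduce a raw transaction description to a candidate business name.

    Assumed parsing rule (hypothetical): upper-case the text, drop store
    numbers such as "#766", drop punctuation, and collapse whitespace.
    """
    text = description.upper()
    text = re.sub(r"#\s*\d+", " ", text)   # store numbers are not useful characters
    text = re.sub(r"[^\w ]", " ", text)    # remaining punctuation
    return " ".join(text.split())


class TrieNode:
    __slots__ = ("children", "category")

    def __init__(self):
        self.children = {}     # next character -> TrieNode
        self.category = None   # set only where a complete business name ends


class BusinessNameTrie:
    """Business-name-to-financial-category mappings stored as a trie.

    Names that share a prefix share nodes, which is the redundancy
    reduction the text refers to.
    """

    def __init__(self):
        self.root = TrieNode()

    def insert(self, name, category):
        node = self.root
        for ch in name:
            node = node.children.setdefault(ch, TrieNode())
        node.category = category

    def lookup(self, name):
        """Return the category for an exact name match, or None."""
        node = self.root
        for ch in name:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.category


trie = BusinessNameTrie()
trie.insert("PIZZERIA UNO", "Dining Out")   # hypothetical mapping
category = trie.lookup(parse_description("PIZZERIA UNO #766"))
```

Lookup cost in such a structure grows with the length of the parsed string rather than with the number of stored business names, which is consistent with the run-time concern raised in the background.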
If a match is found in the business name data file, then the transaction is categorized according to the corresponding business-name-to-financial-category mapping. Otherwise, a search of a keyword database may be performed. If a financial description keyword match is found, then a category may be assigned to the transaction based upon a keyword-to-category mapping corresponding to the matching keyword. Various strategies may be used for resolving situations in which more than one keyword match is found. For instance, keywords may be assigned relative priorities in advance, or the relative placement of keywords in the transaction description may determine which keyword match is used for assigning a category to the transaction.
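The fallback order and the two conflict-resolution strategies mentioned above might be sketched as follows. This is an illustrative sketch under stated assumptions: a plain dictionary stands in for the business name data file, the keyword table, its priority values, and the category labels are hypothetical, and "relative placement" is assumed here to mean that the earliest keyword in the description wins.

```python
def categorize_by_keyword(description, keywords, strategy="priority"):
    """Assign a category from keyword matches, or return None.

    keywords maps a keyword to a (category, priority) pair.
    strategy "priority": the highest-priority keyword wins.
    strategy "position": the earliest keyword in the description wins
    (an assumed reading of "relative placement").
    """
    words = description.upper().split()
    matches = [(pos, word) for pos, word in enumerate(words) if word in keywords]
    if not matches:
        return None
    if strategy == "priority":
        _, word = max(matches, key=lambda m: keywords[m[1]][1])
    else:
        _, word = min(matches, key=lambda m: m[0])
    return keywords[word][0]


def categorize(description, business_names, keywords, strategy="priority"):
    """Try the business name mapping first, then fall back to keywords."""
    name = " ".join(description.upper().split())
    if name in business_names:
        return business_names[name]
    return categorize_by_keyword(description, keywords, strategy)


# Hypothetical data for illustration only.
business_names = {"PIZZERIA UNO": "Dining Out"}
keywords = {"HOTEL": ("Travel", 1), "CAFE": ("Dining Out", 2)}

by_priority = categorize("GRAND HOTEL CAFE", business_names, keywords, "priority")
by_position = categorize("GRAND HOTEL CAFE", business_names, keywords, "position")
```

With this sample data the two strategies disagree on "GRAND HOTEL CAFE": the priority strategy selects the category of CAFE, while the position strategy selects the category of HOTEL, which shows why a resolution rule has to be fixed in advance.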
Other features and advantages of the invention will become apparent through the following description, the figures, and the appended claims.