1. Field of the Invention
The present invention relates to the field of financial auditing. In financial auditing, one of the first steps is for the auditor to perform an analytical review, which is defined as ‘an auditing process that tests relationships among accounts and identifies material changes; it involves analyzing significant trends for unusual change and questionable items’. Analytical review is a powerful auditing technique because it is often the quickest way to find anomalies. The present invention is a method for unsupervised analytical review which could be used to detect, among other things, fraudulent financial activity. The method is ‘unsupervised’ in the sense that it learns and detects patterns and anomalies directly from whatever accounting data is being analyzed, rather than relying on extrinsic experience from training data, or on insights or a priori knowledge on the part of the auditor. In this, it is distinguished from the current state of the art in auditing. An unsupervised approach can be used regardless of the content, data format, size, or level of materiality of the accounting dataset. The method uses principles from information theory and data mining to find patterns and anomalies automatically, and to compute empirical similarities between transactions. An auditor is thus enabled to make sense of large datasets quickly, without time-consuming analysis that must be customized to each new problem or dataset.
2. Description of the Prior Art
U.S. Pat. No. 7,587,348 describes a system and method of detecting mortgage related fraud. The method scores transactions based on a model created from historical mortgage transaction data. The score is used to indicate the likelihood of fraud.
Because the system and method are based on creating a model from historical data, they are an example of a supervised learning technique. A technique of this sort works only when historical data is available and has been labeled, based on prior experience, to show which transactions were fraudulent. Supervised techniques have several disadvantages: (1) building up data of this type is time-consuming and costly; and (2) the labeled data is useful only (a) to the extent the historical data and the current data relate to the same problem (e.g. mortgage fraud detection), (b) to the extent the historical and current data are structured similarly, and (c) to the extent the historical data is even available, which as a rule it is not in audit situations. The current method, in contrast, is unsupervised, which means it applies more generally and without the above-mentioned restrictions of supervised techniques.
U.S. Patent Application 2013/0046786 discloses a system for explanation-based auditing of medical records data. This invention uses a template database containing a plurality of explanation templates, and uses that database to identify automatically an explanation for user access to stored information.
Because the system relies on the existence of a template database, it is again an example of a supervised technique. Only those explanations that exist in the database can be assigned to potential anomalies. If the list of explanations is not relevant to a new dataset, then the system cannot be used. The focus of the current method is instead to tease out patterns and associated explanations directly from the data, needing no a priori data or lists.
U.S. Patent Applications 2005/0222928 and 2005/0222929 disclose systems and methods for investigation of financial reporting information. These systems and methods include analyzing financial data statistically and modeling it over time, comparing actual data values with predicted data values to identify anomalies in the financial data. The anomalous financial data is then analyzed using clustering algorithms to identify common characteristics of the various transactions underlying the anomalies. The common characteristics are then compared with characteristics derived from data known to derive from fraudulent activity, and the common characteristics are reported, along with a weight or probability that the anomaly associated with the common characteristic is an identification of risks of material misstatement due to fraud.
Because this method compares data with ‘data known to derive from fraudulent activity’, it is another example of a supervised learning technique. It therefore suffers from the drawbacks listed in paragraph [0004] above, which the current method, being fully unsupervised, avoids.
U.S. Patent Application 2013/0031633 discloses a system and methods for adaptive model generation for detecting intrusion in computer systems. The system and methods are compatible with unsupervised anomaly detection techniques; however, the field of the invention is the detection of anomalies in a computer system rather than financial data.
U.S. Patent Application 2013/0031633 does not teach how to apply its system and methods to the field of financial auditing. A critical component of the latter is consideration of financial materiality. In auditing, materiality is a concept relating to the importance or significance of an amount, transaction, or discrepancy. Broadly, the greater the monetary value of an amount, transaction or discrepancy (taken within the context of the business as a whole), the more likely it is to be material. The current method integrates the concept of materiality along with other unsupervised techniques in a scale-invariant fashion. Critically, the scale-invariant approach means that the method remains fully unsupervised. In fact, an intrinsic virtue of the current method is that it highlights the patterns and anomalies which are most material in a financial sense to the data that is the target of analysis, without the user needing to provide any extrinsic knowledge or experience as prior input.
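One way to picture a scale-invariant treatment of materiality is the following minimal sketch. It is an illustration only, not the method of the invention: the function name `materiality_weighted_surprisal`, the (account, amount) record layout, and the choice to multiply each transaction's share of total monetary value by its information-theoretic surprisal are all assumptions made here for the example. Because materiality is expressed as a share of the dataset's own total, the scores do not change when every amount is multiplied by the same factor, and no extrinsic threshold is supplied by the user.

```python
import math
from collections import Counter


def materiality_weighted_surprisal(transactions):
    """Score (account, amount) pairs using only the dataset itself.

    Hypothetical illustration: the score is the transaction's share of
    total absolute value (materiality) times the surprisal, in bits, of
    its account label (rarity within this dataset).
    """
    total = sum(abs(amount) for _, amount in transactions)
    if total == 0:
        raise ValueError("dataset has no monetary value")

    counts = Counter(account for account, _ in transactions)
    n = len(transactions)

    scores = []
    for account, amount in transactions:
        share = abs(amount) / total                   # unchanged if all amounts are rescaled
        surprisal = -math.log2(counts[account] / n)   # rarer account -> more bits
        scores.append(share * surprisal)
    return scores


# Made-up ledger: three routine entries and one large, rare entry.
ledger = [("rent", 1000), ("rent", 1000), ("rent", 1000), ("travel", 5000)]
scores = materiality_weighted_surprisal(ledger)

# The same ledger expressed in units one thousand times smaller.
rescaled = [(account, amount * 1000) for account, amount in ledger]
rescaled_scores = materiality_weighted_surprisal(rescaled)
```

In this made-up ledger the large, rare entry receives the highest score, and the rescaled ledger yields the same scores up to floating-point rounding, which is the sense of scale invariance intended in the sketch.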
U.S. Patent Application 2012/0259753 discloses a system and method for managing collaborative financial fraud detection logic. Detection logic is uploaded by users to a network and can be shared to detect risks which may be related to financial transactions.
Because this system and method rely on detection logic being uploaded to the network by users, they are an example of a system which relies on extrinsic knowledge or experience to detect fraud or other anomalies in data. The current system, because it detects patterns directly from data, is fully unsupervised and therefore more generally applicable.
U.S. Patent Application 2013/0054603 discloses a method and apparatus for classifying known specimens and media using spectral properties and identifying unknown specimens and media. Detection of patterns of activity of fraud is mentioned as one area where this method and apparatus may be applicable. The method and apparatus require a group of reference specimens.
The requirement for a ‘group of reference specimens’ again makes this an example of a supervised learning technique, regardless of the applicability or otherwise to financial transactions. The current method does not require any extrinsic data and is therefore more generally useful.
U.S. Patent Application 2008/0249820 discloses an approach for assessing inconsistency in the activity of an entity, as a way of detecting fraud and abuse, using service-code information available on each transaction and applying an unsupervised data mining technique, dimensionality reduction as used in text analysis, to find inconsistencies.
This method, like the current method, uses dimensionality reduction to find anomalies. However, the approach in U.S. Patent Application 2008/0249820 is specific to fraud detection, while the current method teaches how to use dimensionality reduction to fulfill a much broader set of audit purposes, including detecting not only anomalies (which might or might not be related to fraud), but also trends and material patterns. Further, regarding the detection of fraud or anomalies, U.S. Patent Application 2008/0249820 teaches how to look at correlations and consistency between (medical) providers and patients. This assumes a requirement for data from multiple providers, while the current method requires only data from a single entity or business, which means that it could equally well be applied to a business in any industry. In the current method, anomalies can be found even when there are no peer-group businesses available for comparison.
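The general idea of using dimensionality reduction to surface anomalies within a single entity's data can be sketched as follows. This is a hedged illustration, not the procedure of the current method or of the cited application: it swaps in a rank-1 approximation computed by power iteration as the simplest possible stand-in for dimensionality reduction, and the function name `rank1_residuals` and the small count matrix are hypothetical. Rows that the dominant pattern reconstructs poorly are the candidates for review, and no peer-group data is involved.

```python
import math


def rank1_residuals(rows, iterations=200):
    """Per-row reconstruction error after a rank-1 approximation.

    Power iteration on A^T A estimates the dominant right singular
    vector v; each row is then compared with its projection onto v.
    """
    n_cols = len(rows[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # uniform starting direction

    for _ in range(iterations):
        # u = A v
        u = [sum(a * b for a, b in zip(row, v)) for row in rows]
        # w = A^T u
        w = [sum(row[j] * u[i] for i, row in enumerate(rows))
             for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # all-zero data: nothing to learn
        v = [x / norm for x in w]

    residuals = []
    for row in rows:
        coeff = sum(a * b for a, b in zip(row, v))  # projection onto v
        error = math.sqrt(sum((a - coeff * b) ** 2 for a, b in zip(row, v)))
        residuals.append(error)
    return residuals


# Made-up data for one business: rows are vendors, columns are
# counts of postings to three accounts.
postings = [
    [10, 1, 0],
    [20, 2, 0],
    [30, 3, 0],
    [0, 0, 9],   # posts only to an account nobody else uses
]
residuals = rank1_residuals(postings)
```

The first three rows follow the same proportions and are reconstructed almost exactly, while the last row lies outside the dominant pattern and keeps a large residual. A practical system would retain more than one dimension; a single dimension is used here only to keep the sketch short.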
U.S. Patent Application 2009/0234899 discloses systems and methods for dynamic anomaly detection. The invention relates to the process of detecting anomalies in heterogeneous, multivariate data sets that vary as functions of one or more independent variables.
U.S. Patent Application 2009/0234899 does not teach how to apply the systems and methods to financial data, and as mentioned in paragraph [0009] above, specifically does not deal with the problem of incorporating financial materiality. Furthermore, the systems and methods discover anomalies in relation to independent variables, while the current method discovers anomalies intrinsic to the target data. Finally, U.S. Patent Application 2009/0234899 teaches systems and methods only for anomaly detection, while the current method is a more general method allowing data exploration, not just the identification of anomalies.