Forensic data analysts, auditors, accountants, and financial analysts often use the predictability of digit occurrence in recorded amounts relating to accounting, financial, and investment reports as a tool to detect suspicious data.
One well-known existing method for detecting fraud and anomalies in a set of data is based on Benford's Law, which will at times be referred to by its abbreviation, BL. Frank Benford, in “The law of anomalous numbers,” Proceedings of the American Philosophical Society (1938), noted a peculiar proportion of digits in everyday data.
Benford's Law describes the overall proportion and the specific manner in which digits are expected to occur in a wide variety of real-life data. Leading Digits, abbreviated LD, or first significant digits, are the leftmost nonzero digits of numbers. For 567.34 the leading digit is 5. For 0.0367 the leading digit is 3, as the leading zeros are discarded. For the lone integer 6 the leading digit is 6. For negative numbers the sign is simply discarded, hence for −34.68 the leading digit is 3. According to Benford's Law, the first digit is heavily skewed in favor of the low digits: digit one (“1”) occurs in about one-third of all recorded numbers, while digit nine (“9”) occurs less than one time in twenty. More specifically, the proportions of occurrence of the first digit among all nine digits (beginning with digit 1) are: {30.1%, 17.6%, 12.5%, 9.7%, 7.9%, 6.7%, 5.8%, 5.1%, 4.6%}. See FIG. 1 for Benford's Law table of 1st digits distribution, as well as FIG. 2 for the chart.
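The extraction rules above (discard the sign, discard leading zeros, keep the first remaining digit) can be sketched in a few lines. This is a minimal illustration assuming Python and its standard decimal module; the function names significant_digits and leading_digit are hypothetical and are not part of the described invention.

```python
from decimal import Decimal


def significant_digits(x) -> tuple:
    """Return the significant digits of x, discarding its sign and leading zeros."""
    # str() writes the number as it would be printed (the shortest round-trip
    # form for floats); Decimal keeps the sign separately and stores the
    # coefficient without leading zeros.
    d = Decimal(str(x))
    if not d.is_finite() or d.is_zero():
        raise ValueError(f"{x!r} has no leading digit")
    return d.as_tuple().digits


def leading_digit(x) -> int:
    """Leading (first significant) digit of a nonzero number."""
    return significant_digits(x)[0]


# The examples from the text: 5, 3, 6 and 3 respectively.
for value in (567.34, 0.0367, 6, -34.68):
    print(value, leading_digit(value))
```

Zero, infinities, and NaN have no leading digit, so the sketch rejects them rather than guessing.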
The exact mathematical expression is Probability[1st digit is d] = LOG10(1 + 1/d), and this proportion among the first digits is also known in the literature as the logarithmic distribution, or simply Benford. The validity of Benford's Law has been observed and verified in numerous domains. A formal mathematical explanation for this digital phenomenon was given by Ted Hill in his seminal work “A statistical derivation of the significant-digit law,” Statistical Science, 1995, showing that a collection or mixture of a variety of distributions defined on the positive range is Benford in the limit. While the law holds for naturally occurring typical numbers related to financial, accounting, census, sports, and numerous other types of data, maliciously invented fake data typically does not obey it; instead, its digits tend to appear all equally likely (uniformly distributed), just as most people would mistakenly intuit. Mark Nigrini in the early 1990s first suggested applying this digital property as a technique in forensic analysis of accounting and financial data to detect fraud; further analysis is given in Nigrini, M. J. (1992), “The Detection of Income Tax Evasion Through an Analysis of Digital Frequencies,” PhD thesis, University of Cincinnati, Ohio.
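As a quick check of the formula, the following sketch (Python, with a hypothetical function name) evaluates LOG10(1 + 1/d) for d = 1 to 9 and reproduces the percentages quoted for FIG. 1. The nine probabilities sum to 1 because LOG10(2/1) + LOG10(3/2) + ... + LOG10(10/9) telescopes to LOG10(10).

```python
import math


def benford_first_digit_probs() -> dict[int, float]:
    # Probability[1st digit is d] = LOG10(1 + 1/d), for d = 1..9
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


probs = benford_first_digit_probs()
print([round(100 * p, 1) for p in probs.values()])
# [30.1, 17.6, 12.5, 9.7, 7.9, 6.7, 5.8, 5.1, 4.6]
print(sum(probs.values()))  # telescopes to LOG10(10) = 1, up to float rounding
```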
Soon after Nigrini's innovation, the technique came to be increasingly used by accounting firms, by governmental tax authorities such as the IRS in the USA, and by most other tax authorities worldwide as a routine check on data. The logarithmic distribution is so ubiquitous that it is hard to overestimate its importance and relevance in forensic data analysis and other disciplines, and it is found in almost all financial and accounting types of data. The term Leading Digits here refers to the more general study of digital patterns for any piece of data or distribution, whether or not it obeys Benford's Law. Clearly, other digital patterns (mini laws) of lesser importance exist for some very particular pieces of data and distributions outside the scope of BL. It is important to recognize that each well-defined piece of data or distribution has its own particular leading digits signature, a sort of hidden digital code, which is not immediately obvious during a first visual (preliminary) inspection, when the focus is on numbers and quantities as opposed to their digital expressions.
Fraud and anomalies can be detected using Benford's Law by comparing the actual distribution of the first digits in a set of accounting or financial data to the theoretical distribution given by Benford's Law. A cautionary flag is raised if the deviation of actual from theoretical is significant, calling for further scrutiny and examination of the data. The law also describes an exact distribution for the second-order digits, where proportions among digits are more equal, culminating in near equality for the 5th and higher orders. For example, the 2nd leading digit (from the left) of 603 is digit 0, of 0.0002867 it is digit 8, and of 1,653,832 it is digit 6. Note that for the 2nd and higher orders, digit 0 is also included. The exact 2nd order distribution for all 10 digits (beginning with digit 0) according to Benford's Law is: {12.0%, 11.4%, 10.9%, 10.4%, 10.0%, 9.7%, 9.3%, 9.0%, 8.8%, 8.5%}. See FIG. 3 for a chart showing this more even distribution. The digital proportion for the 2nd order is not nearly as skewed in favor of low digits as it is for the 1st order; hence, even though further digital tests involving higher orders can also be performed, their power to detect fraud is much reduced, as the random often overwhelms the systematic.
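The comparison test and the second-order distribution described above can be sketched as follows. This is an illustration under stated assumptions, not the method of the invention: the text does not name a deviation measure, so the sketch uses Pearson's chi-square statistic as one common choice, and all function names and the sample data are hypothetical. The second-order probabilities use the standard formula Probability[2nd digit is d] = sum over first digits k = 1 to 9 of LOG10(1 + 1/(10k + d)).

```python
import math
from decimal import Decimal


def digits_of(x) -> tuple:
    """Significant digits of x; Decimal drops the sign and any leading zeros."""
    d = Decimal(str(x))
    if not d.is_finite() or d.is_zero():
        raise ValueError(f"{x!r} has no leading digit")
    return d.as_tuple().digits


def second_digit(x) -> int:
    ds = digits_of(x)
    # A lone digit such as 6 is read as 6.0..., so its 2nd digit is 0.
    return ds[1] if len(ds) > 1 else 0


def benford_second_digit_probs() -> dict:
    """P(2nd digit is d) = sum over 1st digits k = 1..9 of LOG10(1 + 1/(10k + d))."""
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }


def first_digit_chi_square(values) -> float:
    """Chi-square distance between observed 1st-digit counts and Benford's Law."""
    counts = [0] * 10
    for v in values:
        counts[digits_of(v)[0]] += 1
    n = sum(counts)
    if n == 0:
        raise ValueError("no values to test")
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat


# 9 digit categories give 8 degrees of freedom; the 5% critical value is 15.507.
CRITICAL_5PCT_DF8 = 15.507

print([second_digit(v) for v in (603, 0.0002867, 1653832)])  # [0, 8, 6]
print([round(100 * p, 1) for p in benford_second_digit_probs().values()])

# Hypothetical invented amounts whose first digits are all equally likely.
fake = list(range(1, 10)) * 100
print(first_digit_chi_square(fake) > CRITICAL_5PCT_DF8)  # True: flag for scrutiny
```

A statistic above the critical value corresponds to the cautionary flag in the text. For very large data sets the chi-square test flags even small deviations, so other measures, such as the mean absolute deviation, are also used in practice.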
Yet two serious pitfalls arise in this context. The first arises whenever the data itself is not inherently Benford to begin with and thus cannot be so tested. Examples of such non-Benford data are payroll amounts, amounts with built-in minimums or maximums, and amounts with human-made restrictions or specific intentions, as well as other types of data. The second pitfall arises whenever fake data is invented and provided by a sophisticated and well-educated cheater who is already aware of Benford's Law and all of its higher-order features.
The latter difficulty will become increasingly problematic and will represent a serious challenge to forensic data analysis applying Benford's Law, as more accountants and executives become aware of this digital phenomenon and are tempted to calibrate the digits of invented fake data according to the law so as to make it appear genuine.
This invention aims to provide a satisfactory answer to both of these pitfalls by providing the auditor and forensic data analyst with an exact computerized venue for an alternative test, employing the invented techniques and algorithms relating to a digital pattern existing within the data selected for examination, a pattern more prevalent than Benford's Law and covering types of data hitherto outside the scope of the examining statistician/programmer. This newly discovered pattern, showing a more intricate and refined digital order within numbers and applied via the techniques and computer implementations presented herein, would frustrate the more sophisticated and educated cheater who carefully adjusts invented data to Benford's Law.