Healthcare fraud is a growing problem in the United States and abroad.
According to the Centers for Medicare and Medicaid Services (CMS), fraud schemes range from those perpetrated by individuals acting alone to broad-based activities by institutions or groups of individuals, sometimes employing sophisticated telemarketing and other promotional techniques to lure consumers into serving as unwitting tools in the schemes. Perpetrators seldom target only one insurer, or only the public or only the private sector. Rather, most are found to be defrauding public sector victims, such as Medicare, and private sector victims simultaneously.
Annual healthcare expenditures continue to increase at rates exceeding inflation. Though the amount lost to healthcare fraud and abuse cannot be precisely quantified, the general consensus is that a significant percentage of healthcare spending is paid out on fraudulent or abusive claims. Many private insurers estimate the proportion of healthcare dollars lost to fraud to be in the range of 3-5%, which amounts to more than $100 billion annually. It is widely accepted that losses due to fraud and abuse are an enormous drain on both the public and private healthcare systems.
A variety of approaches have been tried to detect this fraud. Rule-based systems have been deployed; they are relatively easy to build, especially for new insurance providers that do not have enough historical data. But such systems cannot cope with the wide range of fraud and the rapid evolution of fraud techniques. A more robust approach is to use data-driven analytics to capture the relationships between the characteristics of the claimant, service provider, pharmacy, etc., and the fraud patterns. Historically, insurance providers have not had a reasonably large number of known fraud cases from their special investigation units (SIUs) that could be used to train conventional supervised models, so unsupervised models are created to detect fraud. This approach not only can detect most known fraud patterns but is also flexible and scalable enough to keep up with the rapid evolution of fraud patterns.
Problems arise when an attempt is made to create fraud detection models for insurance providers with a relatively low number of claims. The data of such small or young insurance providers may lack longitudinal depth, cross-sectional breadth, or both. These conditions make it virtually impossible to create a robust, general-purpose fraud detection model using a customized, data-driven approach.
In the unsupervised modeling domain, rare combinations of events are flagged by computing statistics on these events. Traditionally, modelers have grappled with small datasets by being very conservative and flagging only extremely rare combinations of very common events. This is done with smoothing techniques that smooth away less common events. Unfortunately, this has the undesirable effect of failing to detect many existing fraudulent claims.
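The effect of such smoothing can be illustrated with a minimal sketch. It is not a description of any particular insurer's model: the pairing of provider specialty with billed procedure, the function names, the pseudo-count, the lift threshold, and the toy claim counts are all assumptions chosen for illustration. The sketch estimates how often a procedure appears for a specialty, shrinks that estimate toward the procedure's overall frequency (an m-estimate, one common form of additive smoothing), and flags pairs that occur far less often than expected.

```python
from collections import Counter


def smoothed_lift(pairs, pseudo_count=20.0):
    """Return {(a, b): lift}, where lift = P_smoothed(b | a) / P(b).

    A lift well below 1 means the combination occurs less often than the
    overall frequency of b would suggest, i.e. it is a rare combination.
    """
    n = len(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    count_ab = Counter(pairs)
    lifts = {}
    for (a, b), n_ab in count_ab.items():
        prior_b = count_b[b] / n
        # m-estimate: shrink the conditional toward the marginal P(b).
        # When `a` has few observations, the prior dominates and lift -> 1.
        p_b_given_a = (n_ab + pseudo_count * prior_b) / (count_a[a] + pseudo_count)
        lifts[(a, b)] = p_b_given_a / prior_b
    return lifts


def flag_rare(pairs, pseudo_count=20.0, max_lift=0.5):
    """Return the sorted list of pairs whose smoothed lift is below max_lift."""
    lifts = smoothed_lift(pairs, pseudo_count)
    return sorted(pair for pair, lift in lifts.items() if lift < max_lift)


# Hypothetical toy data: (provider specialty, billed procedure) per claim.
claims = (
    [("cardiology", "ecg")] * 180
    + [("cardiology", "dental_xray")] * 2   # rare combination, common specialty
    + [("dentistry", "dental_xray")] * 100
    + [("podiatry", "foot_exam")] * 3
    + [("podiatry", "ecg")] * 1             # rare combination, rare specialty
)

print(flag_rare(claims, pseudo_count=20.0))  # conservative, smoothed estimate
print(flag_rare(claims, pseudo_count=0.0))   # raw, unsmoothed estimate
```

With the smoothing applied, only the unusual pairing for the common specialty (cardiology) is flagged. With the pseudo-count set to zero, the unusual pairing for the rarely seen specialty (podiatry) is flagged as well. In this toy data the smoothed model therefore never surfaces the second pairing, which is the kind of missed detection described above; the unsmoothed estimate for that pairing rests on only four claims, which is why a conservative modeler would distrust it.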
In addition, efforts to identify fraudulent cases in the insurance domain often run into data fragmentation. Fragmentation hinders a comprehensive view of the problem, and as a result the model is unable to keep up with changes in the business environment.