1. Field of the Invention
The present invention relates generally to data mining and analysis. More particularly, it concerns mining and analyzing medical claims data to, e.g., (a) assist in the identification of clinical investigators and potential trial subjects for clinical trials or determining feasibility of clinical trials, (b) assist in the identification of medical expert witnesses, medical directors, or other medical professionals, (c) assist in the investigation of medical fraud, and (d) assist in various types of marketing. Even more particularly, it concerns improving the speed of medical-related data mining and analysis of very large data sets such as administrative healthcare data through the creation and use of specialized searching tables (SSTs). It also concerns improving the speed of certain statistical calculations through the creation and use of factorial tables having logarithmic entries, making it possible to reliably work with very large numbers and data sets.
2. Description of Related Art
A wealth of information is contained in administrative healthcare claims data. For example, an administrative healthcare claims database may contain information concerning, but not limited to, patient identification, physician identification, physician history, prescription drug history, medical examination history, medical diagnosis history, medical billing history, medical cost information, health benefit information, medical procedures, etc.
Conventional techniques have been employed to mine at least some of this information. Data mining of healthcare claims data, however, involves a slow, computationally intensive process that may return useful results only after hours or more of computation time. Lengthy search and analysis times plague the medical data mining field and discourage many from fully utilizing medical claims data for useful applications.
Administrative Healthcare Claims Data and Statistical Calculations
Healthcare organizations and many other organizations lack the ability to rapidly analyze extremely large data sets (e.g., over a billion claim lines), apply statistical analysis protocols, and aggregate output into relevant, actionable answers for a specific need.
When working with very large datasets (like administrative healthcare claims data), it is difficult and time consuming to look for patterns that are non-random. Generally speaking, the process sometimes involves comparing each record (for example in a claim) against every other record, keeping track of differences, and then analyzing the differences for patterns. As data sets get larger, there can be an explosion in the number of unique comparisons that need to be made. For example, if one has 10 million records, then adding one record may mean that there will be 10 million new comparisons that need to be made and tracked. When one has 100 million records and 1 record is added, there may be up to 100 million new comparisons to make. As such, there are entire classes of analysis that are impractical or impossible to perform on very large data sets, no matter how powerful the database engine.
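The growth described above can be illustrated with a short calculation. The following is a minimal sketch, assuming an all-pairs comparison in which every record is compared once against every other record; the function names are hypothetical and are used only for illustration.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unique unordered record pairs among n records: n*(n-1)/2."""
    return n * (n - 1) // 2


def new_comparisons_when_adding_one(n: int) -> int:
    """Extra comparisons required when one record is added to n existing records."""
    return pairwise_comparisons(n + 1) - pairwise_comparisons(n)


if __name__ == "__main__":
    for n in (10_000_000, 100_000_000):
        print(f"records={n:,}  total pairs={pairwise_comparisons(n):,}  "
              f"added by one more record={new_comparisons_when_adding_one(n):,}")
```

Under this assumption, the total number of comparisons grows roughly with the square of the number of records, while each additional record adds a number of comparisons equal to the current record count.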
Administrative Healthcare Claims Data for Clinical Trials
Clinical trials rely on voluntary participation of study subjects to evaluate new drugs, medical devices, or other interventions. Trials may also be directed to, among other things, evaluating procedures for detecting or diagnosing a particular disease or finding ways to improve the quality of life for those suffering from a chronic illness. Trials are usually conducted by researchers associated in some way with a pharmaceutical company, university, hospital, foundation, or governmental agency.
A significant challenge in carrying out any clinical trial is recruiting the appropriate number and type of volunteer study subjects. Volunteer study subjects are selected so that they meet one or more exclusion or inclusion criteria defined by a study protocol that has been approved by an ethics review board. These criteria are aimed at investigating the impact of a predefined intervention (e.g., a new drug) on a particular patient population (e.g., include only hypertensive patients and exclude those younger than 18) and thereby characterize the effect of such an intervention on this population. This stage of the clinical trial—patient recruitment—can be costly, for each extra day it takes to identify a pool of subjects may ultimately represent one fewer day a new drug is on the market (and protected by a patent or other intellectual property). For some successful drugs, the cost of delay may approach or even surpass millions of dollars per day.
Some have attempted to use administrative healthcare claims data for the recruitment of subjects for clinical trials. Services in existence today involve researchers submitting a clinical trial protocol, including related inclusion and exclusion criteria, to a data service. The data service accesses administrative healthcare claims data (often of limited scope) in an attempt to estimate the size of a pool of potential study subjects and estimate their location. The service, however, can take upwards of one month to return results. This time delay comes about, at least partially, due to the large amount of time necessary for the actual data mining and analysis. Because healthcare claims data can involve millions of records, the searching necessary to identify potential study subjects can be very time consuming and can, in some instances, represent a significant time delay in bringing a drug to market. Additionally, the long delay may compound itself if researchers discover that a first set of inclusion/exclusion criteria would not yield a large enough potential study subject pool. When the inclusion/exclusion criteria are modified in an attempt to encompass more participants, the researcher may be forced to wait another month or longer before knowing if the change in criteria will indeed yield an appropriate number of possible study subjects.
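To make the role of inclusion and exclusion criteria concrete, the following is a minimal sketch of estimating a pool of potential study subjects by a simple linear scan. It is not the method of any existing data service; the `Patient` record, its fields, the `eligible` function, and the sample records are all hypothetical and invented solely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    # Hypothetical, simplified record; real claims data is far richer.
    patient_id: str
    age: int
    diagnoses: frozenset


def eligible(p: Patient, required_diagnosis: str, minimum_age: int) -> bool:
    """Inclusion: has the required diagnosis. Exclusion: younger than minimum_age."""
    return required_diagnosis in p.diagnoses and p.age >= minimum_age


# Made-up sample records, for illustration only.
patients = [
    Patient("A", 54, frozenset({"hypertension"})),
    Patient("B", 16, frozenset({"hypertension"})),
    Patient("C", 40, frozenset({"asthma"})),
    Patient("D", 71, frozenset({"hypertension", "diabetes"})),
]

# Example criteria: include only hypertensive patients, exclude those under 18.
pool = [p.patient_id for p in patients if eligible(p, "hypertension", 18)]
print(pool)
```

A scan of this kind touches every record each time the criteria change, which suggests why repeating a query over millions of records with modified criteria can be slow.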
Administrative Healthcare Claims Data for Detecting Medical Fraud
Data mining techniques known in the art have been used in an attempt to detect abnormalities in billing practices of physicians, through analysis of underlying claims data. For example, through claims data, one can attempt to determine whether there are any abnormalities or consistent differences in billing practices that would result in higher payments being directed to the physician in question.
Conventional techniques, however, suffer from the same or similar problems discussed above—namely, lengthy analysis times. Additionally, because of the vast amount of data that may be associated with a claims database, traditional techniques have not been able to take advantage of certain statistical techniques that would provide particularly useful information concerning potential fraud. For example, statistical techniques that employ the factorials of extremely large numbers are not undertaken at least because the calculations would cause “data overflow” errors, or other errors that would slow or stop an analysis.
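The overflow problem, and one way logarithms can sidestep it, can be sketched as follows. This is an illustrative assumption about how a table of logarithmic factorial entries might be built and used, not a description of any particular implementation; the names `build_log_factorial_table` and `log_binomial` are hypothetical.

```python
import math


def build_log_factorial_table(n_max: int) -> list:
    """Return a table where table[k] = ln(k!) for k = 0..n_max.

    Uses ln(k!) = ln((k-1)!) + ln(k), so no large factorial is ever formed.
    """
    table = [0.0] * (n_max + 1)
    for k in range(1, n_max + 1):
        table[k] = table[k - 1] + math.log(k)
    return table


def log_binomial(table: list, n: int, k: int) -> float:
    """ln(C(n, k)) computed from table lookups and two subtractions."""
    return table[n] - table[k] - table[n - k]


LOG_FACT = build_log_factorial_table(100_000)

# A direct factorial exceeds double-precision range at 171!.
try:
    float(math.factorial(171))
    overflowed = False
except OverflowError:
    overflowed = True

print(overflowed)                              # whether 171! overflowed a float
print(LOG_FACT[100_000])                       # ln(100000!) stays finite
print(math.exp(log_binomial(LOG_FACT, 10, 3)))  # approximately C(10, 3)
```

Because products and quotients of factorials become sums and differences of table entries, quantities such as binomial coefficients can be handled in logarithmic form even when the factorials themselves are far too large to represent.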
Administrative Healthcare Claims Data for Other Applications
Mining administrative healthcare claims data for other applications suffers similar problems concerning long computation times and delay. The problems are believed to discourage researchers and others from taking advantage of the full potential of claims data.
The referenced shortcomings of conventional methodologies mentioned above are not intended to be exhaustive, but rather are among many that tend to impair the effectiveness of previously known techniques concerning data mining and aggregated analysis of large amounts of healthcare claims data. Other noteworthy problems may also exist; however, those mentioned here are sufficient to demonstrate that the methodologies appearing in the art have not been altogether satisfactory and that a significant need exists for the techniques described and claimed here.