1. Field of the Invention
The present invention relates broadly to management of medical information. More specifically, the present invention relates to management of medical information to perform and report measurements of physician efficiency.
2. The Prior Art
Recent evidence has suggested that about 10-20% of physicians, across specialty types, practice inefficiently. Efficiency means using the appropriate amount of medical resources to treat a medical condition and achieve a desired health outcome. Thus, efficiency is a function of unit price, volume of service, intensity of service, and quality of service. This group of inefficient physicians is responsible for driving 10% to 20% of the unnecessary, excess medical expenditures incurred by employers and other healthcare purchasers, equating to billions of dollars nationally.
To improve market efficiency, it is useful to apply a system that accurately measures individual physician efficiency. Recent evidence has demonstrated that leading physician efficiency measurement systems have only about 15-30% agreement across measurement systems. This means that when one system ranks a physician as inefficient, only about 15-30% of the other systems ranked the same physician as inefficient. The remaining 70% (or more) of systems ranked the same physician as efficient.
These findings show that existing systems have significant error in attempting to accurately identify inefficient physicians. The error needs to be eliminated, or significantly reduced, if healthcare purchasers are to accurately identify inefficient physicians and take action (e.g., attempt to change physician behavior, provide incentives for employees to use more efficient physicians). Every physician falsely measured as efficient (or inefficient) leads to continued inefficiency in the healthcare marketplace.
There are ten common physician (or physician group) efficiency measurement errors present in most existing physician efficiency measurement systems, which are, in order of importance: (1) examine all episodes of care for a physician; (2) use a physician's actual episode composition; (3) no severity-of-illness measure by medical condition; (4) no identification of different episode treatment stages; (5) no age category assignment by medical condition; (6) no tracking mechanism for related complication episodes; (7) improper episode outlier criteria; (8) under-report charges attributed to partial episodes; (9) over-report charges attributed to episode endpoints; and (10) no minimum number of episodes of care. These errors are discussed next.
Many physician efficiency methodologies continue to examine “services per 1,000 members” or “all episodes of care” tracked to a physician. These approaches probably add the most to efficiency measurement error. The methodologies attempt to adjust services per 1,000 members and to adjust all episodes of care by age and gender—and then compare one physician's utilization pattern to a peer group average. However, age and gender explain less than 5% of the variance in a patient's medical expenditures. This means that over 95% of the variance is unexplained, and may be attributed to differences in patient health status.
Some methodologies adjust services per 1,000 members and adjust all episodes of care based on specific International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) code algorithms that measure expected resource intensity. The idea is that a patient's diagnosis codes will provide more predictive power than age and gender alone. The most predictive of the published and marketed models explain only 20% to 30% of the variance in a patient's medical expenditures. This means that 70% or more of the variance continues to be unexplained, and may be attributed to differences in patient health status.
Physicians often criticize the services per 1,000 members and the all episodes of care methodologies that use a predictive case-mix adjustment factor. Physicians state that the methodologies do not appropriately adjust for differences in patient health status—rightly stating that their patients may be “sicker.”
If all claim line items (CLIs) or episodes of care tracked to a physician are used in the efficiency analysis, then up to 70% of the observed utilization difference between physicians may be attributed to patient health status differences. Therefore, patient health status differences are measured rather than individual physician efficiency differences. This weakness in current case-mix adjustment tools means that not all CLIs or patient episodes of care treated by a physician can be examined. Instead, an isolated set of more prevalent medical conditions by severity-of-illness level needs to be examined across physicians of a similar specialty type.
The second measurement error, which occurs in most if not all current efficiency measurement systems, occurs when the physician's actual episode composition is used. The reason is as follows. The differences in physicians' patient case-mix composition result in differences in variability (i.e., the standard deviation) around a physician's average episode treatment charges. This variability is not due to the efficiency or inefficiency of a physician, but instead results because longer and more resource-intensive medical conditions generally require more services and, therefore, have more potential variability around average (or mean) episode treatment charges.
For example, easier-to-treat upper respiratory infection (URI) episodes may have the following mean and standard deviation (with outlier episodes removed): $185±$65. Here, the standard deviation around the mean is not large—and is 0.35 the size of the mean (i.e., 65/185=0.35). However, easier-to-treat pediatric asthma episodes may have the following mean and standard deviation (with outlier episodes removed): $1,650±$850. Here, the standard deviation around the mean is larger than for URI episodes—and is 0.52 the size of the mean (i.e., 850/1,650=0.52).
The variation difference between the two conditions is 49% greater for asthma than URIs [(0.52−0.35)/0.35]. This variation difference occurs for two reasons: (1) more resource-intensive conditions require more services to treat; and (2) there generally are a small number of episodes available to examine in a given physician efficiency study as compared to the universe of episodes that could actually be studied—and a smaller number of episodes results in a higher chance for variability around the mean. This variation is not the result of physician treatment pattern differences.
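The arithmetic in this example can be reproduced with a short sketch. The ratio of standard deviation to mean is the coefficient of variation; the dictionary layout and function name below are illustrative assumptions, and only the dollar figures come from the text.

```python
# Coefficient of variation (standard deviation / mean) for the two
# example conditions. The dollar figures are the ones quoted in the text;
# the data layout and names are illustrative assumptions.
conditions = {
    "URI": {"mean": 185.0, "sd": 65.0},
    "pediatric asthma": {"mean": 1650.0, "sd": 850.0},
}


def coefficient_of_variation(mean, sd):
    """Return the standard deviation as a fraction of the mean."""
    return sd / mean


cv = {
    name: coefficient_of_variation(v["mean"], v["sd"])
    for name, v in conditions.items()
}

# The text rounds each ratio to two decimals before comparing them.
cv_rounded = {name: round(value, 2) for name, value in cv.items()}
relative_difference = (
    cv_rounded["pediatric asthma"] - cv_rounded["URI"]
) / cv_rounded["URI"]

for name, value in cv_rounded.items():
    print(f"{name}: CV = {value:.2f}")
print(f"relative difference = {relative_difference:.0%}")
```

Using the rounded ratios (0.35 and 0.52) gives the 49% figure stated above; using the unrounded ratios gives a slightly smaller difference of roughly 47%.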
If the statistically based variability around the mean is not corrected, then substantial error may enter into the physician efficiency measurement equation. Consequently, the final physician efficiency score differences may be attributed to the statistical condition-specific variability around the mean episode charge (due to the case-mix of episodes treated).
The above example showed that the variation difference may be 50% or more (around a condition-specific mean episode value). Logically, then, if all episodes treated by physicians are examined and efficiency scores are calculated, there has to be some statistical bias present.
A significant statistical bias may be present. Using a more traditional episode-based efficiency measurement methodology, lower-episode-volume physicians treating patients with a higher case-mix index score are more likely to receive an inefficient ranking as compared to lower-episode-volume physicians treating patients with a lower case-mix index score. This finding results because a physician with higher case-mix patients treats episodes having more variability (i.e., a greater standard deviation) around average episode treatment charges. With a low volume of episodes (most often the norm, and not the exception), this physician needs only a few higher-cost episodes than the peer group average to make his/her treatment pattern appear significantly higher than the peer group comparator.
However, a physician with lower case-mix patients treats episodes having less variability around average episode treatment charges. With a low volume of episodes, this physician's treatment pattern will not be as influenced by one or two higher-cost episodes as compared to the peer group average. Consequently, his/her treatment pattern does not appear (as often) significantly higher than the peer group comparator.
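One way to see why low episode volume amplifies this effect is the standard error of a mean, which is the standard deviation divided by the square root of the number of episodes. The sketch below is an illustration only: it assumes episodes are independent, reuses the URI and pediatric asthma figures quoted earlier as stand-ins for lower and higher case-mix episodes, and the episode counts are hypothetical.

```python
import math


def relative_standard_error(mean, sd, n_episodes):
    """Standard error of the mean episode charge, as a fraction of the mean.

    Assumes independent episodes, which is a simplification.
    """
    return (sd / math.sqrt(n_episodes)) / mean


# Hypothetical episode volumes for a single physician.
for n in (5, 30, 100):
    lower_case_mix = relative_standard_error(185.0, 65.0, n)     # URI figures
    higher_case_mix = relative_standard_error(1650.0, 850.0, n)  # asthma figures
    print(
        f"n={n:>3}: lower case-mix {lower_case_mix:.1%}, "
        f"higher case-mix {higher_case_mix:.1%}"
    )
```

At every volume the higher case-mix physician's mean is less stable, and the gap is largest in absolute terms when the episode count is small.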
Thus, by examining all medical condition episodes, a substantial component of any observed physician efficiency difference may be attributed to statistical condition-specific variability around the mean episode charge—and not to physician treatment patterns efficiency. This effect may be present even when we examine the easier-to-treat episodes (SOI-1 level episodes) for the medical conditions.
The third error takes place in those efficiency measurement systems that do not employ an appropriate episode severity-of-illness measure. Severity-of-illness may be defined as the probability of loss of function due to a specific medical condition. Most, if not all, current claims-based episode groupers and methods do not have an appropriate severity-of-illness index by medical condition. Consequently, significant clinical heterogeneity remains in many episodes for a given medical condition. The end result may be physician efficiency differences that are attributed to inaccurate episode severity-of-illness adjustment, and not to physician treatment patterns variation.
Moreover, some claims-based episode groupers stratify formulated episodes for a medical condition by the presence or absence of a specific surgery or service (e.g., knee derangement with and without surgery; ischemic heart disease with and without heart catheterization). The reason for performing this stratification is to reduce episode heterogeneity for a medical condition. In effect, the stratification serves as a sort of severity-of-illness adjustment.
However, stratification based on the presence of surgery or a high-cost service results in at least two physician efficiency measurement errors: (1) performing surgery versus not performing surgery is the treatment patterns variation we need to examine in determining physician efficiency, and this variation is not captured in more traditional methodologies; and (2) the episodes of care are unnecessarily separated into smaller groups whereby physicians may not have enough episodes to examine in any one smaller group. Consequently, the stratified episodes of care need to be recombined for accurate physician efficiency measurement.
The fourth physician efficiency measurement error occurs in claims-based episode groupers that do not have a method for identifying different episode treatment stages including initial, active, and follow-up treatment stages. Identifying different treatment stages is important in medical conditions, such as breast cancer, prostate cancer, colorectal cancer, acute myocardial infarction, and lymphoma. For example, breast cancer should be stratified into initial, active, and follow-up treatment stages.
An initial breast cancer episode is one where the patient has a surgery for the cancer (e.g., lumpectomy, modified radical mastectomy). An active breast cancer episode is one where no surgery is present, but chemotherapy or radiation treatment is observed within the episode. Here, the patient underwent surgery in a previous study period, so no surgical event is found in the patient's current ongoing breast cancer episode. Instead, during the study period, the claims data shows that the patient is being treated with chemotherapy and/or radiation. The presence of these treatments defines an active breast cancer episode. The utilization pattern and charges are different for an active breast cancer patient as compared to an initial breast cancer patient. A follow-up breast cancer episode is one where no surgery, chemotherapy, or radiation treatment is present in the patient's episode of care. After initial and active treatments, physicians will continue to code for breast cancer over the future years of patient follow-up care.
In a given study period, physicians do not treat an equal distribution of each episode type (initial, active, and follow-up). Moreover, the episode types have different average charges. About 20% of episodes may be classified as initial breast cancer episodes. Overall care for initial breast cancer episodes ranges between $15,000 and $25,000 per episode. About 15% of episodes may be classified as active breast cancer episodes. Overall care for active breast cancer episodes ranges between $12,000 and $18,000 per episode. About 65% of episodes may be classified as follow-up breast cancer episodes. Overall care for follow-up breast cancer episodes ranges between $350 and $600 per episode.
Consequently, the blending of the three treatment stage episodes results in average treatment charges of about $5,500 to $6,500 per episode. In fact, this is the average breast cancer charge that would be observed for most claims-based episode groupers.
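The blended figure is a share-weighted average of the three stage-specific charges. The sketch below works it through using the shares and charge ranges quoted above; the data layout and function name are illustrative assumptions, and taking the midpoint of each range is a simplification.

```python
# Share of episodes and per-episode charge range for each treatment stage,
# as quoted in the text. The data layout itself is an illustrative assumption.
stages = {
    "initial": {"share": 0.20, "low": 15000.0, "high": 25000.0},
    "active": {"share": 0.15, "low": 12000.0, "high": 18000.0},
    "follow_up": {"share": 0.65, "low": 350.0, "high": 600.0},
}


def blended_charge(stages, key):
    """Share-weighted average of one end ('low' or 'high') of each range."""
    return sum(stage["share"] * stage[key] for stage in stages.values())


blended_low = blended_charge(stages, "low")
blended_high = blended_charge(stages, "high")
# Midpoint of each stage's range, weighted by the stage's share of episodes.
blended_mid = sum(
    stage["share"] * (stage["low"] + stage["high"]) / 2
    for stage in stages.values()
)

print(f"blended low:  ${blended_low:,.2f}")
print(f"blended high: ${blended_high:,.2f}")
print(f"blended mid:  ${blended_mid:,.2f}")
```

With range midpoints the blend lands near the upper end of the approximate range given above. The exact figure depends on where within each range the actual charges fall, and in every case it resembles none of the three stage-specific averages.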
The blending of initial, active, and follow-up episodes may lead to substantial physician efficiency measurement error. For example, assume during a study period that Oncologist A treats mostly active breast cancer patients, while some other oncologists have a good mixture of active and follow-up patients. Then, Oncologist A's treatment pattern for breast cancer will appear inefficient (as compared to his peer group of oncologists) because active episodes are about 30 times more expensive to treat than follow-up episodes. In fact, Oncologist A's treatment pattern difference is attributed to a different treatment stage episode case-mix.
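The Oncologist A example can be made concrete with hypothetical numbers. In the sketch below, both the per-stage charges (midpoints of the ranges quoted earlier) and the two episode mixes are assumptions chosen for illustration; both parties are given identical charges within each stage, so any difference in the blended mean comes from stage mix alone.

```python
# Assumed per-episode charge by stage (midpoints of the ranges in the text).
# Both the oncologist and the peer group are given the SAME stage charges.
STAGE_CHARGE = {"active": 15000.0, "follow_up": 475.0}

# Hypothetical episode mixes (share of episodes in each stage).
oncologist_a_mix = {"active": 0.80, "follow_up": 0.20}
peer_group_mix = {"active": 0.50, "follow_up": 0.50}


def blended_mean(mix):
    """Mean episode charge when stages are blended together."""
    return sum(share * STAGE_CHARGE[stage] for stage, share in mix.items())


a_mean = blended_mean(oncologist_a_mix)
peer_mean = blended_mean(peer_group_mix)
blended_ratio = a_mean / peer_mean
stage_ratio = STAGE_CHARGE["active"] / STAGE_CHARGE["follow_up"]

print(f"Oncologist A blended mean: ${a_mean:,.2f}")
print(f"Peer group blended mean:   ${peer_mean:,.2f}")
print(f"Blended ratio (A / peers): {blended_ratio:.2f}")
print(f"Active / follow-up ratio:  {stage_ratio:.1f}")
```

Under these assumptions Oncologist A appears more than 50% costlier than the peer group even though the stage-by-stage charges are identical.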
Therefore, treatment stage episode types need to be correctly identified and separately examined. Otherwise, the final physician efficiency score differences may be attributed to nothing more than the initial, active, and follow-up episode case-mix.
The fifth error happens in those physician efficiency measurement systems that do not examine condition-specific episodes by age category. Studies have illustrated that broad-based age bands are important to separately examine—even after episodes have been assigned a severity-of-illness index. The reason is that physicians tend to treat children and adults differently for most conditions. For example, children are less likely than adults to receive a chest x-ray and potent antibiotics for many medical conditions. If episodes are not examined by broad-based age category, the end result may be physician efficiency differences that are attributed to patient age differences—and not to treatment patterns variation.
The sixth error occurs in those physician efficiency measurement systems that do not link and include the charges and utilization from a patient's complication episodes to his underlying medical condition. Complications are those episodes that are clinically related to the underlying medical condition. Consequently, many condition-specific episodes have under-reported charges. In fact, actual outputs from some claims-based episode groupers may show under-reported charges for patients with diabetes and other chronic conditions (e.g., asthma, congestive heart failure).
For example, the reason for the under-reported episode charges is that physicians code up to 70% of an average diabetic's charges under related complications to the diabetes (e.g., eye, neuropathies, circulatory, renal) and not diabetes care. Therefore, without considering and including related complication episodes with the actual diabetes episode, physician efficiency differences may be attributed to incomplete episode charges and utilization—and not to treatment pattern variations.
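A minimal sketch of linking complication episodes back to the underlying condition is given below. The record layout, the `related_to` field, the patient identifier, and all charges are hypothetical; the source does not specify how complications are linked.

```python
from collections import defaultdict

# Hypothetical episode records. "related_to" names the underlying condition
# when the episode is a clinically related complication, else None.
episodes = [
    {"patient": "P1", "condition": "diabetes", "charge": 900.0, "related_to": None},
    {"patient": "P1", "condition": "diabetic retinopathy", "charge": 1200.0,
     "related_to": "diabetes"},
    {"patient": "P1", "condition": "diabetic nephropathy", "charge": 900.0,
     "related_to": "diabetes"},
    {"patient": "P1", "condition": "URI", "charge": 150.0, "related_to": None},
]


def rolled_up_charges(episodes):
    """Sum charges per (patient, underlying condition), folding in complications."""
    totals = defaultdict(float)
    for ep in episodes:
        underlying = ep["related_to"] or ep["condition"]
        totals[(ep["patient"], underlying)] += ep["charge"]
    return dict(totals)


totals = rolled_up_charges(episodes)
diabetes_only = sum(
    ep["charge"] for ep in episodes
    if ep["condition"] == "diabetes" and ep["related_to"] is None
)
share_under_complications = 1 - diabetes_only / totals[("P1", "diabetes")]

print(totals)
print(f"share coded under complications: {share_under_complications:.0%}")
```

In this made-up record set, 70% of the diabetic patient's charges sit under complication episodes, so a grouper that ignores the link would report only $900 of $3,000.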
Furthermore, for patients with specific medical conditions, any model that attempts to stratify patients by health risk may produce unstable or erroneous results. The reason is that the patient's record is missing key claims information needed to accurately classify the patient into an appropriate severity-of-illness class and other classes. For example, without tracking related complications to a diabetic patient, many diabetic patients will appear to have no complications when in fact they have eye or circulatory complications.
The seventh physician efficiency measurement error happens when the condition-specific outlier episode analysis is not performed in an appropriate manner. Many current methodologies perform the high-end outlier analysis by eliminating a percent of condition-specific episodes at the peer group (or aggregate episode) level. That is, the methodologies eliminate the high-end outliers before assigning episodes to physicians.
However, this method results in physician efficiency measurement error because a higher proportion of episodes assigned to the most inefficient physicians will be eliminated (as compared to the proportion of episodes eliminated for efficient physicians). Consequently, the inefficient physicians' condition-specific treatment patterns now more closely resemble the treatment patterns of the efficient physicians.
An example demonstrates this error. Assume Physician A has seven episodes of acute bronchitis with the following per episode charges: $235, $245, $325, $400, $525, $550, and $600. Also, the outlier cut-off threshold for high-end outlier episodes is set at $399 at the peer group level. Physician A now has only three episodes remaining at $235, $245, and $325. The mean charge is $268 per episode. Assume Physician B also has seven episodes of acute bronchitis with the following per episode charges: $210, $225, $235, $255, $285, $320, and $390. The peer-group level outlier threshold remains at $399. Therefore, Physician B has all seven episodes remaining, and the mean charge is $274.
The end result shows no statistical difference between Physicians A and B. The mean episode charge of Physician A is slightly lower than that of Physician B (i.e., $268 versus $274 per episode). However, using an outlier rule where 5% of episodes (or at least 1 high-end outlier) are eliminated at the physician level, the results are significantly different. Physician A now has six remaining episodes (i.e., here we eliminate only 1 high-end outlier), and the mean charge of the six non-outlier episodes is now $380 per episode. For Physician B, the mean charge for the six non-outlier episodes is now $255 per episode. Physician A is statistically higher in average (or mean) episode charges than Physician B by $125 per episode.
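The two outlier rules in this example can be compared directly. The sketch below uses the episode charges and the $399 threshold from the text; the function names are illustrative, and rounding the 5% count down before applying the "at least 1" floor is an assumption about how the rule is applied.

```python
# Per-episode charges for acute bronchitis, as given in the example.
physician_a = [235, 245, 325, 400, 525, 550, 600]
physician_b = [210, 225, 235, 255, 285, 320, 390]


def mean(values):
    return sum(values) / len(values)


def trim_at_peer_threshold(charges, threshold):
    """Peer-group rule: drop every episode above a fixed dollar threshold."""
    return [c for c in charges if c <= threshold]


def trim_at_physician_level(charges, fraction=0.05):
    """Physician-level rule: drop the top `fraction` of this physician's
    episodes, but always at least one (rounding down is an assumption)."""
    n_remove = max(1, int(len(charges) * fraction))
    return sorted(charges)[: len(charges) - n_remove]


for label, trim in (
    ("peer-group threshold", lambda c: trim_at_peer_threshold(c, 399)),
    ("physician-level rule", trim_at_physician_level),
):
    a_mean = mean(trim(physician_a))
    b_mean = mean(trim(physician_b))
    print(f"{label}: A=${a_mean:.0f}, B=${b_mean:.0f}, A-B=${a_mean - b_mean:.0f}")
```

The peer-group threshold removes four of Physician A's seven episodes and none of Physician B's, which is what hides the difference between them.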
The eighth error occurs in those systems that under-report charges attributed to partial (or incomplete) episodes of care. Some methodologies do not separate partial from complete episodes of care when measuring physician efficiency. Partial episodes result because a patient enrolled in a health plan during the study period or disenrolled during the study period. However, including partial episodes leads to inaccurate efficiency measurement because of under-reported episode charges—especially when some physicians have more partial episodes than other physicians.
A reason partial episodes often slip through the cracks and into an efficiency analysis is that the methodologies do not use a membership eligibility file to ensure the member is present for the entire study period. Instead, the methods assume that a condition-specific episode of care is complete if the episode exceeds some minimum duration time period. For example, if a patient's episode of diabetes is 40 days or more in duration, then the episode is marked as complete, and not partial. If a patient's diabetes episode is 39 days or less, then the episode is marked as partial.
Applying an indiscriminate time period duration to condition-specific episodes produces a high percentage of episodes marked as complete, which are actually partial (or incomplete) episodes. That is, many health plans have membership turnover rates of 20% or higher. Consequently, a diabetes episode of 40 days' duration, marked as complete, has at least a 20% chance of being a partial episode of care because of membership turnover. The end result may be physician efficiency differences that are attributed to the inclusion of partial episodes, and not to treatment patterns variation.
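The difference between the duration rule and an eligibility check can be sketched as follows. The 40-day cut-off comes from the example above; the study period, enrollment dates, episode dates, and function names are all hypothetical.

```python
from datetime import date

# Hypothetical study period.
STUDY_START = date(2023, 1, 1)
STUDY_END = date(2023, 12, 31)
MIN_DURATION_DAYS = 40  # duration cut-off used in the example above


def complete_by_duration(episode_start, episode_end):
    """Duration rule: an episode of 40 days or more is marked complete."""
    return (episode_end - episode_start).days >= MIN_DURATION_DAYS


def complete_by_eligibility(enroll_start, enroll_end):
    """Eligibility rule: the member must be enrolled for the whole study period."""
    return enroll_start <= STUDY_START and enroll_end >= STUDY_END


# Hypothetical member who disenrolls in March, with a 45-day diabetes episode.
episode_start, episode_end = date(2023, 1, 10), date(2023, 2, 24)
enroll_start, enroll_end = date(2022, 6, 1), date(2023, 3, 15)

print("duration rule says complete:   ",
      complete_by_duration(episode_start, episode_end))
print("eligibility rule says complete:",
      complete_by_eligibility(enroll_start, enroll_end))
```

The duration rule accepts this episode, while the eligibility check flags it as partial because the member left the plan before the study period ended.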
The ninth error happens in physician efficiency measurement systems that over-report charges attributed to episode endpoints. Some methodologies do not appropriately end a patient's episode of care before measuring a physician's efficiency. For example, chronic conditions may continue indefinitely and, therefore, patient episodes of care may be of various durations (e.g., 60 days or 600 days)—depending on the amount of available patient claims data. The end result may be physician efficiency differences that are attributed to excessively long or variable chronic condition episode durations—and not to treatment patterns variation.
The tenth error takes place in those systems that impose few requirements for having a minimum number of episodes in a certain number of medical conditions. Many methodologies do not require a minimum number of condition-specific episodes when comparing a physician's efficiency to a peer group. Instead, only a small handful (e.g., fewer than 10 episodes) is enough. However, there may be significant episode of care heterogeneity in one or two condition-specific episodes, even after applying a sophisticated severity-of-illness index. Consequently, examining an episode here-and-there for a physician may introduce significant error into a physician's efficiency measurement. The end result may be physician efficiency differences that are attributed to the heterogeneity in the low number of episodes examined, and not to treatment patterns variation.
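A minimum-episode requirement can be expressed as a simple filter over per-physician, per-condition counts. The sketch below is illustrative only: the threshold value, physician labels, and episode counts are hypothetical, and the text does not prescribe a specific minimum.

```python
from collections import Counter

MIN_EPISODES = 10  # hypothetical threshold; not specified in the text

# Hypothetical (physician, condition) pairs, one per episode of care.
episodes = (
    [("Dr. A", "URI")] * 12
    + [("Dr. A", "asthma")] * 3
    + [("Dr. B", "URI")] * 4
)

counts = Counter(episodes)

# Keep only physician/condition cells with enough episodes to compare.
eligible = {key: n for key, n in counts.items() if n >= MIN_EPISODES}
excluded = {key: n for key, n in counts.items() if n < MIN_EPISODES}

print("eligible:", eligible)
print("excluded:", excluded)
```

Cells that fall below the threshold are set aside rather than compared to the peer group, so a handful of heterogeneous episodes cannot drive a physician's score.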
Various systems have been patented in the episode of care field. Such systems are shown, for example, in U.S. Pat. Nos. 5,557,514, 5,835,897 and 5,970,463. However, none of these systems adequately overcomes the aforementioned problems with respect to appropriately building and analyzing episodes of care. As importantly, existing systems fail to discuss an episode-of-care-based system for measuring individual physician or physician group efficiency.