The sources of operational problems in business transactions often show themselves in relatively small pockets of data, which are called trouble hot spots. Identifying these hot spots from internal company transaction data is generally a fundamental step in the problem's resolution, but this analysis process is greatly complicated by huge numbers of transactions and large numbers of transaction variables that must be analyzed.
A “hot spot” among a set of comparable observations is one generated by a different statistical model than the others. That is, most of the observations come from one default statistical model, while relatively few come from one or more different models. In its classic usage, a “hot spot” is a geographic location where an environmental condition, such as background radiation level or disease incidence, is higher than in surrounding locations. The central idea of a “hot spot” is its statistical comparison with a reference entity: there must be a set of observations from a default model from which the “hot spots” distinguish themselves.
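As an illustration only, the definition above can be sketched in a few lines of Python. The sketch assumes a single numeric measurement per entity and summarizes the default model by the median and the median absolute deviation (MAD), a common robust choice that is not taken from the text. The function name `find_hot_spots`, the threshold of 3.5, and the sample `trouble_counts` are hypothetical.

```python
from statistics import median


def find_hot_spots(values, threshold=3.5):
    """Return the entities whose measurement lies far above the bulk of the data.

    `values` maps an entity name to a numeric measurement. The default model
    is summarized by the median and the MAD, which a few extreme entities
    distort far less than they would a mean and standard deviation.
    """
    center = median(values.values())
    mad = median(abs(v - center) for v in values.values())
    if mad == 0:
        return []  # no spread in the bulk of the data: nothing to distinguish
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the default model is roughly normal (an assumption).
    return sorted(
        name for name, v in values.items()
        if 0.6745 * (v - center) / mad > threshold
    )


# Hypothetical trouble counts per switch; only sw6 stands out.
trouble_counts = {"sw1": 4, "sw2": 5, "sw3": 6, "sw4": 5, "sw5": 4, "sw6": 21}
hot = find_hot_spots(trouble_counts)
```

A score of this kind only flags candidates; whether a flagged entity is structurally different, rather than the product of chance, still has to be established separately.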
There are many diverse applications for this concept. For example: (1) a network, such as a DSL or fast packet network, includes many pieces of equipment (e.g., DSLAMs or frame relay switches), and some of these tend to be more trouble-prone than the rest; (2) a business process, such as the testing of circuit troubles, may have geographic locations, times of day, or specific employees that are associated with higher diagnosis or repair times than the rest; (3) an employee function, such as trouble ticket handling, may have some technicians who are more or less productive than their peers; and (4) certain transactions, such as customer care calls or trouble repairs, may be more time-consuming than others.
There are common threads in these applications. First, as stated above, there are moderately to very large sets of “normal” entities (e.g., switches, line tests, employees) from which to estimate the parameters of the background model against which the “hot spots” distinguish themselves. Second, the distinction is structural, and not due to chance alone. That is, the equipment, testing procedure, or employee handling is abnormal or bad, and not just a function of natural variations in the environment. Third, the pattern of “hot spots” persists over time, so that corrective measures have time to be formulated and implemented. Specific equipment, testing outcomes, or employee productivity must remain an issue long enough for corrective actions to make an improvement. Finally, the “hot spots” must be distinctive enough, or in a sufficiently crucial part of the business, that their repair makes a substantial contribution to the company's performance.
Once a “hot spot” is found, a fundamental part of its diagnosis lies in its comparison with entities that are not “hot.” One might compare variables from a troublesome geographic location with those from other locations, or a troublesome serving hour with other times. There are at least two kinds of entities whose variables might be usefully compared with the “hot spot”: (1) entities that are like the “hot spot” in geography, organization, technology, etc., or (2) entities that are high performers, which may serve as “best in class” benchmarks or which have performance levels to which the “hot spot” may aspire. For conciseness, call the comparison entity the “Reference” (R), to which the “Hot Spot” (H) will be compared. The goal is to understand “H-R” differences, i.e., how the Reference and the Hot Spot differ in terms of their descriptive variables X.
Usually, in business applications, the Reference is known (e.g., a well-performing business unit). The Hot Spot is also typically identified, either from prior knowledge, or by discovery through standard data mining techniques as mentioned above. The problem is that there may be many variables X to compare across the two entities, and many conditions under which to compare them. For instance, telephone repair times can be split into many constituent parts, and those parts should be compared for such conditions as time of day, day of week, type of trouble, etc. Data mining tools are designed to efficiently analyze large data sets with many variables, but they are typically designed to expose relationships within a single entity, not across two or more entities, such as two or more locations, as required to understand the H-R differences.
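A minimal sketch of such an H-R comparison follows, assuming transaction records are available as dictionaries. It computes, for each condition value and each variable X, the mean for the Hot Spot minus the mean for the Reference, and ranks the differences by magnitude. The function `h_minus_r`, the field names (`entity`, `period`, `dispatch_min`, `test_min`), and the sample records are all hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict
from statistics import mean


def h_minus_r(records, hot, ref, variables, condition):
    """Rank (condition value, variable) pairs by the size of the H-R difference.

    Each difference is the mean of the variable for the Hot Spot minus the
    mean for the Reference, computed within one value of `condition`.
    """
    groups = defaultdict(list)
    for rec in records:
        if rec["entity"] in (hot, ref):
            groups[(rec["entity"], rec[condition])].append(rec)

    diffs = {}
    for cond_value in {c for (_, c) in groups}:
        h_rows = groups.get((hot, cond_value))
        r_rows = groups.get((ref, cond_value))
        if not h_rows or not r_rows:
            continue  # cannot compare where one side has no data
        for var in variables:
            diffs[(cond_value, var)] = (
                mean(r[var] for r in h_rows) - mean(r[var] for r in r_rows)
            )
    # Largest absolute differences first.
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical repair-time components, in minutes, split by time of day.
records = [
    {"entity": "H", "period": "am", "dispatch_min": 30, "test_min": 10},
    {"entity": "H", "period": "am", "dispatch_min": 50, "test_min": 10},
    {"entity": "R", "period": "am", "dispatch_min": 20, "test_min": 10},
    {"entity": "H", "period": "pm", "dispatch_min": 25, "test_min": 12},
    {"entity": "R", "period": "pm", "dispatch_min": 20, "test_min": 10},
    {"entity": "R", "period": "pm", "dispatch_min": 20, "test_min": 12},
]
ranked = h_minus_r(records, "H", "R", ["dispatch_min", "test_min"], "period")
```

With these sample records, the morning dispatch time shows the largest H-R gap. A plain difference of means is the simplest possible comparison; it ignores sample size and variance, which a practical analysis would need to take into account.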
In the absence of tools designed for such cross-entity comparison, subjective opinion or informed guessing would be used to suggest ways in which the two (or more) locations might be expected to differ. Unfortunately, subjective opinion and informed guessing are not always accurate, and may not take into account the myriad potential comparison variables that must be considered in order to accurately identify and diagnose hot spots.