I. Field of the Invention
The present invention relates generally to a data analysis computer program and, more particularly, to a data analysis program for analyzing sets of temporal data such as temporal health care surveillance data, and especially epidemiological data related to antimicrobial resistance and nosocomial infections.
II. Description of the Prior Art
There are many health care databases, e.g. epidemiology databases and hospital databases, containing temporal data, i.e. data which is collected over time and can be grouped by time intervals. Such databases, furthermore, typically include bacterial antimicrobial resistance/susceptibility data, and related patient clinical and demographic data at hospital, regional, and national levels. Domain experts in epidemiology and laboratory medicine currently review this data by performing manual analysis in an effort to discover significant new patterns, information and trends in the data, especially those related to outbreaks of nosocomial and community-acquired infections, and significant changes in antimicrobial resistance of microorganisms. Such manual analysis includes database queries and confirmatory statistics directed at specific questions in an effort to test specific hypotheses. These traditional methods of data analysis, however, offer no way to discover patterns and trends that are not suspected by the investigators of the data. Consequently, such unsuspected trends and patterns are often ignored and remain undiscovered even though they may be significant. The inefficiency of this process results in the non-discovery or late discovery of patterns and trends, which in turn leads to increased patient morbidity and mortality and increased cost of medical treatment due to preventable adverse outcomes.
The present invention provides a method for analyzing sets of temporal data, especially epidemiological data and hospital data, to automatically identify significant trends and patterns in the data in a timely fashion.
In brief, the method of the present invention analyzes sets of temporal data wherein each set of temporal data comprises a plurality of records collected during a time period unique to each such set. Each record has a plurality of data items including, for example, patient characteristics, specimen source, date obtained, test performed, results obtained, organisms isolated, location of patient in the healthcare facility, date the patient was admitted to the facility, patient clinical data, and one or more antimicrobials against which the isolated organism was tested.
The method of the present invention includes the first step of creating data association rules for at least a plurality of sequential data sets, i.e. sequential temporal data sets, wherein each such data set includes at least some common data items. Each data association rule is further compared to user defined rule templates to differentiate between interesting or significant rules and uninteresting or insignificant rules. The templates may be either "include" or "exclude" templates, as described below.
The incidence proportion of each association rule is computed for the current data set analyzed as well as for previous data sets, i.e. those data sets that contain data from earlier times. The incidence proportion of an association rule A→B in data set p_i is the number of times outcome B occurs in group A in time t_i, expressed as a proportion of the size of group A. Consequently, a series of incidence proportions for A→B from data sets p_1, p_2, ..., p_n describes the incidence of the outcome B in group A from t_1 through t_n. Therefore, by analyzing the time series of incidence proportions of an association rule A→B from a temporal set of data sets, it is possible to detect shifts or trends in the incidence of B in A over time.
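By way of illustration only, the time series of incidence proportions for a single rule might be computed as in the following Python sketch. The representation of a record as a set of item strings, the item labels, and the function names `support` and `incidence_counts` are assumptions made for this example and are not taken from the specification.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Assumption: a record is modeled as a set of "attribute=value" item strings.
Record = FrozenSet[str]


def support(itemset: FrozenSet[str], records: Sequence[Record]) -> int:
    """Count the records that contain every item in `itemset`."""
    return sum(1 for record in records if itemset <= record)


def incidence_counts(
    a: FrozenSet[str], b: FrozenSet[str], records: Sequence[Record]
) -> Tuple[int, int]:
    """Return (numerator, denominator) of the incidence proportion of A -> B.

    The numerator counts records containing both A and B; the denominator
    counts records containing A. They are kept separate so that they can
    later be summed across data sets.
    """
    return support(a | b, records), support(a, records)


# Hypothetical data sets p_1 and p_2, one per time period.
p1: List[Record] = [
    frozenset({"organism=S. aureus", "oxacillin=R"}),
    frozenset({"organism=S. aureus", "oxacillin=S"}),
    frozenset({"organism=E. coli", "oxacillin=R"}),
]
p2: List[Record] = [
    frozenset({"organism=S. aureus", "oxacillin=R"}),
    frozenset({"organism=S. aureus", "oxacillin=R"}),
]

A = frozenset({"organism=S. aureus"})
B = frozenset({"oxacillin=R"})

# One (numerator, denominator) pair per data set, in temporal order.
series = [incidence_counts(A, B, p) for p in (p1, p2)]
```

In this sketch each element of `series` holds the two counts from which the incidence proportion for one time period is formed.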
An event describes an interesting change in the temporal series of incidence proportions of an association rule. In order to determine such events, the temporal series of incidence proportions of an association rule is divided into user-defined windows, namely the past window, wp, and the current window, wc, and a cumulative incidence proportion is computed for each window. The windows wp and wc are defined in the windowing schedule. Each entry in the windowing schedule is defined by the user and comprises a window pair (wpp, wpc), where wpp describes the data sets in the past window wp and wpc describes the data sets in the current window wc. Furthermore, for each association rule, each of the windows wp and wc contains at least one set of temporal data.
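One possible representation of a windowing schedule is sketched below in Python. The class name `WindowPair`, the helper `make_pair`, and the choice to describe each window by the indices of its data sets are assumptions made for illustration; the specification leaves the form of the schedule to the user.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class WindowPair:
    """One schedule entry: data set indices for the past and current windows."""

    past: Tuple[int, ...]     # data sets in the past window wp
    current: Tuple[int, ...]  # data sets in the current window wc

    def __post_init__(self) -> None:
        # Each window must contain at least one set of temporal data.
        if not self.past or not self.current:
            raise ValueError("each window must contain at least one data set")


def make_pair(n_sets: int, past_len: int, current_len: int) -> WindowPair:
    """Hypothetical helper: the current window covers the newest
    `current_len` data sets and the past window covers the `past_len`
    data sets immediately before it."""
    if past_len < 1 or current_len < 1 or past_len + current_len > n_sets:
        raise ValueError("window lengths do not fit the available data sets")
    cur_start = n_sets - current_len
    return WindowPair(
        past=tuple(range(cur_start - past_len, cur_start)),
        current=tuple(range(cur_start, n_sets)),
    )


# A schedule with two user-chosen entries over six data sets (indices 0..5).
schedule: List[WindowPair] = [make_pair(6, 3, 1), make_pair(6, 3, 2)]
```

A schedule with several entries of different lengths allows the same rule to be examined for both abrupt and gradual changes.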
The cumulative incidence proportion for an association rule is computed for a time window (wp or wc) by combining the one or more incidence proportions in the time window. The cumulative incidence proportion of an association rule r = A→B in a time window w is

$$\mathrm{cip}(r, w) = \frac{\sum_{p_i \in w} \mathrm{sup}(A \cup B,\, p_i)}{\sum_{p_i \in w} \mathrm{sup}(A,\, p_i)}$$

where sup(X, p_i) denotes the support of item set X in data set p_i.
Simply stated, the numerator of the cumulative incidence proportion is the sum of the numerators of all incidence proportions in w, and the denominator is the sum of the denominators of all incidence proportions in w.
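The pooling described above can be sketched in Python as follows. The function name `cumulative_incidence_proportion`, the representation of each incidence proportion as a (numerator, denominator) pair, and the sample counts are assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple


def cumulative_incidence_proportion(
    counts: Sequence[Tuple[int, int]], window: Sequence[int]
) -> Optional[float]:
    """Pool incidence proportions over the data sets indexed by `window`.

    `counts[i]` is the (numerator, denominator) pair of the incidence
    proportion for data set i. Numerators and denominators are summed
    separately before dividing. Returns None when group A is empty in
    every data set of the window (an assumption of this sketch).
    """
    numerator = sum(counts[i][0] for i in window)
    denominator = sum(counts[i][1] for i in window)
    if denominator == 0:
        return None
    return numerator / denominator


# Hypothetical counts for two data sets of very different size.
counts = [(1, 2), (30, 100)]

pooled = cumulative_incidence_proportion(counts, [0, 1])  # 31 / 102
unweighted_mean = (1 / 2 + 30 / 100) / 2                  # 0.4
```

Because the sums are taken before dividing, a data set with few records in group A contributes less to the result than it would in an unweighted mean of the individual proportions.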
A change in the cumulative incidence proportion of a particular association rule between wp and wc, such that the probability that the change occurred by chance is less than some predefined percentage (e.g. 5%) as determined by a chi-square test of two proportions or some other applicable statistical test, generates an event. Following the generation of events, the clustering of events by event capture generates event sets, each with an alert. Alerts are presented to the operator.
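As a non-limiting illustration, the event test might be sketched in Python as below. The sketch assumes a Pearson chi-square test on a 2x2 table without continuity correction, which is only one of the applicable statistical tests; the function names, the treatment of degenerate tables, and the sample counts are likewise assumptions. The p-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math
from typing import Tuple


def chi_square_two_proportions(
    x_past: int, n_past: int, x_cur: int, n_cur: int
) -> Tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) comparing x_past/n_past with x_cur/n_cur.

    Returns (statistic, p_value). A table with an empty row or column
    margin is treated as showing no change (assumption of this sketch).
    """
    a, b = x_past, n_past - x_past  # past window: outcome B, no outcome B
    c, d = x_cur, n_cur - x_cur     # current window: outcome B, no outcome B
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0, 1.0
    statistic = n * (a * d - b * c) ** 2 / margins
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def is_event(
    x_past: int, n_past: int, x_cur: int, n_cur: int, alpha: float = 0.05
) -> bool:
    """True when the change between the windows is significant at `alpha`."""
    _, p_value = chi_square_two_proportions(x_past, n_past, x_cur, n_cur)
    return p_value < alpha


# Hypothetical pooled counts: 10 of 100 in wp versus 25 of 100 in wc.
statistic, p_value = chi_square_two_proportions(10, 100, 25, 100)
```

Here the arguments are the pooled numerators and denominators of the cumulative incidence proportions for wp and wc, and `alpha` corresponds to the predefined percentage (e.g. 5%).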
In the preferred embodiment of the invention, the number of events presented to the operator is reduced by event capture.
A primary advantage of the present invention is that it rapidly identifies alerts consisting of high-support association rules whose cumulative incidence proportions change significantly over time. Using traditional methods, these alerts might be overlooked. Additionally, these alerts are a selected subset of all events, thereby focusing the information presented to the operator.