Mathematical algorithms have been used to identify disease clusters and are a key component of syndromic surveillance software used to monitor for bioterrorism, as well as naturally occurring disease outbreaks. Examples of such algorithms are widespread. One study of the use of such algorithms of which Applicant is aware was made by DARPA. (See, e.g., DARPA Bio-ALIRT Program Technical Report, “Evaluation of Algorithms for Outbreak Detection Using Clinical Data from Five U.S. Cities” (Oct. 15, 2004).)
With particular relevance to this invention, a statistical method for quality improvement in industry is known as the “Fast Initial Response Cumulative Sum” (FIR CUSUM) method. This method is well known and is explained in detail, for example, in Ryan, T. P., “Statistical Methods for Quality Improvement,” John Wiley & Sons, New York (1989), pp. 110-112.
The standard FIR CUSUM test statistic calculates the deviation qt of each day's value (xt) from the moving mean (xbar) on each day (t). The value St(q) accumulates these deviations from the mean (qt), but only to the extent that they exceed the mean by a threshold value (k). When the accumulated sum of deviations exceeds a preset limit h, a “signal” is generated, and the sum, St, is reset to a starting value, Sreset, for analyses continuing on succeeding days. All deviations (qt), thresholds (k) and limits (h) are expressed in standardized units (sigmas).

$$S_0 = 0$$

$$S_t(q) = \max\left[0,\; S_{t-1} + q_t - k\right]$$

$$\text{If } S_t(q) > h \text{, then reset to } S_t(q) = S_{\text{reset}}$$

$$\text{where } q_t = \frac{x_t - \bar{x}}{\text{std. deviation}} \text{ and } h \text{ is the preset limit}$$
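The recursion above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of any particular surveillance product: the function name `fir_cusum`, the 7-day window, the default values of k, h and Sreset, and the choice to compute the moving mean and standard deviation from the days preceding day t (excluding day t itself) are all assumptions made for the example.

```python
from statistics import mean, stdev


def fir_cusum(counts, window=7, k=0.5, h=4.0, s_reset=0.0):
    """Apply the CUSUM recursion to a list of daily counts.

    Returns a list of (day_index, S_t, signal) tuples for every day that
    has a full baseline window before it. All parameter defaults are
    illustrative assumptions, not values taken from the source text.
    """
    results = []
    s = 0.0  # S_0 = 0
    for t in range(window, len(counts)):
        baseline = counts[t - window:t]  # assumed: window excludes day t
        xbar = mean(baseline)
        sd = stdev(baseline)
        # Standardized deviation q_t; treated as 0 if the baseline is flat.
        q = (counts[t] - xbar) / sd if sd > 0 else 0.0
        s = max(0.0, s + q - k)  # S_t = max[0, S_{t-1} + q_t - k]
        signal = s > h
        results.append((t, s, signal))
        if signal:
            s = s_reset  # reset after every signal
    return results


# Hypothetical daily visit counts: a quiet baseline, then two high days.
counts = [10, 12, 11, 9, 10, 11, 12, 10, 30, 30]
for day, s_t, signal in fir_cusum(counts):
    print(day, round(s_t, 2), signal)
```

With these made-up counts, the first high day (index 8) produces a signal, but the second high day (index 9) does not: the sum has been reset, and the first high value now sits inside the baseline window and inflates the standard deviation.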
Attempts have been made to apply the standard FIR CUSUM procedure to the problem of detecting and rapidly identifying the onset and outbreak of diseases. Nonetheless, the standard FIR CUSUM procedure has many limitations when applied to syndromic surveillance data. By way of example only and not by limitation, some of the weaknesses of the FIR CUSUM are listed as follows:
1) The user or programmer must specify an “interval width”, that is, the period of time used for calculating the moving mean and standard deviation. The user must also specify the three model parameters (h, k, and Sreset). However, the parameter values with the “best” sensitivity and specificity depend on the type, shape, amplitude and duration of the outbreak one wishes to detect. These characteristics of outbreaks are different for each disease and depend on the mode of exposure, the magnitude of exposure, the location of exposure, and numerous other variables that cannot be known in advance.
2) The known FIR CUSUM procedure weights all data in the moving average window equally, making no allowance for the natural weekly periodicity seen in many healthcare settings.
3) The mean and standard deviation of the known FIR CUSUM procedure are heavily influenced by outliers, including zero values (in practice, zeros frequently represent missing data).
4) The known FIR CUSUM procedure cannot quantify the “unusualness” or public health importance of a signal or flag. An outbreak involving 10 ill persons generates the same signal as one involving 200 people.
5) The known FIR CUSUM procedure does not quantify the duration of a signal or flag. In fact, it “resets” after every out-of-control signal is generated, so consecutive days with unusual values are frequently missed.
The FIR CUSUM method is not the only statistical method that exhibits these weaknesses when applied to real-world healthcare syndromes. Most of the available statistical methods that have been applied to outbreak detection were adapted from engineering and quality control applications, and have serious deficiencies in terms of sensitivity and specificity. Sensitivity is the proportion of true outbreaks that are detected (true positives). Specificity is the proportion of non-outbreak periods that are correctly not flagged (true negatives). To maximize sensitivity and specificity, these algorithms require a long “training period” (a year or more of baseline data to “learn” what kinds of peaks comprise true positives and true negatives). This requirement for a large baseline set of data is often impractical because:
1) Systems to collect syndromic surveillance data are only now being developed.
2) There are many situations where it is impossible to obtain comparable data. For example, the Olympic Games cause a large influx of people into a small geographic area for a limited period of time, so no comparable “baseline data” exist.
3) These algorithms are tuned to detect outbreaks similar to those that have occurred in the past. Bioterrorism attacks may produce outbreaks unlike any we have seen in the past.
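The sensitivity and specificity discussed above can be made concrete with a small calculation. The sketch below is only illustrative: the function name `sensitivity_specificity` and the per-day true/false labels are assumptions for the example, and it presumes that each day can be labeled as a true outbreak day or not, which is the very thing the required baseline data would have to supply.

```python
def sensitivity_specificity(truth, flagged):
    """Compute (sensitivity, specificity) from per-day labels.

    truth[i]   is True if day i belongs to a real outbreak.
    flagged[i] is True if the algorithm signaled on day i.
    Returns None for a measure whose denominator is zero.
    """
    pairs = list(zip(truth, flagged))
    tp = sum(1 for t, f in pairs if t and f)          # outbreak, flagged
    fn = sum(1 for t, f in pairs if t and not f)      # outbreak, missed
    tn = sum(1 for t, f in pairs if not t and not f)  # quiet, not flagged
    fp = sum(1 for t, f in pairs if not t and f)      # quiet, false alarm
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical labels for five days.
truth = [True, True, False, False, False]
flagged = [True, False, False, True, False]
print(sensitivity_specificity(truth, flagged))
```

In this made-up example one of two outbreak days is flagged (sensitivity 0.5) and two of three quiet days are correctly left unflagged (specificity 2/3).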
Also, the statistical methods proposed for outbreak detection are designed to detect only the beginning of a signal, and provide no subsequent information about the amplitude, shape or duration of the signal. Because these systems, including the FIR CUSUM, were developed for manufacturing and quality control settings, they assume that there is an easy way to confirm or identify the signal once it is detected. In disease surveillance, this assumption does not hold. There is rarely any easy way to confirm or identify the cause of any particular cluster, and the epidemiologist needs ongoing daily information about the amplitude, shape and duration of the outbreak to mount a proper investigation.
Thus, there is a need in the art for an apparatus and method for rapidly analyzing health-related data that does not require a long “training period”, that signals the outbreak of an illness quickly, and that provides the user with detailed information about the amplitude, shape and duration of the outbreak. It is therefore an object of the invention to provide an apparatus and method for rapid syndrome analysis that is easy to use and interpret and that is flexible and scalable.