1. Technical Field
The present invention relates generally to a method for detecting statistically significant dimensional relationships between physical events and, more particularly, to a method and apparatus for measuring a degree of association between spatially referenced physical events in order to group these physical events when appropriate.
2. Discussion
It is difficult to overstate the importance of accurately measuring localized occurrences of a physical event such as, for example, disease outbreaks or unsafe pollution concentrations. Incorrectly identifying the existence of such an event can lead to unnecessarily alarming individuals in the "affected" area as well as to causing the expenditure of resources, monetary and otherwise, better allocated elsewhere. Potentially even more devastating is the failure to recognize the existence of a localized event as early as possible. Left unchecked, highly contagious diseases can spread to cause an epidemic while environmental conditions such as increased pollution levels can irrevocably damage fragile natural balances.
Physical events are often spatially or temporally related with localized occurrences referred to as clusters. In these instances, the ability to accurately determine the existence of localized occurrences of physical events depends in part upon the specificity of the spatial or temporal property of each event. Despite the need to accurately measure statistically significant clustering in a variety of contexts, currently available modeling techniques do not accurately reflect the location of each event and therefore too often lead to incorrect inferences. This problem is particularly troublesome in spatially referenced physical events having uncertain spatial locations.
Sources of location uncertainty arise in a variety of contexts. For example, uncertainty can arise in an epidemiologic context due to the anonymity commonly maintained during the reporting of health events, the uncertainty of exposure locations given the mobility of human activity, and the transient nature of many environmentally transmitted disease-causing agents. Uncertainty is amplified by the recording of event locations based upon zip code zones, census tracts, or grid nodes. Location uncertainty is also prevalent in the analysis of other spatially referenced physical events such as in the environmental and physical sciences (e.g., biology, geology, and hydrology).
Randomization testing of recorded events is commonly used to infer whether a spatial pattern exists within the sample of spatially referenced physical events. In these tests, the statistical significance of the spatial pattern is generally evaluated through the use of actual or estimated sample locations. When the actual locations of the samples are uncertain, a model is used to approximate the locations of the samples. The most frequently used method for approximating the location of a sample is the centroid model which assigns the area centroid location to all cases or samples occurring within an area.
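The centroid model described above can be illustrated with a minimal sketch. It assumes each area is given as a simple polygon (a list of vertex coordinates) and that each case records only an area identifier; the function names `polygon_centroid` and `assign_centroids` and the data layout are hypothetical and chosen only for illustration.

```python
def polygon_centroid(vertices):
    """Area centroid of a simple polygon via the shoelace formula.

    `vertices` is a list of (x, y) tuples in order around the boundary.
    """
    signed_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        signed_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    signed_area *= 0.5
    return (cx / (6.0 * signed_area), cy / (6.0 * signed_area))


def assign_centroids(case_areas, area_polygons):
    """Centroid model: every case in an area receives that area's centroid.

    `case_areas` lists the (hypothetical) area identifier of each case;
    `area_polygons` maps an area identifier to its polygon vertices.
    """
    centroids = {a: polygon_centroid(p) for a, p in area_polygons.items()}
    return [centroids[a] for a in case_areas]
```

Note that every case recorded in the same area collapses onto a single point, which is the loss of location information the following paragraphs discuss.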
A particular disadvantage of using randomization tests based upon centroid approximations is that the approach does not consider the spatial distribution of the at-risk population. As a result, approximations based upon the centroid of an area rather than the distribution of the at-risk population create an unnecessarily and inaccurately small universe of possible sample locations. For example, in epidemiological analyses, the universe of sample locations for randomization is more properly related to the geographic distribution of the human population in general and, more particularly, to the distribution of individuals at risk for a particular disease.
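By way of contrast with the centroid approximation, a location model that reflects the at-risk population might be sketched as follows. This is an illustrative assumption, not a definitive implementation: it supposes that each area has a list of candidate locations with an at-risk population count attached to each, and draws a location with probability proportional to that count. The name `draw_location` and the `(point, population)` layout are hypothetical.

```python
import random


def draw_location(candidates, rng):
    """Draw one possible location for a case within an area.

    `candidates` is a list of ((x, y), at_risk_population) pairs
    (a hypothetical layout). Locations with a larger at-risk
    population are proportionally more likely to be drawn.
    """
    points = [point for point, _ in candidates]
    weights = [population for _, population in candidates]
    return rng.choices(points, weights=weights, k=1)[0]


# Example: three candidate locations in one area; the middle one is unpopulated.
area_candidates = [((0.0, 0.0), 10), ((1.0, 1.0), 0), ((2.0, 2.0), 90)]
example_rng = random.Random(0)
example_draw = draw_location(area_candidates, example_rng)
```

Under this sketch the universe of possible sample locations is every populated candidate location rather than a single centroid per area.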
Additionally, randomization tests are problematic for spatial data because currently used techniques assume that the sampling space consists of the locations at which the observations were made. That is, they erroneously assume that the universe of possible locations consists entirely and solely of the sample locations. However, in most situations, other locations in the study area could have been sampled. As a result, the sampling space for the spatial randomization test is incorrectly specified and the distributions generated during the test pertain only to the sample locations rather than the at-risk population within the study area. This incorrect approximation leads to detection errors and the potentially dire consequences associated therewith, whether the locations of the physical events are certain or uncertain.
Accordingly, it is an object of the present invention to provide a method for accurately determining the degree of association between physical events.
A further object of the present invention is to provide a method for accurately determining the degree of association between physical events having uncertain locations.
Another object of the present invention is to determine the degree of association between a plurality of spatially referenced physical events based upon an analysis of reference and restricted distributions of a test statistic.
Still another object of the present invention is to determine the relative degree of illness for a given area based upon a comparison of the degree of association between the physical events to a threshold value.
A further object of the present invention is to determine the degree of association between a plurality of spatially referenced physical events through the use of a location model that reflects the spatial distribution of an at-risk population.
The present invention provides a method for measuring a degree of association between, and for selectively creating a cluster of, n plurality of spatially referenced physical events of a predetermined physical characteristic. The method includes the steps of assembling n plurality of physical events, assembling a universe of possible sample locations, determining a reference distribution, determining a restricted distribution, and determining the degree of association between the n plurality of physical events. Specifically, the physical events each have an indicia of location and a physical characteristic above a threshold. The step of determining a reference distribution is conducted by calculating a test statistic for each of n′ plurality of random allocations of the n plurality of physical events over the selected n plurality of sample locations. Further, the step of determining a restricted distribution includes calculating the test statistic for each of n″ plurality of restricted random allocations of the n plurality of physical events over the n plurality of sample locations.
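The two distributions named above can be sketched in outline. The sketch rests on several assumptions that this section does not specify: the test statistic is taken to be the mean nearest-neighbor distance, a "restricted" allocation is taken to mean that each event may only be placed at a location inside its own recorded area, and the comparison between the distributions is reduced to a simple lower-tail proportion. All function names and data layouts are hypothetical.

```python
import math
import random


def mean_nn_distance(points):
    """Assumed test statistic: mean distance from each point to its nearest neighbor."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def reference_distribution(universe, n, n_alloc, rng):
    """Statistic for each of `n_alloc` random allocations of n events.

    Each allocation selects n distinct locations from the whole universe
    of possible sample locations (requires n <= len(universe)).
    """
    return [mean_nn_distance(rng.sample(universe, n)) for _ in range(n_alloc)]


def restricted_distribution(event_areas, locations_by_area, n_alloc, rng):
    """Statistic for each of `n_alloc` restricted random allocations.

    Assumption: each event is placed at a random location drawn only
    from the candidate locations of the area in which it was recorded.
    """
    return [
        mean_nn_distance([rng.choice(locations_by_area[a]) for a in event_areas])
        for _ in range(n_alloc)
    ]


def lower_tail_proportion(value, reference):
    """Share of reference statistics at or below `value` (assumed summary)."""
    return sum(1 for r in reference if r <= value) / len(reference)
```

With this choice of statistic, restricted allocations whose values fall far into the lower tail of the reference distribution would suggest that the events lie closer together than random placement over the universe of locations would produce.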