Constructing regions with geographic datasets while optimizing an objective function and satisfying multiple constraints is an important task for many research problems such as climate zoning, eco-region analysis, map generalization, census reengineering, public health mapping, and political districting.
Given a set of spatial objects, each of which has one or more attribute values, a regionalization method attempts to find an optimal grouping of the objects into a number of spatially contiguous regions while optimizing an objective function (e.g., a measure of multivariate similarity within regions). This is a combinatorial problem: the number of possible groupings grows so quickly with the number of objects that it is not practical to enumerate them all to find the global optimum. Therefore, a regionalization method usually adopts a heuristic approach to reach a near-optimal solution.
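The two requirements stated above, an objective function and spatial contiguity, can be illustrated with a minimal Python sketch. It assumes the objective is the within-region sum of squared deviations (one common homogeneity measure, chosen here only for illustration) and that contiguity is given as an adjacency list; the function names and the toy data are hypothetical.

```python
from collections import deque

def within_region_ssd(values, labels):
    """Sum of squared deviations of each object from its region's mean vector."""
    regions = {}
    for vec, lab in zip(values, labels):
        regions.setdefault(lab, []).append(vec)
    total = 0.0
    for members in regions.values():
        dim = len(members[0])
        mean = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - mean[d]) ** 2 for m in members for d in range(dim))
    return total

def is_contiguous(labels, neighbors):
    """True if every region is a connected subgraph of the adjacency graph."""
    for lab in set(labels):
        members = {i for i, l in enumerate(labels) if l == lab}
        start = next(iter(members))
        seen, queue = {start}, deque([start])
        while queue:  # breadth-first search restricted to the region's members
            i = queue.popleft()
            for j in neighbors[i]:
                if j in members and j not in seen:
                    seen.add(j)
                    queue.append(j)
        if seen != members:
            return False
    return True

# Toy data (assumed): four objects in a row, 0-1-2-3, one attribute each.
values = [[1.0], [2.0], [10.0], [11.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
good = [0, 0, 1, 1]   # contiguous and homogeneous
bad = [0, 1, 0, 1]    # neither contiguous nor homogeneous

print(within_region_ssd(values, good), is_contiguous(good, neighbors))
print(within_region_ssd(values, bad), is_contiguous(bad, neighbors))
```

A regionalization method searches among the groupings that pass the contiguity check for one with a low objective value.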
A number of conventional regionalization methods exist. General-purpose clustering methods do not consider spatial contiguity, and thus data items in a cluster are not necessarily contiguous in geographic space. Existing regionalization methods that are based on the clustering concept typically take one of three strategies: (1) general-purpose clustering followed by spatial processing; (2) general-purpose clustering with a spatially weighted dissimilarity measure; and (3) enforcing contiguity constraints during the clustering process.
The first group of methods uses a general-purpose clustering method to derive clusters based on multivariate similarity and then divides or merges the clusters to form contiguous regions. The drawback of this type of method is that the number and quality of the resulting regions are very difficult to control.
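The control problem with this first strategy can be seen in a small Python sketch. It assumes a clustering result is already available as a list of labels (the labels below are made up for illustration, not produced by any particular method) and splits every cluster into its spatially connected pieces; the function name is hypothetical.

```python
def split_into_regions(labels, neighbors):
    """Relabel objects so each spatially connected piece of a cluster is its own region."""
    region = [None] * len(labels)
    next_id = 0
    for start in range(len(labels)):
        if region[start] is not None:
            continue
        region[start] = next_id
        stack = [start]
        while stack:  # flood fill over neighbors that share the same cluster label
            i = stack.pop()
            for j in neighbors[i]:
                if region[j] is None and labels[j] == labels[start]:
                    region[j] = next_id
                    stack.append(j)
        next_id += 1
    return region

# Six objects in a row; an assumed attribute-only clustering into 2 clusters.
cluster_labels = [0, 0, 1, 0, 1, 1]
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}

regions = split_into_regions(cluster_labels, chain)
print(regions, "->", len(set(regions)), "regions from 2 clusters")
```

Two clusters were requested, but the spatial post-processing yields four regions, and the count depends entirely on how the clusters happen to be scattered in space.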
The second type of method incorporates spatial distance explicitly in the similarity measure of a general clustering method (e.g., K-Means), so that data items in the same cluster tend to be spatially close to each other. However, the spatial contiguity of a cluster is not guaranteed. Moreover, incorporating spatial distance in the similarity measure reduces the importance of multivariate information, and the resulting method may be unable to find clusters of arbitrary shape.
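The trade-off described for this second strategy can be made concrete with a short Python sketch. It assumes a simple linear mix of attribute distance and geographic distance controlled by a weight `w`; this mixing formula, the function name, and the sample objects are illustrative assumptions, not a form prescribed by any specific method.

```python
import math

def weighted_dissimilarity(a, b, w):
    """Mix attribute distance and spatial distance; w in [0, 1] is the spatial weight."""
    attr = math.dist(a["attrs"], b["attrs"])  # multivariate (attribute) distance
    geo = math.dist(a["xy"], b["xy"])         # geographic distance
    return (1 - w) * attr + w * geo

# Assumed sample objects: b has the same attributes as a but is far away;
# c is next to a but has very different attributes.
a = {"attrs": (1.0,), "xy": (0.0, 0.0)}
b = {"attrs": (1.0,), "xy": (10.0, 0.0)}
c = {"attrs": (5.0,), "xy": (1.0, 0.0)}

for w in (0.0, 0.5):
    print(w, weighted_dissimilarity(a, b, w), weighted_dissimilarity(a, c, w))
```

With `w = 0` the object `b` is the closest match to `a`; with `w = 0.5` the dissimilar but nearby object `c` becomes the closer one, which shows how the spatial term can override multivariate information while still giving no guarantee of contiguity.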
The third approach, represented by Regionalization with Dynamically Constrained Agglomerative Clustering and Partitioning (REDCAP), explicitly incorporates spatial contiguity constraints (rather than spatial similarities) in a hierarchical clustering process. In particular, the REDCAP approach can optimize an objective function during the construction and partitioning of a cluster hierarchy to obtain a given number of regions. REDCAP is a family of six regionalization methods that extend the single-linkage (SLK), average-linkage (ALK), and complete-linkage (CLK) hierarchical clustering methods to enforce spatial contiguity constraints during the clustering process. These six methods are similar in that they all iteratively merge clusters (which must be spatial neighbors) into a hierarchy and then partition the hierarchy to obtain regions. They differ in their definitions of “similarity” between two clusters.
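The core idea of merging only spatially neighboring clusters can be sketched in Python as follows. This is a simplified illustration and not the REDCAP implementation: it assumes an average-linkage distance over all cross-cluster pairs, and it simply stops merging when `k` clusters remain instead of building a full hierarchy and partitioning it. The function name and the toy data are hypothetical.

```python
import math
from itertools import product

def constrained_agglomerative(values, neighbors, k):
    """Repeatedly merge the most similar pair of ADJACENT clusters until k remain."""
    clusters = {i: {i} for i in range(len(values))}      # cluster id -> member indices
    adjacent = {i: set(neighbors[i]) for i in clusters}  # cluster id -> adjacent cluster ids

    def avg_link(a, b):
        # Average attribute distance over all cross-cluster pairs (assumed linkage).
        pairs = list(product(clusters[a], clusters[b]))
        return sum(math.dist(values[i], values[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > k:
        candidates = [(avg_link(a, b), a, b)
                      for a in clusters for b in adjacent[a] if a < b]
        if not candidates:
            break  # no adjacent pair left (disconnected adjacency graph)
        _, a, b = min(candidates)
        clusters[a] |= clusters.pop(b)
        for c in adjacent.pop(b):  # b's neighbors become neighbors of the merged cluster a
            adjacent[c].discard(b)
            if c != a:
                adjacent[c].add(a)
                adjacent[a].add(c)

    labels = [None] * len(values)
    for lab, members in enumerate(clusters.values()):
        for i in members:
            labels[i] = lab
    return labels

# Toy data (assumed): six objects in a row, one attribute each.
values = [[1.0], [2.0], [9.0], [1.0], [10.0], [12.0]]
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}

labels = constrained_agglomerative(values, chain, k=2)
print(labels)
```

Because only adjacent clusters are ever candidates for merging, every resulting region is contiguous by construction, even though object 2 would sit with objects 4 and 5 if attributes alone were considered.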
Although REDCAP methods generally perform better than the other methods described above and can produce reasonably good regions, there is still much room for improvement in terms of optimizing the objective function.
Graph-partitioning methods may also be used to partition the data into a number of parts while optimizing an objective function, e.g., minimizing the total weight of the edges to be cut. However, most graph-partitioning methods cannot take spatial contiguity constraints into account; graph-based image segmentation methods are an exception. Even so, image segmentation methods focus on detecting objects in images and are not able to optimize an objective function based on within-region homogeneity.
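The edge-cut objective mentioned above can be written as a one-line function. The following Python sketch assumes edges are given as `(u, v, weight)` tuples and a partition as a list of part labels; the function name and the small example graph are hypothetical.

```python
def cut_weight(edges, part):
    """Total weight of edges whose endpoints fall in different parts."""
    return sum(w for u, v, w in edges if part[u] != part[v])

# Assumed example: a weighted 4-cycle 0-1-2-3-0.
edges = [(0, 1, 3.0), (1, 2, 1.0), (2, 3, 3.0), (0, 3, 0.5)]

print(cut_weight(edges, [0, 0, 1, 1]))  # cuts the two light edges
print(cut_weight(edges, [0, 1, 0, 1]))  # cuts every edge
```

This objective depends only on the edges that are cut and says nothing about how homogeneous the attribute values are inside each part, which is the limitation noted above.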
As such, a need exists for a contiguity-constrained hierarchical clustering and optimization method that can partition a set of spatial objects into a hierarchy of contiguous regions while optimizing an objective function.