1. Field of the Invention
The present invention is directed generally to digital data processing, and more particularly to analysis of high-dimensionality data.
2. Description of the Related Art
Machine-based classification and identification of images based on digital image data is a complex problem. Compared to other types of digitized data, image data for any but the simplest images tends to be high in bandwidth and complexity. Techniques to identify features in images, such as the objects or surfaces represented in an image, often rely on algorithms that attempt to identify points in an image that bear some relationship to one another and to segregate these identified points from other sets of points. Such algorithms may be referred to as clustering algorithms.
In order to robustly represent the rich variety of features that may be present in image data, representations of image data may be high-dimensional. For example, in addition to geometric location within a Cartesian coordinate space, image data may reflect numerous other image properties. Depending on the approach used, individual image data points may have dozens or hundreds of dimensions.
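By way of illustration only, the following sketch shows one way in which an individual image data point might come to have dozens of dimensions. It is not drawn from any particular technique described herein. The function name pixel_features, the choice of a surrounding intensity patch as the additional image property, the radius parameter, and the 6x6 sample image are all assumptions made for the example.

```python
def pixel_features(image, x, y, radius=2):
    """Build a feature vector for the pixel at (x, y).

    The vector holds the pixel's Cartesian coordinates followed by the
    intensities of the surrounding (2*radius + 1)^2 patch. The patch is an
    assumed stand-in for "other image properties"; real systems may use
    very different descriptors.
    """
    height, width = len(image), len(image[0])
    features = [float(x), float(y)]
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Clamp neighbor coordinates so border pixels still get a full patch.
            ny = min(max(y + dy, 0), height - 1)
            nx = min(max(x + dx, 0), width - 1)
            features.append(float(image[ny][nx]))
    return features


# Hypothetical 6x6 grayscale image whose intensity increases left to right
# and top to bottom.
image = [[row * 6 + col for col in range(6)] for row in range(6)]

# One data point per pixel: 36 points, each with 2 + 5*5 = 27 dimensions.
points = [pixel_features(image, x, y) for y in range(6) for x in range(6)]
print(len(points), len(points[0]))
```

Even in this small example, a two-dimensional pixel location becomes a 27-dimensional data point, and enlarging the patch or appending further properties increases the dimensionality accordingly.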
To be effective in the context of imaging, clustering algorithms thus need to extend readily to high-dimensional data. However, many existing clustering techniques are not robust: they are sensitive to noise in the image data, to small variations in the parameters that govern the algorithm's performance, or to both. Moreover, complex images may include vast numbers of high-dimensional data points, and many existing clustering techniques scale poorly as the size of the data set increases.