1. Field of the Invention
Aspects of this invention relate generally to data quantization, and, more particularly, to the quantization of data sets originating from collected data.
2. Description of Related Art
It is often desirable to reduce the total number of characteristics of data within a data set originating from collected data, such as the number of colors within a digital color image. Regarding digital color images specifically, raster images are typically composed of a plurality of individual pixels, each pixel having a particular color and location associated with it. The color of the pixel can be expressed in terms of the intensities of three color variables in a color model. Representative color model systems include RGB (red, green, and blue), CMY (cyan, magenta, and yellow), YIQ ("Y" representing luminance and "I" and "Q" representing chromaticity), HSV (hue, saturation, and value), also known as HSB (hue, saturation, and brightness), and HLS (hue, lightness, and saturation).
Using the RGB color model as an example, by defining colors in terms of their red, green, and blue components, all of the colors in the spectrum can be represented as points in a three-dimensional color cube, each axis representing one of the primary colors. The intensity of each color component is normalized to a value between zero and one, zero indicating the complete absence of that component and one indicating full saturation. In a 24-bit true color image, the intensity for each RGB component is stored as an eight-bit value, which provides 256 different intensity levels for each primary color, for a total of 2^24 (over 16 million) colors. Each of these unique colors may be plotted as discrete locations on the three-dimensional color cube.
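The arithmetic in the preceding paragraph can be checked with a short sketch. The function names `pack_rgb` and `normalize` are hypothetical and are introduced here only for illustration; they are not part of the described method.

```python
LEVELS_PER_COMPONENT = 2 ** 8                 # 256 intensity levels per primary
TOTAL_COLORS = LEVELS_PER_COMPONENT ** 3      # 2^24 distinct colors


def pack_rgb(r: int, g: int, b: int) -> int:
    """Pack three eight-bit intensities into one 24-bit integer."""
    for component in (r, g, b):
        if not 0 <= component <= 255:
            raise ValueError("each component must be in 0..255")
    return (r << 16) | (g << 8) | b


def normalize(r: int, g: int, b: int) -> tuple:
    """Scale eight-bit intensities to the range zero to one."""
    return (r / 255.0, g / 255.0, b / 255.0)
```

Each packed value corresponds to one discrete location in the color cube, and `normalize` gives the same location in the zero-to-one coordinates described above.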
The total number of colors in an image, or data characteristics in a data set in general, may be reduced using a process known as color quantization, which enables the color information from a true color image to be conveyed with fewer bits than are used for the original image. In a typical quantization, all of the colors in a true color image are modified, or mapped, to a data structure such as a color palette, or color look up table (CLUT), wherein each of the 2^24 unique colors is mapped to one of the colors in the CLUT. In one form, the CLUT includes 256 unique colors. Using such a color palette, the CLUT's 256 colors are indexed using integer numbers ranging from zero to 255. All of the colors actually present within a true color image are mapped to one of the 256 colors in the CLUT, which allows the colors from the true color image to be stored using only eight bits per pixel. While some color resolution may be lost during the quantization process, careful selection of the colors to be represented in the palette can minimize the impact on final image quality. For example, an analysis of the colors present in an image can be used to create an adaptive palette, which is a palette for which the selected colors to be used for quantization of an image are tuned to that particular image.
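As a rough illustration of the indexed storage described above, the following sketch decodes an image stored as one eight-bit index per pixel and compares storage sizes. The helper names are hypothetical, and the size estimate assumes an uncompressed layout with three bytes per palette entry and no file header.

```python
def decode(indices: bytes, palette: list) -> list:
    """Look up each eight-bit index in the CLUT to recover a color."""
    return [palette[i] for i in indices]


def true_color_size_bytes(num_pixels: int) -> int:
    """Three bytes (24 bits) per pixel."""
    return num_pixels * 3


def indexed_size_bytes(num_pixels: int, palette_entries: int = 256) -> int:
    """One byte per pixel plus three bytes per CLUT entry."""
    return num_pixels + palette_entries * 3
```

For a 640 by 480 image, this gives 921,600 bytes in true color against 307,968 bytes in indexed form under the stated assumptions.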
One method for selecting the colors in the CLUT is known as the popularity algorithm. This method determines which colors appear most often in the image, and these colors are chosen as the entries for the CLUT. After the CLUT entries are chosen, each of the unique colors in the original image is mapped to one of the colors selected as a CLUT color entry. Ideally, each unique color should be mapped to the CLUT entry nearest to that unique color on the three-dimensional color cube, which would provide the closest approximation of that unique color. However, computing the absolute Cartesian distances between each unique color as mapped on the color cube and each of the CLUT entries for each individual pixel requires significant computational processing and large amounts of memory. There has not heretofore been proposed an efficient method for mapping pixels to the nearest CLUT entry.
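A minimal sketch of the popularity algorithm and of the exhaustive nearest-entry mapping follows. It assumes pixels are given as (r, g, b) tuples; the function names are hypothetical. The brute-force search compares each color against every CLUT entry, which is the cost the paragraph above describes.

```python
from collections import Counter


def popularity_palette(pixels, size):
    """Choose the `size` most frequent colors as the CLUT entries."""
    counts = Counter(pixels)
    return [color for color, _ in counts.most_common(size)]


def squared_distance(a, b):
    """Squared Cartesian distance between two points in the color cube."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_entry(color, palette):
    """Index of the CLUT entry closest to `color` (exhaustive search)."""
    return min(range(len(palette)),
               key=lambda i: squared_distance(color, palette[i]))
```

Comparing squared distances avoids a square root without changing which entry is nearest, but the number of comparisons still grows with the number of unique colors multiplied by the number of CLUT entries.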
To determine which CLUT entry to associate with each pixel, a method known as the median-cut algorithm has been used. Using this method, the three-dimensional color cube is divided such that each CLUT color entry represents an equal number of pixels in the image. This is accomplished by creating a histogram of color values for each axis, and dividing the cube at the centers of this histogram using a plane orthogonal to that axis such that equal numbers of pixels remain on either side of the plane. This process is repeated for each axis until the color cube is divided into enough volumes to fill the CLUT. A CLUT entry is then assigned to each volume by computing the average of all the pixel values in that volume. Then, when quantizing the image, each pixel is mapped to the CLUT entry for the volume in which that pixel is located, thus approximating the closest CLUT entry for each pixel.
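The median-cut procedure described above might be sketched as follows. Several details are assumptions made for brevity: the most populated volume is split first, the split axis cycles through the three axes, and the split is made by sorting and halving the pixel list rather than by building an explicit histogram, so pixels sharing a coordinate value may land on either side of the cut.

```python
def median_cut(pixels, num_entries):
    """Return (palette, assignment) for a list of (r, g, b) tuples."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    boxes = [(list(pixels), 0)]  # (pixels in the volume, next split axis)
    while len(boxes) < num_entries:
        splittable = [i for i, (pts, _) in enumerate(boxes) if len(pts) > 1]
        if not splittable:
            break  # every volume holds a single pixel
        idx = max(splittable, key=lambda i: len(boxes[i][0]))
        pts, axis = boxes.pop(idx)
        pts.sort(key=lambda p: p[axis])
        mid = len(pts) // 2  # equal pixel counts on each side of the cut
        next_axis = (axis + 1) % 3
        boxes.append((pts[:mid], next_axis))
        boxes.append((pts[mid:], next_axis))
    # Each CLUT entry is the (integer) average of the pixels in its volume.
    palette = [tuple(sum(p[k] for p in pts) // len(pts) for k in range(3))
               for pts, _ in boxes]
    # Each pixel maps to the entry of the volume that contains it.
    assignment = {p: i for i, (pts, _) in enumerate(boxes) for p in pts}
    return palette, assignment
```

No distance is computed when mapping a pixel, which is the source of both the speed and the approximation error of this approach.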
This approximation significantly reduces the processing that would be needed if absolute distances were determined for each pixel. However, because of the use of dividing planes orthogonal to the axes, the volumes associated with each CLUT entry are shaped in the form of a parallelepiped. Thus, pixels located in the outer corners of the volumes may in fact be closer to a CLUT entry corresponding to an adjacent volume, resulting in a less accurate color quantization.
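The corner effect can be made concrete with a small check that flags colors whose assigned entry is not the nearest one. The function name and the sample values in the usage note are hypothetical and chosen only to illustrate the point.

```python
def squared_distance(a, b):
    """Squared Cartesian distance between two points in the color cube."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def misassigned(assignment, palette):
    """Colors whose assigned CLUT entry is farther than some other entry."""
    result = []
    for color, idx in assignment.items():
        best = min(squared_distance(color, entry) for entry in palette)
        if best < squared_distance(color, palette[idx]):
            result.append(color)
    return result
```

For example, with entries (20, 20, 0) and (120, 20, 0) and a dividing plane at a red value of 100, a pixel at (95, 30, 0) falls in the first volume yet lies much closer to the second entry, so it would be reported by `misassigned`.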
Another limitation of this approach is that the dividing planes are only formed orthogonal to one of the three axes. In an actual image, the concentrations of pixel locations do not necessarily align perfectly with these axes. It has been proposed to analyze the densities of pixel locations in the color cube, and to rotate the axes so that the sides of the volumes better align with the layout of the pixels. Although this provides some improvement, it does not overcome the fundamental problem associated with dividing the color cube into volumes having sides that are all aligned in the same directions.
Thus, a disadvantage of the existing algorithms for data quantization is that they fail to correctly demarcate boundaries between areas in the data space that are mapped to data structure entries; in the case of color image processing, for example, they ineffectively map the original image pixels to the selected CLUT entries. It is far too processor-intensive to determine the absolute distances between each discrete location and the CLUT entries on the color cube in order to identify the closest CLUT entry. On the other hand, using only the rough geometric approximation of the median-cut algorithm requires less computation, but may result in inferior image quality.
Accordingly, there is a need for improved data quantization methods and apparatuses for data sets originating from collected data, such as color images and other data sets, which minimize the computational loads while accurately preserving the data quality in the modified data space.
According to one aspect of the present invention, the foregoing needs are addressed by a computer-readable medium encoded with a computer program which, when loaded into a processor, is operative to perform a method for quantizing a data set having a plurality of dimensions defined by perpendicular axes. The data set includes a plurality of data points, and each data point has a data characteristic. The method includes the steps of selecting a predetermined number of data classes based on a distribution of the data characteristics of the plurality of data points within the data set, the predetermined number of data classes less than the number of data points; forming a data structure based on the predetermined number of data classes; and resolving each of the plurality of data points into one of the predetermined number of data classes using a method, which includes locating a plurality of region centers within the data set, each region center associated with one of the predetermined number of data classes; representing formation of a plurality of regions within the data set by iteratively expanding a predetermined geometric representation from each region center radially outward, each iteration of expansion of the predetermined geometric representation occurring by an integer unit of measure, the iterative expansion causing adjacent regions to form region boundaries, the region boundaries permitted to be non-parallel to the perpendicular axes; and after each iteration of expansion, assigning a value to each of the unassigned data points within each region, the assigned value associated with the predetermined data class of a particular region center, the particular region center being the region center associated with the first region to capture the data point during the iterations of expansion. The resolved data points are associated with the data structure, and, using the associated resolved data points, a modified representation of the data set is generated.
According to another aspect of the present invention, a method for receiving and quantizing a data set originating from collected data is provided. The data set has a plurality of dimensions defined by perpendicular axes, and includes a plurality of data points. Each data point has a data characteristic. The method includes the steps of: receiving the data set; selecting a predetermined number of data classes based on a distribution of the data characteristics of the plurality of data points within the data set, the predetermined number of data classes less than the number of data points; forming a data structure based on the predetermined number of data classes; and resolving each of the plurality of data points into one of the predetermined number of data classes using a method, which includes the steps of locating a plurality of region centers within the data set, each region center associated with one of the predetermined number of data classes; representing formation of a plurality of regions within the data set by iteratively expanding a predetermined geometric representation from each region center radially outward, each iteration of expansion of the predetermined geometric representation occurring by an integer unit of measure associated with a data point, the iterative expansion causing adjacent regions to intersect and form region boundaries, the region boundaries permitted to be non-parallel to the perpendicular axes; and after each iteration of expansion, assigning a value to each of the unassigned data points within each region, the assigned value associated with the predetermined data class of a particular region center, the particular region center being the region center associated with the first region to capture the data point during the iterations of expansion. The resolved data points are associated with the data structure, and, using the associated resolved data points, a modified representation of the collected data is generated.
In a further embodiment, the collected data may be an image captured by an image-collecting device, a seismic measurement of a geographic area, a measurement of an architectural structure, or a measurement of a manufactured device.
In accordance with a further aspect of the present invention, an apparatus for quantizing a data set having a plurality of dimensions defined by perpendicular axes, and including a plurality of data points, each data point having a data characteristic, is provided. The apparatus includes a computer-readable storage medium, and a processor responsive to the computer-readable storage medium and to a computer program. When loaded into the processor, the computer program is operative to perform a method including the steps of selecting a predetermined number of data classes based on a distribution of the data characteristics of the plurality of data points within the data set, the predetermined number of data classes less than the number of data points; forming a data structure based on the predetermined number of data classes; and resolving each of the plurality of data points into one of the predetermined number of data classes using a method including the steps of: locating a plurality of region centers within the data set, each region center associated with one of the predetermined number of data classes; representing formation of a plurality of regions within the data set by iteratively expanding a predetermined geometric representation from each region center radially outward, each iteration of expansion of the predetermined geometric representation occurring by an integer unit of measure associated with a data point, the iterative expansion causing adjacent regions to intersect and form region boundaries, the region boundaries permitted to be non-parallel to the perpendicular axes; and after each iteration of expansion, assigning a value to each of the unassigned data points within each region, the assigned value associated with the predetermined data class of a particular region center, the particular region center being the region center associated with the first region to capture the data point during the iterations of expansion. 
The resolved data points are associated with the data structure, and, using the associated resolved data points, a modified representation of the data set is generated.
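The iterative expansion recited in the aspects above can be illustrated with a minimal sketch. It rests on assumptions not stated in this section: the predetermined geometric representation is taken to be a sphere, the data space is a small cubic grid of integer points, region centers lie inside that grid, and when two regions reach a point in the same iteration the center listed first wins. The name `grow_regions` is hypothetical.

```python
from itertools import product


def grow_regions(centers, size):
    """Assign each point of a size**3 grid to the index of a region center."""
    if not centers:
        raise ValueError("at least one region center is required")
    assigned = {}
    total = size ** 3
    radius = 0
    while len(assigned) < total:
        for idx, center in enumerate(centers):
            # Visit only the bounding cube of the current sphere.
            ranges = [range(max(0, c - radius), min(size - 1, c + radius) + 1)
                      for c in center]
            for point in product(*ranges):
                if point in assigned:
                    continue  # already captured by an earlier region
                d2 = sum((p - c) ** 2 for p, c in zip(point, center))
                if d2 <= radius * radius:
                    assigned[point] = idx  # first region to capture the point
        radius += 1  # expand every region by one integer unit
    return assigned
```

Because each sphere grows by one unit per iteration and a point keeps the first region that reaches it, the boundary between two regions falls roughly midway between their centers and need not be parallel to any axis. Points whose distances to two centers round up to the same integer radius are decided by the tie rule assumed above.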