The present invention relates to vector quantisation data compression and in particular to a method of constructing a codebook for use in vector quantisation data compression and to codebooks so constructed.
When storing or transmitting large amounts of digital data it is often desirable to be able to compress the data to a significant extent to reduce storage space or transmission time. This need for compression is particularly acute when dealing with digital video which tends to involve extremely large amounts of data which must be handled at high speed (particularly for real-time applications such as video phone communication).
One technique used for compressing digital video is known as 'vector quantisation', where a codebook of reference patches (e.g. relatively small portions taken from one or more 'library' or 'archetypal' images) is utilised. In the simplest form of this technique, each image to be compressed is partitioned into a number of image patches and a matching (i.e. similar) reference patch is selected for each image patch from the codebook by comparing patches on a pixel by pixel basis. The codebook index for each chosen reference patch is stored, together with the corresponding position vectors of the image patches (i.e. their position in the original image), to provide a compressed representation of the image. Provided that a copy of the codebook is available, an approximation to the original image can be constructed by using the stored codebook indices to recover the required set of reference patches and inserting these into an image frame using the respective stored image patch position vectors. The achievable degree of compression is a function of the size of the image patches into which the image is partitioned, larger patches allowing higher compression.
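The simplest form of the technique described above can be sketched as follows. This is a minimal illustration only: the function names, the use of one-row patches of `patch_w` horizontally adjacent pixels, and the squared-difference similarity measure are assumptions made for the example and are not taken from the text.

```python
def squared_distance(a, b):
    """Pixel-by-pixel squared difference between two equally sized patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(image, codebook, patch_w):
    """Return a list of (codebook index, (row, col)) pairs, one per image patch.

    Assumption: patches are runs of patch_w horizontally adjacent pixels.
    """
    encoded = []
    for r, row in enumerate(image):
        for c in range(0, len(row) - patch_w + 1, patch_w):
            patch = row[c:c + patch_w]
            # Exhaustive search of the codebook for the most similar patch.
            best = min(range(len(codebook)),
                       key=lambda k: squared_distance(patch, codebook[k]))
            encoded.append((best, (r, c)))
    return encoded


def decode(encoded, codebook, height, width):
    """Rebuild an approximate image from stored indices and positions."""
    frame = [[0] * width for _ in range(height)]
    for index, (r, c) in encoded:
        ref = codebook[index]
        frame[r][c:c + len(ref)] = ref
    return frame
```

Each stored pair holds a codebook index and a position vector, so the decoder needs only the codebook to rebuild an approximation of the frame.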
Simple vector quantisation techniques require an exhaustive search of the codebook for each image patch, a process which is extremely computationally expensive where large codebooks are used to maximise the quality of the compressed images and where large patches, necessitating a large number of pixel comparisons, are generated to maximise the compression ratio. One technique used to reduce this search overhead is known as 'hierarchical vector quantisation' (HVQ). This technique is described in 'Hierarchical Vector Quantisation of Perceptually Weighted Block Transforms'; N Chadda, M Vishwanath, and P A Chou; Proceedings of the Data Compression Conference; IEEE, 1995, pp. 3-12.
To illustrate HVQ, consider that it is desired to compress an individual digitised frame of a black and white video sequence where the frame is composed of an array of pixels each having an intensity value in the form of a byte-integer. Suppose that the frame is divided into patches formed by pairs of horizontally adjacent pixels (so that each pair is represented by a pair of intensity values or a 2-D vector [i,j]) and a codebook B0 is provided which contains a number of reference patches, each reference patch consisting of a pair of horizontally adjacent pixels and the number of reference patches in the codebook (e.g. 256) being considerably less than the total number of possible pixel pair patches. An index table T0 is provided which maps every possible value of the 2-D vectors to associated codebook addresses, such that T0[i,j] addresses the closest entry in B0 (i.e. B0[T0[i,j]]) to the vector [i,j]. The closest entry in the codebook to a given input vector can then be found simply by looking up the table T0.
Consider now that the frame to be compressed is sub-divided into patches of 2×2 pixels so that each patch is represented by a four dimensional vector of intensity values [i,j,k,l]. Each of these 4-D vectors is split into two 2-D vectors and, using table T0, can be mapped to a further 2-D vector [T0[i,j], T0[k,l]]. Suppose that a second codebook B1 contains a number of 4-D vectors [p,q,r,s] which correspond to a set of four pixel patches (where again the number of patches is considerably less than the total number of possible four pixel patches). A second level index table T1 is constructed such that B1[T1[T0[i,j], T0[k,l]]] is the closest entry in B1 to [B0[T0[i,j]], B0[T0[k,l]]]. The closest entry in the codebook B1 to a given input vector can then be found by looking up the table T0 for pixel pairs [i,j] and [k,l], and then applying the resultant 2-D vector [T0[i,j], T0[k,l]] to the table T1. This process is illustrated in FIG. 1.
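The two-level look-up described above can be sketched as follows. The codebook contents, the reduced intensity range of 16 levels (chosen only to keep the tables small) and the helper names `nearest` and `hvq_lookup` are assumptions made for illustration. The tables are built here by exhaustive search against the codebooks.

```python
def nearest(vector, codebook):
    """Index of the codebook entry closest (squared difference) to vector."""
    return min(range(len(codebook)),
               key=lambda k: sum((x - y) ** 2
                                 for x, y in zip(vector, codebook[k])))


LEVELS = 16  # assumed intensity range; byte-valued pixels would use 256

# Hypothetical codebooks: B0 holds pixel pairs, B1 holds 2x2 patches.
B0 = [(0, 0), (5, 5), (10, 10), (15, 15)]
B1 = [(0, 0, 0, 0), (15, 15, 15, 15), (0, 0, 15, 15), (15, 15, 0, 0)]

# T0[i][j] addresses the entry of B0 closest to the pixel pair [i, j].
T0 = [[nearest((i, j), B0) for j in range(LEVELS)] for i in range(LEVELS)]

# T1[a][b] addresses the entry of B1 closest to the 4-D vector formed by
# joining the B0 entries a and b.
T1 = [[nearest(B0[a] + B0[b], B1) for b in range(len(B0))]
      for a in range(len(B0))]


def hvq_lookup(i, j, k, l):
    """Map a 4-D vector to a B1 index using table look-ups only."""
    return T1[T0[i][j]][T0[k][l]]
```

Once the tables exist, compressing a patch involves no comparisons at all: `hvq_lookup(1, 0, 14, 15)` performs two look-ups in `T0` and one in `T1`.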
In a non-HVQ process, it is necessary to provide a codebook containing 4-D vectors and, for each 4-D vector (or patch) in an image to be compressed, to compare each entry in that vector against the corresponding entry in each vector in the codebook. Such an exhaustive search process requires n×m comparison operations to find a matching vector, where n is the number of elements in the vector and m is the number of entries in the codebook. The HVQ process in contrast requires n−1 look-up steps to find a matching vector. Given that comparisons and look-ups are of approximately the same computational cost, it will be appreciated that n×m is much greater than n−1 for reasonable values of m and that the HVQ look-up process will be approximately m times faster than the exhaustive search.
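As a worked instance of the operation counts given above, the short sketch below evaluates both expressions for a 4-D vector and a 256-entry codebook. The function names and the chosen values of n and m are illustrative assumptions.

```python
def exhaustive_comparisons(n, m):
    """Comparisons for an exhaustive search: n elements times m entries."""
    return n * m


def hvq_lookups(n):
    """Table look-ups needed by the hierarchical process."""
    return n - 1


n, m = 4, 256
print(exhaustive_comparisons(n, m))  # 1024
print(hvq_lookups(n))                # 3
```

For these values the ratio is 1024 to 3, which is of the same order as m.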
For each further level of the index table (T2,T3 etc) added in the HVQ process, the compression ratio is increased further. It is noted that in practice, only the final level codebook B need be retained as the intermediate codebooks are not essential to carrying out the HVQ process.
Hierarchical vector quantisation may be used to compress colour images by separating out the three components of colour images (e.g. red, blue, green) and compressing them separately, and recombining the three sets of codebook reference patches to produce a decompressed image.
It will be clear that in order to use HVQ effectively, it is necessary to construct a series of tables T0, T1, ..., Tm and codebooks B0, B1, ..., Bm where m is log2 of the number of pixels in the largest patch used for compression. The conventional HVQ approach is to develop codebooks first, e.g. by extracting reference patches from a number of archetypal image frames, and then to derive the tables from the codebooks.
Since the patches at the final level m are twice the size of the patches at level m−1, the index table Tm is constructed by taking all possible pairs of patches at level m−1 and by conducting an exhaustive search of patches in the codebook at level m to identify the patch at level m which is most similar. Thus, if the level m−1 patches 7 and 13 when placed together most closely resemble level m patch 100, Tm[7,13] would be set to 100. This process is propagated back through the levels to create a codebook and an index table at each level.
In this approach, the selection of level m codebook patches taken from the archetypal image frames is essentially arbitrary. The result is that certain ones of these patches may never or only infrequently be selected as a match for a patch from an image to be compressed, resulting in an inefficient codebook.
It is an object of the present invention to overcome or at least mitigate disadvantages of conventional vector quantisation codebooks.
It is a second object of the present invention to provide a method of generating a vector quantisation codebook, wherein the entries in the codebook are used with substantially equal frequency when compressing data whose statistical properties mirror those of the training data used to construct the codebook.
According to a first aspect of the present invention there is provided a method of constructing a vector quantisation codebook from at least one archetypal data array composed of a set of data values, the method comprising:
1) selecting from the data array(s) a first multiplicity of n-dimension sample vectors, each sample vector consisting of a set of data values which are contiguous in the array(s) and each sample vector defining a point in a finite n-dimensional space;
2) partitioning said space into a predetermined number of regions, each region containing substantially the same number of sample vectors;
3) assigning to each said region a unique index, where the indices are selected to codify the partitioning process carried out in step 2);
4) determining for substantially all possible points within said space, the regions in which these points are located, and constructing a look-up index table mapping substantially all possible points to the respective region indices;
5) selecting from the data array(s) a second multiplicity of n-dimension sample vectors, each sample vector consisting of a set of contiguous data values, and replacing each of these sample vectors with the associated region index obtained by looking up the index table generated in step 4) to create a further data array or arrays;
6) iteratively repeating steps 1) to 5) for the further data array(s) and any subsequently generated further data array(s), wherein each index generated in the final iteration is derived from a set of n^m dimension sample vectors in the archetypal data array(s), where m is the number of iterations carried out;
7) for each index generated in the final iteration, creating an n^m dimension reference vector which is representative of the associated set of n^m dimension sample vectors in the archetypal data array; and
8) constructing a codebook containing the reference vectors, where each index generated in the final iteration points to the location in the codebook of the corresponding reference vector.
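Steps 7) and 8) can be sketched as follows, assuming n=2 and two iterations (horizontal pairs, then vertical pairs), so that each final index covers a 2×2 block of the archetypal array. Taking the mean of the associated patches as the representative reference vector, and the function name `build_codebook`, are assumptions made for this illustration.

```python
from collections import defaultdict


def build_codebook(original, final_indices, patch_h, patch_w):
    """Map each final-level index to the mean of the original patches it covers.

    original       -- archetypal data array (list of rows)
    final_indices  -- index array produced by the last iteration
    patch_h/patch_w -- size of the original block behind each final index
    """
    groups = defaultdict(list)
    for r, row in enumerate(final_indices):
        for c, index in enumerate(row):
            # Collect the original data values that this index was derived from.
            patch = [original[r * patch_h + dr][c * patch_w + dc]
                     for dr in range(patch_h)
                     for dc in range(patch_w)]
            groups[index].append(patch)
    # Reference vector: element-wise mean of all patches sharing the index.
    return {index: [sum(col) / len(col) for col in zip(*patches)]
            for index, patches in groups.items()}
```

The returned dictionary plays the role of the codebook: each final index points to its reference vector.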
The approach adopted is to find a small set of points in an n dimensional space which are representative of a much larger population. For an efficient coding all of these representative points should be equally frequently used. Assume that we have some information about the population to be represented, for instance an encoding of its probability density function over a regular grid in the n dimensional space. The task can then be split into two parts:
a) derive a partitioning of the n dimensional space into subspaces such that the integrals of the probability density functions over all of the subspaces are approximately equal; and
b) find the 'centre of gravity' of each subspace and use this to represent the population within the subspace.
The problem is computationally difficult when one is working with high dimensional spaces of the type used for image compression (16 dimensions and up). The theory behind the present invention is that step a) can be made easier if it is performed on spaces of low dimensions (typically 2), which implies that the whole task would be easy if there were some way of splitting the partitioning problem in n dimensions down into a series of problems in lower dimensions. This approach of splitting things up becomes possible if as a result of performing a partitioning on say a 2 dimensional space we are able to perform an approximately distance preserving projection from 2-dimensional sub-manifolds of the original n-dimensional space into a 1-dimensional code space which is itself a metric space. Thus by repeated pairwise application of the submanifold reduction, we can transform say a 16 dimensional metric space into an 8 dimensional metric code space. Repeated application of the submanifold reduction process can then allow us to map 16 dimensional spaces onto one dimensional code spaces.
Preferably, in step 1), the selected vectors are distributed substantially evenly about said data array(s). More preferably, every data value is contained within at least one vector.
Preferably, step 5) comprises sub-dividing the data array(s) into said second multiplicity of sample vectors, so that every entry is contained within at least one vector. Alternatively however, the selected vectors may encompass only a fraction of the data array, where that fraction coincides with those areas from which the first multiplicity of sample vectors are subsequently chosen.
In one embodiment of the present invention, step 2) comprises: (a) determining the mean and variance of the distribution of vectors for each dimension of said space; (b) identifying that dimension for which the vectors have the greatest variance; and (c) dividing the space into two regions, a first region containing those vectors which have a value for the selected dimension which exceeds the mean value and a second region containing those vectors which do not. For each region, steps (a) to (c) are recursively repeated until the required number of regions is obtained. This number is determined by the desired size of the look-up index tables.
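The recursive mean-and-variance split described in this embodiment can be sketched as follows. The function names and the use of a fixed recursion depth (giving 2^depth regions) are assumptions made for the example. Splitting at the mean yields regions of only approximately equal population.

```python
from statistics import mean, pvariance


def split_region(vectors):
    """Split on the dimension of greatest variance, at that dimension's mean."""
    dims = len(vectors[0])
    variances = [pvariance([v[d] for v in vectors]) for d in range(dims)]
    d = max(range(dims), key=lambda k: variances[k])
    mu = mean(v[d] for v in vectors)
    lower = [v for v in vectors if v[d] <= mu]  # does not exceed the mean
    upper = [v for v in vectors if v[d] > mu]   # exceeds the mean
    return lower, upper


def partition(vectors, depth):
    """Recursively split into 2**depth regions (some may be empty)."""
    if depth == 0:
        return [vectors]
    if not vectors:
        return [[] for _ in range(2 ** depth)]
    lower, upper = split_region(vectors)
    return partition(lower, depth - 1) + partition(upper, depth - 1)
```

A depth of 7 or 8 would give the 128 or 256 regions typical of an index table. The position of a region in the returned list follows the order of the splits.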
In an alternative embodiment of the invention, step 2) comprises determining the Principal Component (or regression line) of the vector distribution and then determining the mean of the Principal Component. The space is then split into two by a line perpendicular to the Principal Component and passing through the mean thereof to create two regions or sub-spaces. Each of these regions is in turn split, with the process continuing recursively until a desired number of regions is achieved.
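A single Principal Component split of a 2-D distribution can be sketched as follows. The closed-form angle of the principal axis of a 2×2 covariance matrix is used. The function name and the convention that points projecting exactly onto the mean go to the lower region are assumptions made for illustration.

```python
import math


def principal_split(points):
    """Split 2-D points by a line through the mean, perpendicular to the
    principal component of the distribution."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Direction of the principal axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    ux, uy = math.cos(theta), math.sin(theta)
    lower, upper = [], []
    for p in points:
        # Signed position along the principal component, relative to the mean.
        proj = (p[0] - mx) * ux + (p[1] - my) * uy
        (upper if proj > 0 else lower).append(p)
    return lower, upper
```

Applying `principal_split` recursively to each returned region would continue the process until the desired number of regions is reached.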
Whilst the two partitioning operations set out above involve repeated applications of the same partitioning step, this is not an essential feature of the invention. In an alternative embodiment, two different partitioning steps could be applied alternately in each partitioning operation.
The number of regions created by the partitioning operation is typically kept below about 1000 to 4000, e.g. 128 or 256. For higher numbers of regions, the computational complexity may be unmanageable.
Preferably, for each repeat of steps 1) to 4), the sequence in which the regions are created is used to determine the assignment of indices. In particular (for binary indices), the first split determines the most significant bit of the index, the second split the next most significant bit etc. The appropriate bit is set to 1 for the region containing vectors exceeding the mean and to 0 for the region containing vectors below the mean. It will be appreciated that a similar result can be achieved by replacing 1's with 0's and 0's with 1's.
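The bit-by-bit index assignment, together with the construction of a look-up index table over all possible points, can be sketched as follows. The tree representation, the function names, the mean-and-variance split and the reduced intensity range of 8 levels are assumptions chosen to keep the example small.

```python
from statistics import mean, pvariance


def build_tree(vectors, depth):
    """Record each split as (dimension, mean, lower subtree, upper subtree)."""
    if depth == 0 or not vectors:
        return None
    dims = len(vectors[0])
    d = max(range(dims), key=lambda k: pvariance([v[k] for v in vectors]))
    mu = mean(v[d] for v in vectors)
    lower = [v for v in vectors if v[d] <= mu]
    upper = [v for v in vectors if v[d] > mu]
    return (d, mu, build_tree(lower, depth - 1), build_tree(upper, depth - 1))


def region_index(point, tree, depth):
    """First split gives the most significant bit, the next split the next bit."""
    index, node = 0, tree
    for _ in range(depth):
        bit = 0
        if node is not None:
            d, mu, low, high = node
            bit = 1 if point[d] > mu else 0  # 1 when the point exceeds the mean
            node = high if bit else low
        index = (index << 1) | bit
    return index


LEVELS, DEPTH = 8, 2  # assumed: 8 intensity levels, 4 regions
samples = [(v, v) for v in range(LEVELS)]  # hypothetical sample vectors
tree = build_tree(samples, DEPTH)
# Look-up index table mapping every possible point to its region index.
table = [[region_index((i, j), tree, DEPTH) for j in range(LEVELS)]
         for i in range(LEVELS)]
```

With these samples the darkest pairs fall in region 0 and the brightest in region 3, so the indices follow the order of the regions.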
Preferably, in step 3), the result of assigning the indices is that the indices preserve the order of the regions having regard to the average value of data values making up the vectors of each region.
The present invention is particularly applicable to creating codebooks for use in compressing digital images which comprise an array of image pixels, each defined by an intensity value. The vectors at the first level represent patches of contiguous pixels in an archetypal image frame whilst the vectors at the higher levels correspond to patches in a shrunk image. At each level, the regions are assigned indices which preserve the subjective order of the regions. For example, regions containing relatively dark patches are assigned relatively low indices whilst regions containing relatively bright patches are assigned relatively high indices.
Preferably, in step 1) of the above method, n=2. More preferably, in the first iteration, every pair of horizontally adjacent data values in the archetypal data array(s) is selected to provide the vectors for use in step 2) so that substantially all data values appear in two data value pairs. In step 5), the or each data array is sub-divided into horizontally adjacent pairs of data values so that each pair abuts the neighbouring pair(s). In the second iteration, in step 1) every pair of vertically adjacent data values are extracted whilst in step 5) [if the process comprises three or more iterations] the or each further data array is sub-divided into vertically adjacent pairs of data values. For each subsequent iteration, horizontally and vertically adjacent pairs of data values are chosen alternately.
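The selection of overlapping pairs in step 1) and the sub-division into abutting pairs in step 5) can be sketched as follows for n=2. The function names are illustrative, and the index table used in the usage note is a hypothetical stand-in rather than one produced by the partitioning step.

```python
def overlapping_pairs(array, horizontal=True):
    """Step 1): every pair of adjacent values, so interior values appear twice."""
    if horizontal:
        return [(row[c], row[c + 1])
                for row in array for c in range(len(row) - 1)]
    return [(array[r][c], array[r + 1][c])
            for r in range(len(array) - 1) for c in range(len(array[0]))]


def shrink(array, table, horizontal=True):
    """Step 5): replace abutting pairs by their region index, halving one axis."""
    if horizontal:
        return [[table[row[c]][row[c + 1]]
                 for c in range(0, len(row) - 1, 2)]
                for row in array]
    return [[table[array[r][c]][array[r + 1][c]]
             for c in range(len(array[0]))]
            for r in range(0, len(array) - 1, 2)]


def toy_table(levels):
    """Hypothetical stand-in for a real index table (rounded-down average)."""
    return [[(a + b) // 2 for b in range(levels)] for a in range(levels)]
```

For example, with `table = toy_table(4)` and `array = [[0, 2, 3, 3], [2, 2, 1, 3]]`, calling `shrink(array, table, horizontal=True)` gives a 2×2 array, and a further `shrink(..., horizontal=False)` on that result gives a 1×2 array, mirroring the alternation of horizontal and vertical pairs between iterations.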
Whilst the sample vectors of the first and second multiplicities preferably all have the same orientation, this is not essential. For example, there may be a mixture of horizontal and vertical patches in either of the multiplicities.
The above method is capable of producing optimised codebooks for use in any vector quantisation process. However, it is particularly suited to producing codebooks for use in hierarchical vector quantisation (HVQ) processes given that index look-up tables for each level, and which are necessary for HVQ, are an inherent product of the codebook creation method.
In certain embodiments of the above method, it may be desired to additionally carry out steps 7) and 8) at the end of each iteration so as to create a codebook at each level. This may be necessary, for example, where an image is to be compressed using variable sized reference vectors.
According to a second aspect of the present invention there is provided a method of constructing a vector quantisation codebook from at least one archetypal data array composed of a set of data values, the method comprising:
1) selecting from the data array(s) a first multiplicity of n-dimension sample vectors, each sample vector consisting of a set of data values which are contiguous in the array(s) and each sample vector defining a point within a finite n-dimensional space;
2) partitioning said space containing said sample vectors into a predetermined number of regions containing substantially the same number of vectors;
3) assigning to each said region a unique index, where the indices are selected to codify the partitioning process carried out in step 2);
4) determining for substantially all possible points within said space the regions in which these points are located, and constructing a look-up index table mapping substantially all such possible points to the respective region indices;
5) for each region, determining a reference vector which is representative of the sample vectors in that region; and
6) constructing a codebook containing the reference vectors, where each index points to the location in the codebook of the corresponding reference vector.
According to a third aspect of the present invention there is provided a method of constructing a set of index tables for use in a hierarchical vector quantisation process, the tables being constructed from at least one archetypal data array composed of a set of data values, the method comprising:
1) selecting from the data array(s) a first multiplicity of n-dimension sample vectors, each sample vector consisting of a set of data values which are contiguous in the array(s) and each sample vector defining a point in a finite n-dimensional space;
2) partitioning said space into a predetermined number of regions, each region containing substantially the same number of sample vectors;
3) assigning to each said region a unique index, where the indices are selected to codify the partitioning process carried out in step 2);
4) determining for substantially all possible points within said space, the regions in which these points are located, and constructing a look-up index table mapping substantially all possible points to the respective region indices;
5) selecting from the data array(s) a second multiplicity of n-dimension sample vectors, each vector consisting of a set of contiguous data values, and replacing each of these sample vectors with the associated region index obtained by looking up the index table generated in step 4) to create a further data array or arrays; and
6) iteratively repeating steps 1) to 5) for the further data array(s) and any subsequently generated further data array(s), wherein each index generated in the final iteration is derived from a set of n^m dimension sample vectors in the archetypal data array(s), where m is the number of iterations carried out.
The method of the above third aspect of the present invention is able to generate a set of index tables for use in applications where it is desired to encode data but where it is not necessary to be able to reconstruct the original data from the encoded data, i.e. where a codebook is unnecessary.
The term 'archetypal' used above encompasses real images captured, for example, using a camera. However, the term also encompasses artificially generated images such as pre-existing codebooks.