1. Field of the Invention
The invention relates to the scaling of the indexing data of a multimedia document.
The field of the invention is that of the storage of and/or searching for digital multimedia documents in a database.
Prior to being stored in a database, a multimedia document is indexed: indexing data describing details such as the author, type, date of creation, a summary, descriptions, etc. are associated with the document. They are used especially to sort documents by author, descriptor, etc., and/or to classify the documents, and/or of course to search for a document, generally with a view to accessing the document found.
The volume of documents to be stored, and therefore of the associated indexing data, is increasing every year. According to certain sources, the storage requirements of certain companies more than double every year.
2. Description of the Prior Art
Encoding methods have been developed, designed to reduce the volume of data to be stored.
Certain encoding methods relate to documents: they reduce the volume of the document itself while providing for its total or almost total restitution by a decoding method corresponding to the encoding method used.
Other methods relate to the scaling of the indexing data of a multimedia document. This scaling is aimed of course at reducing the volume of these indexing data, but it need not allow these indexing data to be restored. It should be possible, with the scaled data, to sort multimedia documents, for example, and/or to classify and/or search for a multimedia document by making comparisons between these indexing data and the indexing data of the documents of the database.
The reduction of the volume of the indexing data obtained by scaling at a given time may prove to be subsequently insufficient, especially when the volume of initial data increases and/or it is sought to obtain a higher scale ratio.
It is then possible to rescale the initial data. This is a painstaking operation, and an impossible one when the data are no longer accessible. It is also possible to redo a scaling operation on the already scaled data and to perform another scaling operation on the new (as yet non-scaled) data. Data scaled according to different scaling operations generally show a disparity between the scale ratios and between the scaled data. The sorting or searching operations then cannot be performed on all the scaled data at once, but only on each category of scaled data separately.
It is an aim of the invention to avoid these disparities.
An object of the invention is a method for the scaling of the indexing data of a multimedia document such that the scale ratio of the scaled data may be subsequently increased to make these data more compact, with a guarantee that the data thus obtained will be equivalent to the data obtained by applying a higher scale ratio to the initial data at the very outset.
When, for example, the initial data are first scaled in groups of size N′ and the scaled data are scaled once again in groups of size N″, the resulting data are equivalent to the data obtained by the scaling of groups of initial data of size N′N″.
More generally, if the initial data are first scaled in groups sized n′j, with j=1 to J, to obtain scaled data d′j, and if these scaled data are rescaled in groups D′k sized Jk, with k=1 to K, the resulting scaled data d″k are the same as they would be if the scaling were done directly on groups of initial data sized n″k, n″k being the sum of the n′j values of the group D′k.
An object of the invention is a method for the scaling of indexing data D=(dn, n=1 to N) of a multimedia document wherein mainly the method comprises the following steps which consist:
a) at the time t, in grouping the data D in distinct and consecutive groups Dj respectively sized n′j, j varying from 1 to J, and respectively scaling each group Dj to a value d′j according to at least one determined scaling method C, and in storing the data D′=(d′j, j=1 to J) thus obtained;
b) subsequently, at the time t′ greater than t, when the number of data resulting from the previous scaling operation is too great, in grouping the data D′ in distinct and consecutive groups D′k respectively sized Jk, k varying from 1 to K, and scaling each group D′k to a value d″k according to a rescaling method C′ compatible with the scaling method C in such a way that the data d″k are equivalent to those obtained by applying the scaling method C directly to distinct and consecutive groups of data of D sized n″k, n″k being the sum of the n′j values of the group D′k, and in storing the scaled data D″=(d″k, k=1 to K).
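Steps a) and b) can be illustrated with a minimal Python sketch. It assumes the minimum as the scaling method (so that C and C′ coincide); the helper names `split` and `scale`, the sample data and the group sizes are hypothetical and chosen only for illustration.

```python
from itertools import accumulate

def split(seq, sizes):
    """Cut seq into distinct, consecutive groups with the given sizes."""
    assert sum(sizes) == len(seq)
    ends = list(accumulate(sizes))
    starts = [0] + ends[:-1]
    return [seq[s:e] for s, e in zip(starts, ends)]

def scale(data, sizes, method):
    """Scale each consecutive group of data to one value using method."""
    return [method(group) for group in split(data, sizes)]

D = [7, 3, 9, 4, 8, 2, 6, 5, 1, 10]   # initial data, N = 10 (illustrative)
n1 = [2, 3, 1, 4]                     # group sizes n'j, J = 4
Jk = [2, 2]                           # group sizes Jk, K = 2

D1 = scale(D, n1, min)                # step a): D' = (d'j)
D2 = scale(D1, Jk, min)               # step b): D'' = (d''k)

# Sizes n''k: sum of the n'j values within each group D'k.
n2 = [sum(group) for group in split(n1, Jk)]

# Scaling D directly in groups sized n''k gives the same result.
direct = scale(D, n2, min)
```

Here the rescaled data `D2` and the directly scaled data `direct` are identical, which is the property required of a compatible pair (C, C′).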
According to one characteristic of the invention, the step b) is repeated using, for D′, the data resulting from the last rescaling operation.
According to another characteristic of the invention, the sizes n′j are associated with the data d′j and/or the sizes n″k are associated with the data d″k.
According to another additional characteristic:
each datum dn is weighted by a weight wn,
each datum d′j is weighted by a weight w′j, each of these weights being equal to the sum of the weights of the corresponding data of the groups Dj,
each datum d″k is weighted by a weight w″k, each of these weights being equal to the sum of the weights of the corresponding data of the groups D′k,
and the weight of each datum is associated with said datum.
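The propagation of weights can be sketched as follows. The text above only specifies that the weight of a scaled datum is the sum of the weights of its group; the use of a weighted mean as the scaling method, the function name `weighted_mean` and the sample values are assumptions made for this illustration. Exact fractions are used so that the comparison is not affected by rounding.

```python
from fractions import Fraction

def weighted_mean(pairs):
    """Scale a group of (value, weight) pairs to a single (value, weight) pair.

    The weight of the result is the sum of the weights of the group.
    """
    total = sum(w for _, w in pairs)
    value = sum(Fraction(v) * w for v, w in pairs) / total
    return (value, total)

# Initial data dn with weights wn (illustrative values).
D = [(4, 1), (8, 1), (6, 2), (1, 1), (3, 3)]

# First scaling: groups sized 2 and 3, giving (d'j, w'j).
D1 = [weighted_mean(D[0:2]), weighted_mean(D[2:5])]

# Rescaling: one group containing both scaled data, giving (d''k, w''k).
D2 = [weighted_mean(D1)]

# Direct scaling of the initial data in one group of size 5.
direct = [weighted_mean(D)]
```

Because each scaled datum carries the summed weight of its group, the rescaled pair in `D2` matches the pair obtained directly from the initial data.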
Prior to the storage of the scaled data, a header comprising at least one label specifying the scaling method may be associated with the scaled data. The header furthermore advantageously comprises the number of data before and/or after the encoding.
According to one embodiment of the invention, the determined scaling method C is the method C7 based on the histogram of the data groups Dj according to predefined categories and the rescaling method C′ is the method C7′ based on the computation of the sum, term by term, of groups of histograms of D′.
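A minimal sketch of the histogram pair C7/C7′ follows. The categories are assumed here to be half-open numeric bins defined by a list `edges`; that choice, the function names and the sample data are illustrative assumptions, not details given in the text.

```python
def histogram(group, edges):
    """C7 (sketch): count the values falling in each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in group:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return counts

def sum_histograms(hists):
    """C7' (sketch): add a group of histograms term by term."""
    return [sum(column) for column in zip(*hists)]

edges = [0, 4, 8, 12]               # three predefined categories (assumed)
D = [1, 5, 9, 2, 6, 3, 11, 7]       # initial data (illustrative)

# First scaling: groups sized 3, 2 and 3.
D1 = [histogram(D[0:3], edges),
      histogram(D[3:5], edges),
      histogram(D[5:8], edges)]

# Rescaling: groups of histograms sized 2 and 1.
D2 = [sum_histograms(D1[0:2]), sum_histograms(D1[2:3])]

# Direct scaling of the initial data in groups sized 5 and 3.
direct = [histogram(D[0:5], edges), histogram(D[5:8], edges)]
```

Since counts are additive, summing the histograms of the subgroups gives exactly the histogram of the merged group.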
According to one characteristic of the invention, the data D are series of scalar values or vectors.
According to another embodiment of the invention, the determined scaling method C is the method C4 which consists of the random choice of one datum from each group of data Dj and the determined rescaling method C′ is then the method C4′ which consists of the random choice of one datum from each group of data D′k.
According to another characteristic of the invention, the data d″k are equal to those obtained by applying the scaling method C directly to distinct and consecutive groups of data D respectively sized n″k.
According to various embodiments of the invention:
the determined scaling method C is the method C3 based on the computation of the mean of each group of data Dj and the rescaling method C′ is the method C3′ based on the computation of the mean of each group of data D′k, or
the determined scaling method C is the method C1 based on the computation of the minimum of each group of data Dj and the rescaling method C′ is the method C1′ based on the computation of the minimum of each group of data D′k, or
the determined scaling method C is the method C2 based on the computation of the maximum of each group of data Dj and the rescaling method C′ is the method C2′ based on the computation of the maximum of each group of data D′k, or
the determined scaling method C is the method C5 based on the choice of the first datum from each group of data Dj and the rescaling method C′ is the method C5′ based on the choice of the first datum from each group of data D′k, or
the determined scaling method C is the method C6 based on the choice of the last datum from each group of data Dj and the rescaling method C′ is the method C6′ based on the choice of the last datum from each group of data D′k, or
the determined scaling method C is the method C3 based on the computation of the mean of each group of data Dj, then the method C8 based on the computation of the variance of each group of data Dj and the rescaling method C′ is the method C8′ based on the means of the groups of data D′k resulting from scaling according to C8 and the variances of the groups of data D′k resulting from scaling according to C3.
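The mean and variance embodiments (C3 with C8, rescaled by C8′) can be sketched as below. The sketch assumes the population variance (division by the group size) and assumes that the group sizes are stored with the scaled data so that unequal groups can be combined; the function names and sample values are hypothetical. The merged variance is the size-weighted mean of the group variances plus the size-weighted variance of the group means.

```python
from fractions import Fraction

def mean_var(group):
    """C3 then C8 (sketch): return (size, mean, population variance)."""
    n = len(group)
    m = Fraction(sum(group), n)
    v = sum((x - m) ** 2 for x in group) / n
    return (n, m, v)

def merge_mean_var(triples):
    """C8' (sketch): combine (size, mean, variance) triples into one triple."""
    n = sum(size for size, _, _ in triples)
    # Mean of the group means, weighted by group size.
    m = sum(size * mean for size, mean, _ in triples) / n
    # Weighted mean of the variances plus weighted variance of the means.
    v = sum(size * (var + (mean - m) ** 2) for size, mean, var in triples) / n
    return (n, m, v)

D = [2, 4, 6, 1, 3, 8, 10, 5, 9]    # initial scalar data (illustrative)

# First scaling: groups sized 3, 2 and 4.
D1 = [mean_var(D[0:3]), mean_var(D[3:5]), mean_var(D[5:9])]

# Rescaling: groups of scaled data sized 2 and 1.
D2 = [merge_mean_var(D1[0:2]), merge_mean_var(D1[2:3])]

# Direct scaling of the initial data in groups sized 5 and 4.
direct = [mean_var(D[0:5]), mean_var(D[5:9])]
```

With equal group sizes the weights cancel, and the merged variance reduces to the mean of the variances plus the variance of the means.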
According to one characteristic of the invention, all sizes nxe2x80x2j and Jk are powers of two.
According to another embodiment of the invention, the determined scaling method C is the method C3 based on the computation of the mean of each group of data Dj and then the method C9 based on a decomposition of the variance of each group of data Dj into a series of coefficients each describing the variability at a particular scale, and the rescaling method C′ is then the method C9′, based on the means of the groups of data D′k resulting from scaling according to C9, and a decomposition of the variance of the groups of data D′k resulting from scaling according to C3.
According to another embodiment, in the event that the data D consist of a series of vectors, the determined scaling method C is the method C3 based on the computation of the mean of each group of data Dj and then the method C10 based on the computation of the covariance of each group of data Dj and the rescaling method C′ is the method C10′ based on the computation of the mean of groups of data D′k resulting from scaling according to C10 and of the covariance of groups of data D′k resulting from scaling according to C3.
In the event that the data D are vectors, the determined scaling method C may be the method C3 based on the computation of the mean of each group of data Dj, and then the method C11 based on the computation of the sum of the terms of the diagonal of the covariance matrix of each group of data Dj and the rescaling method C′ may be the method C11′ based on the computation of the mean of each group of data D′k resulting from scaling according to C11 and the sum of the terms of the diagonal of the covariance matrix of each group of data D′k resulting from scaling according to C3.
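For vector data, the covariance pair (C3 with C10, rescaled by C10′) and the diagonal-sum pair (C3 with C11, rescaled by C11′) can be sketched in the same way. The sketch assumes population covariance, stored group sizes, and two-dimensional sample vectors; all function names and values are illustrative assumptions.

```python
from fractions import Fraction

def mean_cov(group):
    """C3 then C10 (sketch): (size, mean vector, population covariance matrix)."""
    n, dim = len(group), len(group[0])
    m = [Fraction(sum(x[i] for x in group), n) for i in range(dim)]
    c = [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in group) / n
          for j in range(dim)] for i in range(dim)]
    return (n, m, c)

def merge_mean_cov(parts):
    """C10' (sketch): combine (size, mean, covariance) triples into one."""
    n = sum(p[0] for p in parts)
    dim = len(parts[0][1])
    m = [sum(p[0] * p[1][i] for p in parts) / n for i in range(dim)]
    # Weighted mean of the covariances plus weighted covariance of the means.
    c = [[sum(p[0] * (p[2][i][j] + (p[1][i] - m[i]) * (p[1][j] - m[j]))
              for p in parts) / n
          for j in range(dim)] for i in range(dim)]
    return (n, m, c)

def trace(c):
    """C11 (sketch): sum of the diagonal terms of a covariance matrix."""
    return sum(c[i][i] for i in range(len(c)))

def merge_mean_trace(parts):
    """C11' (sketch): combine (size, mean, trace) triples into one."""
    n = sum(p[0] for p in parts)
    dim = len(parts[0][1])
    m = [sum(p[0] * p[1][i] for p in parts) / n for i in range(dim)]
    t = sum(p[0] * (p[2] + sum((a - b) ** 2 for a, b in zip(p[1], m)))
            for p in parts) / n
    return (n, m, t)

D = [(1, 2), (3, 0), (5, 4), (2, 2), (4, 7), (6, 3)]   # illustrative vectors

D1 = [mean_cov(D[0:2]), mean_cov(D[2:6])]      # first scaling, sizes 2 and 4
D2 = merge_mean_cov(D1)                        # rescaling into one group
direct = mean_cov(D)                           # direct scaling, size 6

# Same check when only the diagonal sum is stored (C11 / C11').
T1 = [(n, m, trace(c)) for n, m, c in D1]
T2 = merge_mean_trace(T1)
```

Storing only the diagonal sum is more compact than storing the full covariance matrix, and in this sketch it can still be rescaled exactly because only the squared distances between group means are needed.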