Search and classification algorithms are routinely used in many areas of computer science to identify media objects such as audio data, video data, image data, and geographic data objects based on features that describe attributes or elements of the media objects. Search algorithms seek to selectively identify one or more media objects that best correspond to or match a given set of features. For example, based on an input set of features, one or more instances of the same media object may be identified (e.g., multiple occurrences of the same image or video). In some applications such as image identification, the selected features are invariant to aspects of the media object that may be subject to change, for example, resolution in image data or noise in audio data. Classification algorithms generate “classifiers” or statistical models that define a set of features associated with a class or category of media objects. The classifiers are then used to determine the likelihood that a media object belongs to a class or category of media objects based on its features.
One approach to searching and classifying these media objects is to partition the data elements of a media object into a plurality of sub-portions. A set of features is then generated for each sub-portion of the media object. The sub-portions can be spatial, temporal, or both. For example, in a pixel-based image object, such as a 640×480 image, the data elements of the image file are pixels, which may be partitioned into a plurality of sub-portions, such as 32×32 pixel windows; from each sub-portion, the features of interest are generated. In a video file, the sub-portions may be both spatial (e.g., 32×32 pixels) and temporal (e.g., 1 frame or some number of milliseconds) in extent. In audio objects, the sub-portions may be temporal, comprising a selected number of frames, samples, or windows of fixed duration (e.g., 50 milliseconds). The ability to partition the media file into sub-portions is based on the fact that the data represent real-world entities (images or pictures, videos, and audio recordings) that inherently have spatial and/or temporal characteristics.
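By way of illustration only, the spatial partitioning described above may be sketched as follows. The function names `partition_into_windows` and `window_features` are hypothetical, the image is assumed to be a plain two-dimensional grid of pixel values, and the sketch assumes non-overlapping windows with any partial windows at the image edges discarded; none of these choices is dictated by the approach itself.

```python
def partition_into_windows(image, window_h, window_w):
    """Split a 2-D grid of pixel values into non-overlapping windows.

    Returns a list of (top, left, window) tuples. Edge regions that do not
    fill a complete window are skipped (an assumption made for this sketch).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    windows = []
    for top in range(0, height - window_h + 1, window_h):
        for left in range(0, width - window_w + 1, window_w):
            window = [row[left:left + window_w]
                      for row in image[top:top + window_h]]
            windows.append((top, left, window))
    return windows


def window_features(window):
    """Flatten a window so that each pixel becomes one feature (assumed scheme)."""
    return [value for row in window for value in row]


if __name__ == "__main__":
    # Synthetic 640x480 image: 480 rows of 640 pixel values.
    image = [[(r * 640 + c) % 256 for c in range(640)] for r in range(480)]
    windows = partition_into_windows(image, 32, 32)
    print(len(windows))                         # 20 * 15 = 300 windows
    print(len(window_features(windows[0][2])))  # 32 * 32 = 1024 features
```

Under these assumptions, a 640×480 image yields 300 windows of 1,024 pixel-level features each. A temporal partition of an audio object could follow the same pattern in one dimension.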
This approach typically produces a very large number of features for the media objects to be classified, especially when the set of such objects is very large, such as millions of images. This is especially so when a separate feature is generated for each data element in each sub-portion of the media object (e.g., a feature is generated to represent every pixel in a patch of an image) and/or when many overlapping sub-portions are generated for the media object (e.g., overlapping patches in an image, or overlapping time windows in a video or audio file). Matrix decomposition techniques such as Principal Component Analysis are commonly used in search and classification algorithms to reduce the number of features in a set of features based on variances and correlations within the feature data.
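The growth in the number of features may be illustrated with simple arithmetic. The following sketch is offered as an example only; the helper names `count_windows` and `total_features`, the square window, and the stride parameter controlling overlap are assumptions introduced here, and one feature per pixel per window is assumed.

```python
def count_windows(length, window, stride):
    """Number of window positions along one axis for a given stride."""
    if length < window:
        return 0
    return (length - window) // stride + 1


def total_features(width, height, window, stride):
    """Return (number of windows, total features) for one image.

    Assumes square windows and one feature per pixel in each window.
    """
    n_windows = (count_windows(width, window, stride)
                 * count_windows(height, window, stride))
    return n_windows, n_windows * window * window


if __name__ == "__main__":
    # Non-overlapping 32x32 windows on a 640x480 image.
    print(total_features(640, 480, 32, 32))  # (300, 307200)
    # Overlapping windows with a hypothetical stride of 8 pixels.
    print(total_features(640, 480, 32, 8))   # (4389, 4494336)
```

With the assumed stride of 8 pixels, a single 640×480 image produces roughly 4.5 million pixel-level feature values, compared with about 0.3 million for non-overlapping windows.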
In order to perform matrix decomposition techniques, a covariance matrix is first generated. The covariance matrix describes the covariance between the different features associated with the sub-portions of the media object. In instances where the features are based upon data elements in the media object, each entry in the covariance matrix is generated by taking the inner product between a pair of data elements X and Y, i.e., the sum of the products of the values associated with the data elements over all of the sub-portions of the media object. Therefore, if the number or “dimensionality” of the features/data elements is d, generating a covariance matrix has an order of operation of O(nd²), where n is proportional to the size of the media object. As those of skill in the art recognize, the order O of an operation represents the relative consumption of computing resources, such as memory and processor time, necessary to complete the operation. In most circumstances, the computing time required for performing an algorithm of order O(nd²) precludes the efficient generation of the covariance matrix. Therefore, the use of matrix decomposition techniques, which could otherwise be used to reduce the dimensionality of the feature set, is also precluded.
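The cost described above may be made concrete with a direct, unoptimized sketch of covariance generation. This is an illustration under stated assumptions rather than a definitive implementation: the function name `covariance_matrix` is hypothetical, each of the n sub-portions is assumed to be represented by a list of d feature values, and mean-centering and division by n − 1 are conventional choices added here that are not specified in the description above.

```python
def covariance_matrix(samples):
    """Direct covariance computation over n samples of d features each.

    Returns (cov, multiply_adds), where multiply_adds counts the inner-loop
    operations and equals n * d * d. Requires at least two samples.
    """
    n = len(samples)
    d = len(samples[0])
    # Per-feature means (mean-centering is an assumption of this sketch).
    means = [sum(sample[j] for sample in samples) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    multiply_adds = 0
    for sample in samples:
        centered = [sample[j] - means[j] for j in range(d)]
        # Accumulate the product of every pair of features for this sample.
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j]
                multiply_adds += 1
    # Normalize by n - 1 (sample covariance; also an assumption).
    for i in range(d):
        for j in range(d):
            cov[i][j] /= (n - 1)
    return cov, multiply_adds


if __name__ == "__main__":
    samples = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
    cov, ops = covariance_matrix(samples)
    print(cov)  # [[4.0, 8.0], [8.0, 16.0]]
    print(ops)  # 3 samples * 2 * 2 = 12
```

For example, with n = 300 windows and d = 1,024 features per window, the inner loop would execute 300 × 1,024² = 314,572,800 times for a single image, which illustrates why the O(nd²) cost becomes prohibitive as d grows.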