With the proliferation of the internet and digital photography, scalability has become a challenge in the field of computer vision and image processing. Large-scale image collections containing billions of image files are difficult to organize and navigate, and it is difficult to retrieve files from them accurately and efficiently. Recent multimedia analysis research has focused on the indexing and retrieval of digital content. Such indexing and retrieval can be enhanced and facilitated with tags or annotations. While manually supplied annotations are proliferating on websites such as YouTube and Flickr, the tremendous growth in individual and distributed media collections requires automatic or semi-automatic tools for annotation.
Thus, a scalable approach to media categorization is highly desired. The abundance of human-annotated data from the various websites provides a vast source of sample data for constructing a scalable media categorization system. One approach to constructing a scalable classification system is to leverage commonly available large-scale training data by extracting improved standard features and calculating nearest neighbor (NN) based indices. These indices provide low-level representations of the training data that can be used, for example, to automate the annotation of data files. The disadvantage of the nearest neighbor approach, however, is the computational complexity of standard implementations, which results in inefficient processing when the collection of files is large. Thus, the trade-off for using a large library of freely available annotated files is reduced efficiency in processing such a collection.
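By way of illustration only, the cost of a standard nearest neighbor implementation can be seen in the following minimal sketch. It is not taken from any particular system. The function name `knn_annotate`, the two-dimensional feature vectors, the tags, and the choice of Euclidean distance with majority voting are all assumptions made for this example.

```python
import heapq
import math
from collections import Counter


def knn_annotate(query, training_set, k=3):
    """Return the most common tag among the k training items nearest to query.

    training_set is a list of (feature_vector, tag) pairs (hypothetical layout).
    Every call computes a distance to every training item, so the work per
    query grows linearly with the size of the collection.
    """
    # Brute-force scan: one Euclidean distance per training item.
    nearest = heapq.nsmallest(
        k, training_set, key=lambda item: math.dist(query, item[0])
    )
    # Majority vote over the tags of the k nearest items.
    votes = Counter(tag for _, tag in nearest)
    return votes.most_common(1)[0][0]


# Toy annotated collection (illustrative values only).
training = [
    ((0.90, 0.10), "beach"),
    ((0.80, 0.20), "beach"),
    ((0.10, 0.90), "forest"),
    ((0.20, 0.80), "forest"),
    ((0.15, 0.85), "forest"),
]

label = knn_annotate((0.85, 0.15), training, k=3)
print(label)
```

With five training items the scan is trivial, but the same loop over billions of annotated files must be repeated for every file to be annotated, which is the inefficiency noted above.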
Therefore, there is a need for a high-performance, scalable media classification scheme for annotating large-scale collections of media files with significant computational savings and improved efficiency. There is also a need for adapting annotated collections, trained against a large collection of media files, to more limited training collections and to specific annotation vocabularies.