With the proliferation of imaging technology in consumer applications (e.g., digital cameras and Internet-based support), it is becoming more common to store digitized photo albums and other multimedia content, such as video and audio files, on personal computers (PCs). There are several popular approaches to categorizing multimedia content. One approach is to organize the content (e.g., images) in chronological order, from the earliest events to the most recent. Another approach is to organize the content by a topic of interest, such as a vacation or a favorite pet. When the content to be categorized is relatively small in volume, either approach is practical, since the collection can easily be managed.
A less conventional approach performs categorization using enabling technology that analyzes the content of the multimedia to be organized. This approach can be useful for businesses and corporations, where the volume of content to be categorized, including images, can be tremendously large. A typical way to categorize images with content-analysis technology is to tag the data with classifiers (i.e., semantic descriptions) that describe attributes of the image. A proper classification allows search software to locate the image effectively by matching a query against the assigned classifiers. For example, the classification of an image of a sunset along a sandy beach in Hawaii may include the classifiers sunset, beach and Hawaii. Following classification, any one of these descriptions may be entered as a query during a search operation.
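The query-matching step described above can be sketched as follows. This is a minimal illustration under two assumptions. Classifiers are stored as a set of keywords per file, and a query is a single keyword matched case-insensitively. The `search` function and the in-memory `index` are hypothetical and do not represent any particular search software.

```python
from typing import Dict, List, Set


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the ids of files whose classifiers contain the query term (case-insensitive).

    Hypothetical helper: real search software would use an inverted index.
    """
    q = query.strip().lower()
    return sorted(
        file_id
        for file_id, classifiers in index.items()
        if q in {c.lower() for c in classifiers}
    )


# Hypothetical index mapping file names to their assigned classifiers.
index = {
    "img_001.jpg": {"sunset", "beach", "Hawaii"},
    "img_002.jpg": {"city", "mid-day"},
}

print(search(index, "hawaii"))  # ['img_001.jpg']
```

The linear scan keeps the sketch short. A production system would more likely maintain an inverted index from each classifier to the files carrying it.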
A substantial amount of research effort has been expended on content-based processing to provide more accurate automated categorization of digital image, video and audio files. In content-based processing, an algorithm or a set of algorithms analyzes the content of the subject data to identify one or more classifiers that can be associated with the files. Content similarity, color variance comparison, and contrast analysis may be performed. For color variance analysis, a block-based color histogram correlation method may be applied to consecutive images to determine their color similarity at event boundaries. Other types of content-based processing allow indoor/outdoor, city/landscape, sunset/mid-day, and face-detection classifications, and the like.
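A block-based color histogram correlation between two consecutive images might look roughly like the sketch below. The source does not specify the details, so the following are assumptions:

* Images are plain lists of RGB tuples.
* Each image is split into a 2 x 2 grid of blocks.
* Each channel is quantized to 4 levels, giving 64 bins per block.
* A Pearson correlation is computed for each pair of corresponding blocks, and the block scores are averaged.

The function names and the 0.5 boundary threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255
Image = List[List[Pixel]]     # list of rows of pixels


def block_histograms(img: Image, grid: int = 2, levels: int = 4) -> List[List[float]]:
    """Split img into grid x grid blocks and return one normalized color histogram per block."""
    h, w = len(img), len(img[0])
    step = 256 // levels
    hists: List[List[float]] = []
    for by in range(grid):
        for bx in range(grid):
            hist = [0.0] * (levels ** 3)
            for y in range(by * h // grid, (by + 1) * h // grid):
                for x in range(bx * w // grid, (bx + 1) * w // grid):
                    # Quantize each channel to `levels` bins, clamping the top bin.
                    r, g, b = (min(c // step, levels - 1) for c in img[y][x])
                    hist[(r * levels + g) * levels + b] += 1
            total = sum(hist) or 1.0  # avoid division by zero for empty blocks
            hists.append([v / total for v in hist])
    return hists


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length histograms."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        # Constant histograms: treat as identical only if they match exactly.
        return 1.0 if list(a) == list(b) else 0.0
    return cov / math.sqrt(va * vb)


def color_similarity(img1: Image, img2: Image, grid: int = 2, levels: int = 4) -> float:
    """Average the per-block histogram correlations of two images."""
    h1 = block_histograms(img1, grid, levels)
    h2 = block_histograms(img2, grid, levels)
    return sum(pearson(a, b) for a, b in zip(h1, h2)) / len(h1)


def is_event_boundary(prev: Image, curr: Image, threshold: float = 0.5) -> bool:
    """Flag a (hypothetical) event boundary when consecutive images differ in color."""
    return color_similarity(prev, curr) < threshold


red = [[(255, 0, 0)] * 4 for _ in range(4)]
blue = [[(0, 0, 255)] * 4 for _ in range(4)]
print(is_event_boundary(red, red), is_event_boundary(red, blue))  # False True
```

Comparing blocks rather than whole-image histograms retains some spatial layout, for example sky in the upper blocks and sand in the lower ones. The grid size, quantization, and threshold would need tuning on real photo collections.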
While substantial effort has been focused on content-based processing, alternative or additional approaches have been developed to better organize files of non-textual subject data. A publication entitled “Augmented Album: Situation-dependent System for a Personal Digital Video/image Collection” by Hewagamage and Hirakawa, IEEE, April 2000, describes a system that uses non-content-based data (i.e., geographical location, time and corresponding events) to improve the accuracy of file categorization. Another system incorporates probabilistic principles into existing content-based classification schemes to improve system performance.
Even with the development of enabling technology, the ability to categorize files properly and retrieve the desired files reliably remains questionable. Improper categorization renders the categorization scheme ineffective. This is a concern because files that appear similar, yet are distinct, can easily be mis-categorized. For example, an image of a sunset may mistakenly be categorized as a sunrise, which reduces the probability that the user will retrieve the image in a search.
What is needed is a file-categorization method and system which provide a high level of reliability with regard to assignments of file classifiers.