With the proliferation of imaging technology in consumer applications (e.g., digital cameras and Internet-based support), it is becoming more common to store libraries of digitized pictures and other multimedia documents, such as video files. There are a number of known approaches to identifying or organizing multimedia documents. One approach is to merely organize the documents in a chronological order based upon the times at which the documents were acquired. For example, digitized pictures may be stored in an archive that is presented to a viewer of the archive in a chronological order from the earliest acquired digital photo to the latest acquired digital photo. Another approach is to form separately labeled folders into which the multimedia documents may be stored. Thus, a folder may be labeled “Vacation,” and digital photos acquired during a particular vacation trip may be stored within the folder.
In a more complex organizational approach, the contents of documents are analyzed using content-analysis technology, so that the documents may be categorized on the basis of their contents. This approach can be useful for businesses that utilize a large volume of multimedia documents, such as a newspaper that maintains an image archive. Content-analysis technology may be used to classify documents with identifiers that describe the image contents. Following the classification, the identifiers can be used as query terms during a search operation.
A technique for distinguishing individual documents, such as digital images, is to annotate each document. An “annotation” is defined herein as a semantic label that is entered by a human and associated with a document. That is, annotations are human-generated. Typically, an annotation is descriptive of the content of the document. For example, a digital image may have the annotation “This image depicts a Hawaiian beach.”
Annotations provide one form of “metadata,” which is defined as information, other than attribute information, that is attached to a document without being part of the contents of the document. Metadata instances may be human-generated, but may also be automatically generated. Other forms of metadata include song lyrics attached to an audio file and ratings attached to a video file.
As distinguished from metadata, “attributes” are defined as information regarding features of the associated document. Attributes may be classified as being specific to (1) intrinsic non-content features, such as time stamps and image dimensions, (2) intrinsic content features, such as color histograms, illuminations and face detections, and (3) access and usage features, such as access patterns and usage characteristics for documents that are stored at a common site.
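The distinction between metadata and the three classes of attributes can be illustrated with a small data model. The following is only a sketch under stated assumptions: the class names (`Attributes`, `Metadata`, `Document`), the field names, and the sample values are all hypothetical and chosen for illustration; none of them is prescribed by the definitions above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class Attributes:
    """Information regarding features of the document (hypothetical fields)."""
    # (1) intrinsic non-content features
    time_stamp: datetime
    dimensions: Tuple[int, int]
    # (2) intrinsic content features
    color_histogram: List[float] = field(default_factory=list)
    face_count: int = 0
    # (3) access and usage features
    access_count: int = 0


@dataclass
class Metadata:
    """Information attached to the document that is neither content nor attribute."""
    annotations: List[str] = field(default_factory=list)  # human-generated labels
    other: Dict[str, str] = field(default_factory=dict)   # e.g. lyrics or ratings


@dataclass
class Document:
    path: str
    attributes: Attributes
    metadata: Metadata = field(default_factory=Metadata)


# Sample document; the path, time stamp, and dimensions are made-up values.
photo = Document(
    path="beach.jpg",
    attributes=Attributes(
        time_stamp=datetime(2004, 7, 5, 14, 30),
        dimensions=(1600, 1200),
    ),
)
photo.metadata.annotations.append("This image depicts a Hawaiian beach.")
```

In this sketch, the attributes are available for every document as soon as it is stored, whereas the annotation list remains empty until a human supplies an entry.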
While the available approaches to organizing documents operate well for their intended purposes, there are concerns with each approach. For example, content analysis for automatic classification requires a high level of sophistication for proper implementation. On the other hand, human-generated annotations are less complex, but are laborious when used within a large archive of documents. The same is true for other forms of human-generated metadata attached to digital images and other non-textual documents. Alternatively, only a limited number of documents may be annotated, with the contents of the remaining documents being inferred. As one example, the first image acquired during a vacation may be associated with an annotation, allowing a user to infer that images acquired in the same calendar week are also images of vacation activity or scenery. The inference may hold in such a situation, but it is less reliable in others.
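The calendar-week inference described above can be sketched as follows. This is a minimal illustration, not a description of any particular system: the function names `iso_week` and `infer_annotations`, the file names, and the dates are hypothetical, and the use of ISO calendar weeks (Monday through Sunday) to decide what counts as the "same calendar week" is an assumption.

```python
from datetime import datetime
from typing import Dict, Optional, Tuple


def iso_week(ts: datetime) -> Tuple[int, int]:
    """Return (ISO year, ISO week number) for a time stamp."""
    iso = ts.isocalendar()
    return (iso[0], iso[1])


def infer_annotations(
    time_stamps: Dict[str, datetime],
    annotations: Dict[str, str],
) -> Dict[str, Optional[str]]:
    """Give each unannotated document the annotation of a document
    acquired in the same ISO week, or None if that week has none."""
    week_label: Dict[Tuple[int, int], str] = {}
    for name, text in annotations.items():
        # Keep the first annotation seen for each week.
        week_label.setdefault(iso_week(time_stamps[name]), text)
    return {
        name: annotations.get(name, week_label.get(iso_week(ts)))
        for name, ts in time_stamps.items()
    }


# Made-up archive: only the first image carries a human-entered annotation.
time_stamps = {
    "img1.jpg": datetime(2004, 7, 5, 9, 0),
    "img2.jpg": datetime(2004, 7, 7, 16, 45),
    "img3.jpg": datetime(2004, 7, 14, 11, 20),
}
annotations = {"img1.jpg": "Vacation"}

inferred = infer_annotations(time_stamps, annotations)
```

Here `img2.jpg` inherits the annotation because it falls in the same week as `img1.jpg`, while `img3.jpg` receives none because it was acquired the following week. If the vacation in fact continued into that week, the inference misses the third image, which illustrates why this approach is less reliable in some situations.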
What is needed is a method and system for enabling automated organizational processing of documents without a high level of complexity.